Phylogenetic and computational proteome analysis of Influenza A virus subtype H5N1
P Somvanshi, V Singh, P Seth
Keywords
avian influenza, hemagglutinin, motifs, neuraminidase, phylogenetic tree, subcellular localization
Citation
P Somvanshi, V Singh, P Seth. Phylogenetic and computational proteome analysis of Influenza A virus subtype H5N1. The Internet Journal of Genomics and Proteomics. 2007 Volume 3 Number 2.
Abstract
Avian influenza is an important infectious disease of birds. The genome of the causative virus consists of segmented, negative-sense, single-stranded RNA that encodes 8 structural proteins (HA, NA, PB1, PB2, PA, M1, M2, NP). Hemagglutinin and neuraminidase are the most significant surface proteins of influenza A virus subtype H5N1 for pathogenicity and for transmission from birds to humans. Thirty-six hemagglutinin and thirty-seven neuraminidase nucleotide sequences were analyzed by the neighbor-joining method with Jukes and Cantor distances. The phylogenetic analysis showed proximity between viruses from different sources within the same country, indicating that nearly identical influenza viruses infect different animals in the same country. The eight structural proteins encoded by the viral genome are important for propagation and infection in the host. We predicted the subcellular localization of all viral proteins within the host; they were located in the plasma membrane and in the host nucleus, where they interact with DNA. We also identified nine motifs in the proteins, viz. N-glycosylation site, N-myristoylation site, protein kinase C phosphorylation site, casein kinase 2 phosphorylation site, cAMP- and cGMP-dependent protein kinase phosphorylation site, amidation site, cell attachment sequence (RGD), tyrosine kinase phosphorylation site, and prokaryotic membrane lipoprotein lipid attachment site. These motifs are involved in the regulation, activity and stability of the virus. This computational study may help in understanding proteome function and may be useful for targeting antiviral drugs against these motifs.
Introduction
Avian influenza is an important infectious disease of birds that ranges from mild to severe illness. It is caused by influenza A virus, of which 15 subtypes exist in nature. The pathogenicity of the virus is multifactorial and depends on various surface proteins. The surface glycoproteins hemagglutinin (HA) and neuraminidase (NA) determine recognition of the host cell receptor and are the main targets of the host immune response, but other proteins also contribute to the virulence of highly pathogenic H5N1 strains (Obenauer 2006). All avian influenza viruses are classified by the two surface proteins HA and NA. To date, 16 HA subtypes (H1-H16) and 9 NA subtypes (N1-N9) of influenza A viruses have been identified (Fouchier 2005). The three viral envelope proteins of influenza A virus are the most medically important, as they are essential viral proteins targeted by host antibodies or antiviral drugs. The HA glycoprotein forms spikes on the virion surface and mediates attachment to host cell sialoside receptors and subsequent entry by membrane fusion. NA forms knob-like structures on the surface of virus particles and catalyzes the release of virus from infected cells, allowing it to spread. Both proteins (HA and NA) are significant for constructing phylogenetic trees because they are present in all influenza viruses. In 2003, outbreaks of highly pathogenic influenza A virus H5N1 were identified among poultry in the Republic of Korea (Lee et al 2005). Subsequently, outbreaks caused by antigenically related viruses were reported among poultry in Thailand, Vietnam, Japan, China, Cambodia, Malaysia and Indonesia. The reason for this apparently simultaneous occurrence of H5N1 outbreaks in many Asian countries remains unclear. However, H5N1 viruses have also been found in dead migratory birds, which may suggest a role for wild birds in the maintenance and spread of H5N1 virus in the region (Chen et al 2005).
WHO (2005) reported that outbreaks of highly pathogenic influenza virus (H5N1) had recently spread to nine Asian countries. Nucleotide sequence-based analysis of H5N1 isolates from birds and humans showed two distinct clades with a non-overlapping geographical distribution.
Bioinformatics prediction of protein subcellular localization has been studied extensively for prokaryotic and eukaryotic organisms but not for viruses, whose proteins are often involved in extensive interactions with host proteins at various subcellular locations. Such predictions benefit the study of infectious disease by clarifying the role of viral proteins in host cells and are thereby useful for designing improved therapeutic interventions (Scott
Bioinformatics tools were used to analyze and identify several motifs across all proteins of influenza virus. Motifs are typically 6-30 amino acids long and correspond to an active site, a substrate- or ligand-binding site, or a structurally important segment of a protein. Protein kinase C is a family of related serine/threonine kinases that plays a key role in cellular responses such as neurotransmission, gene expression, and cell growth and differentiation (Nishizuka 1984). Amidation, another important motif, is a post-translational modification required for the bioactivation of many neuropeptides; it is catalyzed by the bifunctional enzyme peptidylglycine alpha-amidating monooxygenase in a two-step reaction (Stewart and Klinman, 1988). The RGD motif also plays a role in host cell attachment. Recently, many motifs were identified in three proteins of influenza virus: hemagglutinin, neuraminidase and nucleoprotein. These motifs were amidation, PKC, casein kinase 2, glycosylation, tyrosine kinase, myristoylation and ATP/GTP binding sites (Tamanna
Material And Methods
Selection of Sequence data set
The complete nucleotide and protein sequences of hemagglutinin and neuraminidase from different sources were retrieved from the National Center for Biotechnology Information (NCBI) database at http://www.ncbi.nlm.nih.gov
Blast
The relatedness of sequences deposited in databases was evaluated by BLAST (Basic Local Alignment Search Tool; Altschul et al., 1990), run via the NCBI website [www.ncbi.nlm.nih.gov/blast/] against the complete training dataset extracted from the GenBank database. BlastN (nucleotide query against nucleotide database) was used with conditional compositional score adjustment, no filters, the BLOSUM 62 matrix and an expect value threshold of 10.
Phylogenetic Analysis
All sequences were aligned with ClustalX 1.83, and the computed alignment was manually checked and corrected. Pairwise evolutionary distances were computed using the Jukes and Cantor equation implemented in the MEGA 3.1 program, and a phylogenetic tree was constructed by the neighbor-joining method using the DNA weight matrix for nucleotides. One hundred bootstrap replicates were sampled to measure the support for each node on the consensus tree.
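The distance and tree-building steps described above can be illustrated with a minimal sketch. It is not the MEGA 3.1 implementation: the function names, the toy sequences and the choice to skip gapped or ambiguous sites are assumptions made for illustration. The Jukes and Cantor distance is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites, and neighbor-joining repeatedly merges the pair of nodes that minimizes the criterion Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the summed distance from a node to all others.

```python
import math
from itertools import combinations


def jc69_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    # Assumption: sites with gaps or ambiguity codes are simply skipped.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def neighbor_joining(names, dist):
    """Return the clusters (sets of leaf names) in the order they are joined.

    dist maps frozenset({x, y}) to the distance between taxa x and y.
    Branch lengths are omitted to keep the sketch short.
    """
    clusters = {name: frozenset([name]) for name in names}
    d = dict(dist)
    joined = []
    while len(clusters) > 2:
        n = len(clusters)
        # r[i]: summed distance from node i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in clusters if k != i)
             for i in clusters}

        def q(pair):
            i, j = pair
            return (n - 2) * d[frozenset(pair)] - r[i] - r[j]

        i, j = min(combinations(clusters, 2), key=q)
        new = f"({i},{j})"
        # Distance from the new internal node to each remaining node
        for k in clusters:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))]
                    - d[frozenset((i, j))])
        members = clusters.pop(i) | clusters.pop(j)
        clusters[new] = members
        joined.append(members)
    return joined


# Toy aligned sequences (hypothetical, not taken from the study data set)
toy = {"A": "ACGTACGTACGT", "B": "ACGTACGTACGA",
       "C": "ACGAACCTACTT", "D": "ACGAACCTAGTT"}
toy_dist = {frozenset((x, y)): jc69_distance(toy[x], toy[y])
            for x, y in combinations(toy, 2)}
print(neighbor_joining(list(toy), toy_dist))
```

Bootstrap support, as used in the study, would be obtained by resampling alignment columns with replacement, rebuilding the tree for each replicate and counting how often each cluster recurs; that step is left out here for brevity.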
Prediction and identification of motifs
The physico-chemical properties of the proteins were analyzed, viz. total number of amino acids, molecular weight and isoelectric point. The subcellular localization of the viral proteins was predicted with an online server (Virus-PLoc). Motifs with a high probability of occurrence in the proteins were identified with ExPASy and Generenner.
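As a rough illustration of how such motif sites can be located in a protein sequence, the sketch below scans a sequence with regular expressions. The expressions are translations of commonly cited PROSITE-style consensus patterns and are an assumption of this sketch; they are not the exact settings of the tools used in the study, and the dictionary name, function name and example peptide are hypothetical. Short patterns of this kind match frequently by chance, so hits should be read as candidate sites only.

```python
import re

# Assumed regex translations of PROSITE-style consensus patterns,
# e.g. N-{P}-[ST]-{P} for N-glycosylation.
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "Protein kinase C phosphorylation": r"[ST].[RK]",
    "Casein kinase 2 phosphorylation": r"[ST].{2}[DE]",
    "cAMP/cGMP-dependent kinase phosphorylation": r"[RK]{2}.[ST]",
    "N-myristoylation": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "Amidation": r".G[RK][RK]",
    "Cell attachment (RGD)": r"RGD",
}


def scan_motifs(protein):
    """Return (motif name, 1-based start, matched text) for every hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", protein.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical peptide used only to exercise the scanner
for hit in scan_motifs("MNGSARGD"):
    print(hit)
```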
Results And Discussion
A total of 36 hemagglutinin (~1700 bp) and 37 neuraminidase (~1410 bp) nucleotide sequences were used to construct the phylogenetic trees. Viruses from different sources within the same country clustered together, and four clades were obtained in each tree. The first clade showed HA and NA sequence similarity and proximity among isolates from different sources (cat, duck, goose, stone marten, whooper and swan), all from Germany. This suggests that the same influenza virus had spread across all these hosts. The other three clades showed the same sort of proximity (Fig. 1 and Fig. 2).
Figure 1
Figure 2
Both trees showed good agreement in their clades. This investigation should therefore be helpful for understanding the taxonomy and evolution of newer influenza viruses. A comparable study of antigenically distinct variants of dengue virus reported that, phylogenetically, DENV-1 was more homologous to DENV-3 than to the other serotypes (Somvanshi and Seth, 2007).
The physicochemical properties of all proteins of influenza virus were analyzed computationally. The molecular weight and pI value indicate the stability of a protein at that particular isoelectric point (pI). Subcellular localization of prokaryotic and eukaryotic proteins is well documented, but subcellular localization prediction for viral proteins was not known. An online server (Virus-PLoc) was used to predict the localization of viral proteins within the virus or in the host. The analysis of the whole proteome and the specific location of each influenza virus protein are given in Table 1.
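For readers who wish to reproduce the physicochemical properties, the following sketch computes sequence length, average molecular weight and an estimated pI. It is an illustration rather than the procedure of the tool used in the study: the residue masses are standard average values, but the pKa set is one of several in common use and is an assumption here, so pI estimates will differ slightly between tools. All function and constant names are hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water)
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

# Assumed pKa values; other published sets give slightly different pI
PKA_POSITIVE = {"K": 10.0, "R": 12.0, "H": 5.98}
PKA_NEGATIVE = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
PKA_N_TERMINUS, PKA_C_TERMINUS = 7.5, 3.55


def molecular_weight(protein):
    """Average molecular weight of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def net_charge(protein, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch relation."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for aa in protein:
        if aa in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(protein, low=0.0, high=14.0, tolerance=1e-4):
    """Find by bisection the pH at which the net charge is zero."""
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(protein, mid) > 0:
            low = mid   # still positively charged, so pI lies higher
        else:
            high = mid
    return (low + high) / 2.0


peptide = "MNGSARGD"  # hypothetical example sequence
print(len(peptide), round(molecular_weight(peptide), 2),
      round(isoelectric_point(peptide), 2))
```

Bisection works here because the net charge decreases monotonically as pH rises, so there is a single crossing point between pH 0 and 14.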
Figure 3
We investigated the extent to which viral proteins utilize human cellular localization mechanisms and whether appropriate subcellular localization predictors can be used to predict viral protein localization within the host cell. Motifs are typically 6-30 amino acids long and correspond to an active site, a substrate- or ligand-binding site, or a structurally important segment of a protein. Several important motifs were identified in the study (Tables 2-9).
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
In this study, we identified nine motifs, viz. N-glycosylation site, N-myristoylation site, protein kinase C phosphorylation site, casein kinase 2 phosphorylation site, cAMP- and cGMP-dependent protein kinase phosphorylation site, amidation site, cell attachment sequence (RGD), tyrosine kinase phosphorylation site, and prokaryotic membrane lipoprotein lipid attachment site. These motifs play an important role in the function of the influenza virus proteome. cAMP is a second messenger used for intracellular signal transduction, such as transferring the effects of hormones like glucagon and adrenaline, which cannot pass through the cell membrane. Its main purpose is the activation of protein kinases; it is also used to regulate the passage of Ca2+ through ion channels (Francis and Corbin, 1999). Tyrosine kinase is an enzyme that transfers a phosphate group from ATP to a tyrosine residue in a protein. Phosphorylation of proteins by kinases is an important mechanism in signal transduction for the regulation of enzyme activity. Casein kinase 2 is a serine/threonine-selective protein kinase that is a tetramer of two alpha subunits and two beta subunits. The alpha subunits contain the catalytic kinase domain. It has been implicated in cell cycle control, DNA repair, regulation of the circadian rhythm and other cellular processes (Burnett and Kennedy 1954; Bingham and Farrel 1974). Protein kinase C is a family of protein kinases consisting of ~10 isozymes that are activated through the same signal transduction pathway as phospholipase C. These related serine/threonine kinases play a key role in cellular responses such as neurotransmission, gene expression, and cell growth and differentiation (Nishizuka 1984). Myristoylation is an irreversible, co-translational protein modification found in animals, plants, fungi and viruses. It also occurs post-translationally, for example when previously internal glycine residues become exposed by caspase cleavage during apoptosis (Podell and Gribskov 2004).
Amidation, another important motif, is a post-translational modification required for the bioactivation of many neuropeptides; it is catalyzed by the bifunctional enzyme peptidylglycine alpha-amidating monooxygenase in a two-step reaction (Stewart and Klinman, 1988). The RGD motif also plays a role in host cell attachment. Recently, Tamanna et al (2006) identified several motifs (N-glycosylation site, N-myristoylation site, protein kinase C phosphorylation site, casein kinase 2 phosphorylation site, amidation, tyrosine kinase) in three proteins of influenza virus: nucleoprotein, hemagglutinin and neuraminidase. Several cow-, chicken-, goose- and human-specific epitopes from the two significant surface proteins of influenza A virus H5N1, hemagglutinin and neuraminidase, have been reported for immunodiagnostic and vaccine development (Somvanshi et al 2007). The present study may help to understand whole-proteome function and gene regulation, and may further support vaccine and antiviral drug targeting to inhibit influenza virus function at the specific positions of the predicted motifs.