Original Article

Molecular Phylogeny of Y-Box Proteins and their Cold Shock Domains

A Mani, D Gupta

Keywords

evolution, molecular phylogeny, y-box protein.

Citation

A Mani, D Gupta. Molecular Phylogeny of Y-Box Proteins and their Cold Shock Domains. The Internet Journal of Genomics and Proteomics. 2009 Volume 5 Number 2.

Abstract


Purpose: Y-box proteins are a family of nucleic acid binding proteins that are highly conserved from prokaryotes to humans. They are thought to be involved in both transcriptional and translational control. Little is understood about their evolution and genomic diversity among different taxa. This study combined bioinformatic and phylogenetic approaches to provide a first cross-family analysis of the evolution of Y-box proteins from different eukaryotic organisms.
Methods: Protein sequences of Y-box proteins from 17 different species were used for sequence analysis, phylogenetic tree inference and evolutionary characterization. The trees were drawn by the Neighbour-Joining method, with bootstrapping as a test of the inferred phylogeny.
Results: The phylogenetic trees were constructed from multiple aligned sequences and show bootstrap values on nodes and species codes on leaves. The analysis of the data led to a single most consistent tree.
Discussion and conclusion: The results give a clear picture of the evolutionary order of Y-box proteins. The study establishes an overall framework of information for the Y-box protein family, which may facilitate and stimulate the study of this gene family across all organisms.

 

Introduction

The Y-box proteins are the most evolutionarily conserved nucleic acid-binding proteins yet described and are found in bacteria, plants and animals. All vertebrate Y-box proteins contain a variable N-terminal domain, a cold shock domain (CSD) and a C-terminal tail domain. The CSD is a highly conserved nucleic acid binding domain that confers RNA, single-stranded DNA and double-stranded DNA binding activities on the Y-box proteins. The most extensively studied Y-box protein, YB-1, has been found to be upregulated during prostate cancer tumor progression [1]. Increased YB-1 expression has been correlated with DNA topoisomerase II α and proliferating cell nuclear antigen expression in human lung cancer and colorectal cancer [2] and linked to markers of cellular proliferation in osteosarcoma [3]. YB-1 has been identified as a cell cycle stage specific transcription factor [4]. Y-box proteins are able to bind Y-box elements [5, 6, 7, 8]. The eukaryotic Y-box proteins were originally identified through their ability to interact with DNA containing a reverse CCAAT box, the Y-box sequence CTGATTGGCCAA [9]. This sequence is found in a variety of promoter regions, including those of the MHC class II genes [9, 10] and of genes encoding germ cell-specific functions [5], and in these contexts the Y-box proteins are considered to act as regulators of transcription. Although many Y-box proteins regulate transcription and some demonstrate RNA specificity [11], others have an mRNA stabilization role in the cytoplasm [12]. They have been implicated in various cellular processes, including adaptation to low temperatures, cellular growth, nutrient stress and stationary phase [13]. The discovery, in a growing number of eukaryotic nucleic-acid-binding proteins, of the cold shock domain, which shows strikingly high homology and similar RNA-binding properties to the cold shock proteins (CSPs), suggests that these proteins have an ancient origin.

Although Y-box proteins are highly conserved, play a major role in transcription and translation, and have well-known nucleic acid binding properties, little is understood about their evolution and genomic diversity among different taxa. The objective of this study was to characterize these proteins evolutionarily in higher eukaryotic organisms and to carry out a comparative analysis of these organisms based on Y-box protein sequences. We also attempted to find out whether these proteins are conserved enough to represent the evolutionary order inferred from complete genomes.

Materials and Methods

To find members of the Y-box protein family, a BLAST [14] search was performed with the blastp program against the protein database at NCBI [15]. The amino acid sequence of the Mus musculus Y-box protein (gi|2745892|gb|AAB94768.1|) was used as the query. From the hits, 17 sequences (Table 1), each from a different species, were selected for further study. All sequences were retrieved in FASTA format, examined individually and aligned using ClustalW [16]. BioEdit version 7.0.9.0 [17] was used for manual editing and analysis of the sequences. The method of Kyte and Doolittle [18] was used to plot the hydrophobicity profile. Entropy was calculated as

$$H(l) = -\sum_{b} f(b,l)\,\ln f(b,l)$$

where H(l) is the uncertainty, also called entropy, at position l; b represents a residue (out of the allowed choices for the sequence in question); and f(b,l) is the frequency at which residue b is found at position l. The information content of a position l is then defined as a decrease in uncertainty or entropy at that position. For the hydrophobicity profile, a window of 13 residues was moved along each sequence, the hydropathy scores were summed over the window, and the average (the sum divided by the window size) was taken for each position in the sequence. The mean hydrophobicity value was plotted for the middle residue of the window. The method of Eisenberg et al. [19] was used to plot the hydrophobic moment profile, with a window size of 13 residues (six residues on either side of the current residue) and a rotation angle δ = 100 degrees.

$$\mu_H = \sqrt{\left[\sum_{n=1}^{N} H_n \sin(\delta n)\right]^2 + \left[\sum_{n=1}^{N} H_n \cos(\delta n)\right]^2}$$

where µH is the hydrophobic moment, Hn is the hydrophobicity score of the residue at position n, δ = 100 degrees, n is the position within the segment, and the sums run over the N residues of a segment of the defined window length.
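The moment calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes the Eisenberg form of the moment (the square root of the squared sine-weighted sum plus the squared cosine-weighted sum of the hydrophobicity scores). It takes precomputed scores, so no particular hydrophobicity scale is implied, and the function names are hypothetical rather than taken from BioEdit.

```python
import math


def hydrophobic_moment(scores, delta_deg=100.0):
    """Hydrophobic moment of one segment of per-residue hydrophobicity scores.

    Positions are numbered 1..N within the segment; delta_deg is the
    rotation angle per residue in degrees.
    """
    delta = math.radians(delta_deg)
    sin_sum = sum(h * math.sin(delta * n) for n, h in enumerate(scores, start=1))
    cos_sum = sum(h * math.cos(delta * n) for n, h in enumerate(scores, start=1))
    return math.hypot(sin_sum, cos_sum)


def moment_profile(scores, window=13, delta_deg=100.0):
    """Moment for every full window, reported for the window's middle residue."""
    half = window // 2
    return [
        hydrophobic_moment(scores[mid - half: mid + half + 1], delta_deg)
        for mid in range(half, len(scores) - half)
    ]
```

With a window of 13, the first and last six residues of a sequence have no full window around them, so a sequence of L residues yields L - 12 values. Some implementations divide the moment by the window length; the sketch follows the summed form described in the text.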

For the conserved region search within the multiple aligned sequences, the minimum segment length was set to 15 residues, the maximum average entropy to 0.4, and gaps were limited to 2 per segment. Multiple sequence alignment and phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 [20]. For pairwise and multiple alignment, the gap opening penalty was -7 and the gap extension penalty was -1 [21]. The BLOSUM weight matrix was used for substitution scoring [22]. Hydrophilic gap penalties were used to increase the chance of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions, where gaps are more common. The multiple alignments of the Y-box protein sequences and of the cold shock domains were used to create phylogenetic trees. The evolutionary history was inferred using the Neighbour-Joining method [23]. All characters were given equal weights. The bootstrap consensus tree inferred from 10000 replicates was taken to represent the evolutionary history of the taxa analyzed [24]. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [25]. The tree was drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site [25]. All positions containing gaps and missing data were eliminated from the dataset (complete deletion option). The final Y-box protein dataset contained a total of 158 positions, of which 87 were parsimony informative. The final cold shock domain dataset contained 72 positions, of which 16 were parsimony informative.
Phylogenetic analyses were conducted in MEGA4 [20].
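The complete deletion option and the Poisson-corrected distance described above can be sketched in Python. The sketch assumes the standard form d = -ln(1 - p), where p is the proportion of aligned sites at which two sequences differ. The function names and the toy sequences are hypothetical, and this is an illustration rather than MEGA4 code.

```python
import math


def complete_deletion(seqs, gap_chars="-?"):
    """Drop every alignment column in which any sequence has a gap or missing data."""
    kept = [col for col in zip(*seqs) if not any(c in gap_chars for c in col)]
    if not kept:
        return ["" for _ in seqs]
    return ["".join(chars) for chars in zip(*kept)]


def p_distance(a, b):
    """Proportion of sites at which two equal-length aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(a, b):
    """Poisson-corrected number of amino acid substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy alignment (invented): column 6 contains gaps and is removed.
aligned = complete_deletion(["MKGVA-LT", "MRGVAELS", "MKGIA-LT"])
d_01 = poisson_distance(aligned[0], aligned[1])
```

A matrix of such pairwise distances is the input that a Neighbour-Joining procedure works from.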

Table 1: Y-box protein sequences with their length and NCBI accession code

Results and discussion

Multiple sequence alignment

The multiple alignment of the Y-box protein sequences (Fig. 1) produced an alignment of 409 positions. The multiple alignment of the cold shock domains (Fig. 2), which span positions 93 to 164 of the Y-box proteins, produced an alignment of 72 positions, indicating their high degree of conservation. Statistical analysis of the multiple aligned sequences showed that glycine, arginine, proline, alanine, glutamine and valine are the most frequent amino acids, with frequencies of 12.76%, 12.03%, 9.52%, 8.21%, 8.21% and 5.76% respectively. Within conserved sites, glycine, valine, alanine, asparagine, glutamine, lysine and serine are the most frequent amino acids, with reported frequencies of 19.13%, 14.87%, 9.25%, 8.50%, 8.38%, 6.37%, 6.37% and 5.12%. The Y-box proteins contain a cold shock domain that is 70 or 71 residues long and highly conserved. Within the cold shock domain, valine, glycine, asparagine, lysine, glutamine and alanine are the most frequent amino acids, with frequencies of 14.44%, 11.37%, 9.54%, 9.38%, 8.30% and 6.06% respectively. The Y-box protein alignment contained 54 conserved sites, 230 parsimony-informative sites and 84 singleton sites, while the cold shock domain alignment contained 40 conserved sites, 16 parsimony-informative sites and 15 singleton sites.
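The site categories counted above can be illustrated with a short Python sketch. It assumes the usual definitions: a conserved site has a single residue type, a parsimony-informative site has at least two residue types that each occur at least twice, and a singleton site is variable with at most one residue type occurring more than once. The function names and the toy alignment are hypothetical, gap symbols are simply ignored within a column, and MEGA's own handling of gaps may differ.

```python
from collections import Counter


def classify_site(column, gap_chars="-?"):
    """Label one alignment column as conserved, parsimony-informative or singleton."""
    counts = Counter(c for c in column if c not in gap_chars)
    if not counts:
        return "empty"  # column holds only gaps or missing data
    if len(counts) == 1:
        return "conserved"
    repeated_types = sum(1 for n in counts.values() if n >= 2)
    return "parsimony-informative" if repeated_types >= 2 else "singleton"


def site_summary(seqs):
    """Count how many columns of an alignment fall into each category."""
    return Counter(classify_site(col) for col in zip(*seqs))


# Toy alignment (invented) of four sequences and five columns.
summary = site_summary(["MKGVA", "MKGIA", "MRGLA", "MRGLT"])
```

In the toy alignment, columns 1 and 3 are conserved, column 2 (K, K, R, R) is parsimony-informative, and columns 4 and 5 are singleton sites.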

Figure 1a: Multiple sequence alignment of Y-box proteins (Position 1-150)

Figure 1b: Multiple sequence alignment of Y-box proteins (Position 151-300)

Figure 1c: Multiple sequence alignment of Y-box proteins (Position 301 to end)

Figure 2: Multiple sequence alignment of cold shock domains (Position 1 to end)

Conserved region search

A conserved region search yielded three regions (Fig. 3a, 3b and 3c), from position 95 to 126 (segment length = 32), 128 to 153 (segment length = 26) and 156 to 180 (segment length = 25), with average entropies of 0.2151, 0.2142 and 0.3942 respectively. This conservation is corroborated by the minimal entropy shown by positions 95 to 180.
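A simplified version of such a search is sketched below in Python, using the thresholds stated in Materials and Methods (minimum length 15, maximum average entropy 0.4, at most 2 gaps per segment). It is not BioEdit's algorithm. It is a greedy left-to-right scan written under two assumptions: a region must start and end on a position whose entropy is at or below the threshold, and the gap limit counts alignment columns that contain a gap. The function name, its parameters and the toy entropy profile are hypothetical.

```python
def find_conserved_regions(entropy, min_len=15, max_avg=0.4,
                           gap_cols=None, max_gaps=2):
    """Return (start, end, average entropy) tuples with 1-based inclusive positions."""
    n = len(entropy)
    if gap_cols is None:
        gap_cols = [False] * n
    regions = []
    start = 0
    while start < n:
        if entropy[start] > max_avg:      # a region may not begin on a variable site
            start += 1
            continue
        best = None
        total = 0.0
        gaps = 0
        for end in range(start, n):
            total += entropy[end]
            gaps += 1 if gap_cols[end] else 0
            if gaps > max_gaps:
                break
            length = end - start + 1
            # Accept this end point only if the segment is long enough, ends on a
            # low-entropy site, and keeps its average entropy under the threshold.
            if (length >= min_len and entropy[end] <= max_avg
                    and total / length <= max_avg):
                best = (end, total / length)
        if best is None:
            start += 1
        else:
            end, average = best
            regions.append((start + 1, end + 1, average))
            start = end + 1
    return regions


# Toy profile (invented): 20 low-entropy positions flanked by variable ones.
toy_entropy = [1.5] * 5 + [0.1] * 20 + [1.5] * 5
regions = find_conserved_regions(toy_entropy)
```

On the toy profile the scan reports a single region covering positions 6 to 25. Because the criterion is an average, other scanning strategies can give different region boundaries on real data.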

Figure 3a: The first conserved domain found in Y-box proteins

Figure 3b: The second conserved domain found in Y-box proteins

Figure 3c: The third conserved domain found in Y-box proteins

Entropy plot

An entropy plot, a measure of the lack of information content and of the amount of variability, was generated for all aligned positions. The entropy rarely reaches a value of two across the alignment, and from position 96 to position 170 it rarely exceeds one, which indicates a better alignment in that region.
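The per-position entropy behind such a plot can be sketched in a few lines of Python. This is a minimal illustration of the entropy definition given in Materials and Methods, not the BioEdit implementation used in the study. The natural logarithm, the choice to skip gap characters, and the names `column_entropy` and `alignment_entropy` are assumptions made for the example, and the toy alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-"):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # f(b, l) is the frequency of residue b in this column.
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(seqs):
    """Entropy at every position of an alignment of equal-length sequences."""
    return [column_entropy(col) for col in zip(*seqs)]


# Toy alignment (invented): column 1 is invariant, column 4 is fully variable.
profile = alignment_entropy(["MKGV", "MKGA", "MRGL", "MRGI"])
```

Under these assumptions an invariant column scores 0, a column split evenly between two residues scores ln 2 (about 0.69), and a column with all 20 amino acids at equal frequency would score ln 20 (about 3.0).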

Figure 4: Entropy (Hx) Plot for aligned Y-box protein sequences

Hydrophobicity profile

The hydrophobicity profile plot shows that the mean hydrophobicity is below zero at most positions in all species and only occasionally becomes positive. Maximum hydrophobicity is observed from positions 10 to 100 for Danio rerio and Columba livia, and between positions 140 and 180. Schistosoma japonicum shows a marked increase in hydrophobicity from position 340 to 360 and from 380 to 410. These proteins are therefore essentially non-hydrophobic in nature, since most positions show a mean hydrophobicity below zero in most of the organisms studied here.
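The sliding-window average behind such a profile can be sketched in Python, using the 13-residue window stated in Materials and Methods. The hydropathy values are the published Kyte and Doolittle scale; the function name and the example input are hypothetical, and the sketch handles a single ungapped sequence rather than reproducing BioEdit's plot.

```python
# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=13):
    """Mean hydropathy per full window, assigned to the window's middle residue.

    Returns (position, mean) pairs with 1-based positions. Any character
    outside the 20 standard residues (for example a gap) raises KeyError.
    """
    half = window // 2
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    profile = []
    for mid in range(half, len(scores) - half):
        segment = scores[mid - half: mid + half + 1]
        profile.append((mid + 1, sum(segment) / window))
    return profile


# A run of 13 isoleucines gives one window, centred on residue 7.
example = hydropathy_profile("I" * 13)
```

Positive means indicate hydrophobic stretches and negative means hydrophilic ones, which is the sense in which the profile above is described as mostly below zero.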

Figure 5a: Kyte and Doolittle scale mean hydrophobicity profile plot

Figure 5b: Colour legend for Kyte and Doolittle scale mean hydrophobicity profile plot

Phylogeny

The phylogenetic trees were constructed using the Neighbour-Joining method (Figures 6 and 7). The trees show the organisms branched on the basis of their Y-box proteins. Among the 17 selected proteins, Schistosoma japonicum forms a completely diverged branch from the main tree. The node for the endopterygotes (Chironomus tentans, Bombyx mori, Drosophila melanogaster and Aedes aegypti) is supported by a lower bootstrap value of 68%, while the node for the vertebrates is supported by a very high bootstrap value of 100%. The node joining the teleosts (Carassius auratus, Danio rerio and Oryzias latipes) with the mammals' node is supported by a 99% bootstrap value. The node for mammals is supported by a 100% bootstrap value; the only exception is Xenopus tropicalis, which is an amphibian. These trees give an idea of the evolutionary order of Y-box proteins. The tree inferred from the cold shock domains (Figure 7) is not fully consistent with the tree of whole Y-box proteins, but it gives an idea of their different evolutionary trend. Despite the highly conserved nature of Y-box proteins and cold shock domains, this phylogeny does not seem to be completely consistent with the established taxonomy, perhaps because a specific protein was used rather than complete genomes.

Figure 6: Bootstrap consensus phylogenetic tree of Y-box proteins created by the Neighbour-Joining method, showing bootstrap support values on the nodes.

Figure 7: Bootstrap consensus phylogenetic tree of the cold shock domains of Y-box proteins created by the Neighbour-Joining method, showing bootstrap support values on the nodes.

Although both trees (Figures 6 and 7) are broadly similar to the established taxonomy of the organisms, neither the tree built from whole Y-box proteins nor the tree built from the cold shock domain alone completely complies with the established evolutionary order. This inconsistency is greater for the whole Y-box protein than for the cold shock domain alone, perhaps because of the variable N-terminal and C-terminal domains in the protein. Using only the cold shock domain reduces the difference between the order of organisms in the tree inferred by the NJ method and that in the established taxonomy, since the variable N-terminal and C-terminal domains are omitted and the inference is based only on the conserved cold shock domain. Molecular phylogenies of other proteins, such as the HSP90 gene family [26] and the coronin gene family [27], have been inferred with the same kind of inconsistencies, though these studies, like the present one, give an idea of the evolutionary trend in the context of the protein studied. Figure 7 also shows that within chordates (birds and mammals) there is a greater level of variance in the protein sequence, while non-chordates, mostly insects, have more conserved Y-box proteins. It appears that the cold shock proteins of prokaryotes persisted as the cold shock domains of eukaryotic proteins, which acquired N-terminal and C-terminal domains to perform diverse functions. During the course of evolution, changes that increased their efficiency in performing more functions were probably given a selective advantage, which in turn probably increased variation in the amino acid sequences. With reference to Y-box proteins, these variations have become more prevalent among higher organisms.

This study presents the first comparative proteomic and evolutionary analysis of the Y-box family of proteins based on molecular phylogeny across different families of 17 eukaryotic organisms. The results give a clear picture of the evolutionary order of Y-box proteins. The study establishes an overall framework of information for the Y-box protein family, which may facilitate and stimulate the study of this gene family across all organisms.

Acknowledgements

AM is thankful to the University Grants Commission, New Delhi, for a research fellowship. The work has been supported by a DBT-BIF Grant to DKG under its BTISNet scheme.

References

1. Gimenez-Bonafe, P., Fedoruk, M.N., Whitmore, T.G., Akbari, M., Ralph, J., Ettinger, S., Gleave, M. and Nelson, C. (2004). YB-1 is upregulated during prostate cancer tumor progression and increases P-glycoprotein activity. Prostate 59, 337-349.
2. Gu, C., Oyama, T., Osaki, T., Kohno, K. and Yamamoto, K. (2001). Expression of Y box-binding protein-1 correlates with DNA topoisomerase II alpha and proliferating cell nuclear antigen expression in lung cancer. Anticancer Res. 21, 2357-2362.
3. Oda, Y., Sakamoto, A., Shinohara, N., Ohga, T., Uchiumi, T., Kohno, K., Tsuneyoshi, M., Kuwano, M. and Iwamoto, Y. (1998). Nuclear expression of YB-1 protein correlates with P-glycoprotein expression in human osteosarcoma. Clin. Cancer Res. 4, 2273-2277.
4. Jürchott, K., Bergmann, S., Stein, U., Walther, W., Janz, M., Manni, I., Piaggio, G., Fietze, E., Dietel, M. and Royer, H.D. (2003). YB-1 as a cell cycle-regulated transcription factor facilitating cyclin A and cyclin B1 gene expression. J. Biol. Chem. 278, 27988-27996.
5. Wolffe, A.P., Tafuri, S., Ranjan, M. and Familiari, M. (1992). The Y-box factors: a family of nucleic acid binding proteins conserved from Escherichia coli to man. New Biol. 4, 290-298.
6. Wolffe, A.P. (1994). Structural and functional properties of the evolutionarily ancient Y-box family of nucleic acid binding proteins. BioEssays 16, 245-251.
7. Ladomery, M. and Sommerville, J. (1994). Binding of Y-box proteins to RNA: involvement of different protein domains. Nucleic Acids Res. 22(25), 5582-5589.
8. Ladomery, M. and Sommerville, J. (1996). Masking of mRNA by Y-box proteins. The FASEB Journal 10, 435-443.
9. Didier, D.K., Schiffenbauer, J., Woulfe, S.L., Zacheis, M. and Schwartz, B.D. (1988). Characterization of the cDNA encoding a protein binding to the major histocompatibility complex class II Y box. Proc. Natl. Acad. Sci. USA 85, 7322-7326.
10. Dorn, A., Durand, B., Marfing, C., Lemeur, M., Benoist, C. and Mathis, D. (1987). Conserved major histocompatibility complex class II boxes-X and Y- are transcriptional control elements and specifically bind nuclear proteins. Proc. Natl. Acad. Sci. USA 84, 6249-6253.
11. Bouvet, P., Matsumoto, K. and Wolffe, A.P. (1995). Sequence specific RNA recognition by Xenopus Y-box proteins. J. Biol. Chem. 270, 28297-28303.
12. Sommerville, J. (1999). Activities of cold-shock domain proteins in translational control. BioEssays 21, 319-325.
13. Graumann, P.L. and Marahiel, M.A. (1998). A superfamily of proteins that contain the cold-shock domain. Trends Biochem. Sci. 23(8), 286-290.
14. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
15. www.ncbi.nlm.nih.gov/entrez (National Centre for Biotechnology Information).
16. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680.
17. Hall, T.A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41, 95-98.
18. Kyte, J. and Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105-132.
19. Eisenberg, D., Schwarz, E., Komaromy, M. and Wall, R. (1984). Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J. Mol. Biol. 179(1), 125-142.
20. Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007). MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24, 1596-1599.
21. Altschul, S.F. and Gish, W. (1996). Local alignment statistics. Methods Enzymol. 266, 460-480.
22. Henikoff, S. and Henikoff, J. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915-10919.
23. Saitou, N. and Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4, 406-425.
24. Felsenstein, J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39, 783-791.
25. Zuckerkandl, E. and Pauling, L. (1965). Evolutionary divergence and convergence in proteins. In: Evolving Genes and Proteins. Academic Press, New York.
26. Chen, B., Zhong, D. and Monteiro, A. (2006). Comparative genomics and evolution of the HSP90 family of genes across all kingdoms of organisms. BMC Genomics 7, 156.
27. Morgan, R.O. and Fernandez, M.P. (2008). Molecular phylogeny and evolution of the coronin gene family. Subcell. Biochem. 48, 41-55.

Author Information

Ashutosh Mani, M.Sc.
Center of Bioinformatics, Institute for Interdisciplinary Studies, University of Allahabad

Dwijendra K. Gupta, Ph.D.
Center of Bioinformatics, Institute for Interdisciplinary Studies, University of Allahabad
