Molecular Phylogeny of Y-Box Proteins and their Cold Shock Domains
A Mani, D Gupta
Keywords
evolution, molecular phylogeny, y-box protein.
Citation
A Mani, D Gupta. Molecular Phylogeny of Y-Box Proteins and their Cold Shock Domains. The Internet Journal of Genomics and Proteomics. 2009 Volume 5 Number 2.
Abstract
Introduction
The Y-box proteins are the most evolutionarily conserved nucleic acid-binding proteins yet described and are found in bacteria, plants and animals. All vertebrate Y-box proteins contain a variable N-terminal domain, a cold shock domain (CSD), and a C-terminal tail domain. The CSD is a highly conserved nucleic acid-binding domain that confers RNA-binding as well as single-stranded and double-stranded DNA-binding activity on the Y-box proteins. The most extensively studied Y-box protein, YB-1, has been found to be upregulated during prostate cancer tumor expression [
Although Y-box proteins are highly conserved, bind nucleic acids, and play a major role in transcription and translation, little is understood about their evolution and genomic diversity among different taxa. The objective of this study was to characterize the evolution of these proteins in higher eukaryotic organisms and to carry out a comparative analysis of these organisms based on their Y-box protein sequences. We also examined whether these proteins are conserved enough to reproduce the evolutionary order inferred from complete genomes.
Materials and Methods
In order to search Y-box protein family members BLAST [
For the conserved region search within the multiple sequence alignment, the minimum segment length was set to 15 residues, the maximum average entropy was set to 0.4, and gaps were limited to 2 per segment. Multiple sequence alignment, phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 [
Results and discussion
Multiple sequence alignment
The multiple alignment of the Y-box protein sequences (Fig. 1) produced an alignment of 409 positions. The multiple alignment of the cold shock domains (Fig. 2), which lie between positions 93 and 164 of the Y-box proteins, produced an alignment of 72 positions, suggesting that the domain is highly conserved. Statistical analysis of the aligned sequences showed that glycine, arginine, proline, alanine, glutamine and valine are the most frequent amino acids, with frequencies of 12.76, 12.03, 9.52, 8.21, 8.21 and 5.76% respectively. Within conserved sites, glycine, valine, alanine, asparagine, glutamine, lysine and serine are the most frequently occurring amino acids, with frequencies of 19.13, 14.87, 9.25, 8.50, 8.38, 6.37, 6.37 and 5.12% respectively. The Y-box proteins contain a cold shock domain that is 70 or 71 residues long and highly conserved. Within the cold shock domain, valine, glycine, asparagine, lysine, glutamine and alanine are the most frequently occurring amino acids, with frequencies of 14.44, 11.37, 9.54, 9.38, 8.30 and 6.06% respectively. The alignment of the whole Y-box proteins contained 54 conserved sites, 230 parsimony-informative sites and 84 singleton sites, whereas the alignment of the cold shock domains contained 40 conserved sites, 16 parsimony-informative sites and 15 singleton sites.
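The site categories and residue frequencies reported above can be illustrated with a minimal sketch. It assumes the usual definitions (a conserved site has one residue type; a parsimony-informative site has at least two residue types that each occur at least twice; any other variable site is a singleton) and, as a further assumption, skips every column that contains a gap. The function names `classify_sites` and `residue_percentages` and the toy alignment are hypothetical; this is not the routine used in MEGA.

```python
from collections import Counter


def classify_sites(alignment, gap="-"):
    """Count conserved, parsimony-informative and singleton columns.

    Assumption: columns containing a gap are skipped, not classified.
    """
    counts = {"conserved": 0, "parsimony_informative": 0,
              "singleton": 0, "skipped": 0}
    for column in zip(*alignment):
        if gap in column:
            counts["skipped"] += 1
            continue
        residue_counts = Counter(column)
        if len(residue_counts) == 1:
            counts["conserved"] += 1
        elif sum(1 for n in residue_counts.values() if n >= 2) >= 2:
            # At least two residue types are each seen two or more times.
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


def residue_percentages(alignment, gap="-"):
    """Percentage of each residue over all non-gap characters."""
    residues = [r for seq in alignment for r in seq if r != gap]
    total = len(residues)
    return {r: 100.0 * n / total for r, n in Counter(residues).most_common()}


# Toy alignment (hypothetical sequences, not data from this study).
toy = ["GVAK", "GVAR", "GIAN", "GI-Q"]
site_counts = classify_sites(toy)
percentages = residue_percentages(toy)
```

In the toy alignment the first column is conserved, the second is parsimony-informative, the third is skipped because of the gap, and the fourth is a singleton site.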
Conserved region search
The conserved region search identified three regions (Fig. 3a, 3b and 3c): positions 95 to 126 (segment length = 32), 128 to 153 (segment length = 26) and 156 to 180 (segment length = 25), with average entropies of 0.2151, 0.2142 and 0.3942 respectively. This conservation is consistent with the minimal entropy shown by positions 95 to 180.
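How the three search parameters (minimum segment length, maximum average entropy, gap limit) interact can be sketched as a simple greedy scan over per-column entropy values. This is only an illustration under stated assumptions, not the algorithm of the software used in this study: the function name `find_conserved_regions` and the `has_gap` flags are hypothetical, the gap limit is interpreted as the number of gap-containing columns per segment, and segments are additionally required to start and end on a column whose own entropy is within the threshold.

```python
def find_conserved_regions(entropies, has_gap, min_length=15,
                           max_avg_entropy=0.4, max_gaps=2):
    """Greedy left-to-right scan for low-entropy segments.

    Returns (start, end, average_entropy) tuples with 1-based,
    inclusive positions.
    """
    regions = []
    n = len(entropies)
    start = 0
    while start < n:
        if entropies[start] > max_avg_entropy:
            start += 1          # assumed rule: segments start on a conserved column
            continue
        best = None             # (end index, entropy sum) of the longest valid segment
        total = 0.0
        gaps = 0
        for end in range(start, n):
            total += entropies[end]
            gaps += 1 if has_gap[end] else 0
            if gaps > max_gaps:
                break
            length = end - start + 1
            if (length >= min_length
                    and entropies[end] <= max_avg_entropy
                    and total / length <= max_avg_entropy):
                best = (end, total)
        if best is None:
            start += 1
        else:
            end, total = best
            regions.append((start + 1, end + 1, total / (end - start + 1)))
            start = end + 1
    return regions


# Toy profile: 20 low-entropy columns flanked by variable columns.
profile = [1.5] * 5 + [0.1] * 20 + [1.5] * 5
regions = find_conserved_regions(profile, [False] * len(profile))
```

With the defaults taken from the Materials and Methods, the toy profile yields a single region covering columns 6 to 25; a run of only 10 low-entropy columns would be rejected for being shorter than the minimum segment length.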
Entropy plot
An entropy plot, a measure of the lack of information content and of the amount of variability at each position, was generated for all the aligned positions. The plot shows that entropy rarely reaches a value of two and is minimal at several positions between position 96 and position 170, where it rarely exceeds one, which is a sign of better alignment in this region.
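As an illustration of the quantity plotted, per-column entropy can be computed as the Shannon entropy of the residue frequencies in each alignment column. The sketch below assumes natural logarithms (so a column with 20 equally frequent residues would score ln 20, about 3.0) and assumes gaps are ignored; both choices, and the function names `column_entropy` and `entropy_profile`, are assumptions rather than the settings of the software used here.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (natural log) of one alignment column, gaps ignored."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    total = len(residues)
    # H = sum over residue types of -p * ln(p)
    return sum(-(n / total) * math.log(n / total)
               for n in Counter(residues).values())


def entropy_profile(alignment, gap="-"):
    """Entropy of every column of a list of equal-length aligned sequences."""
    return [column_entropy(col, gap) for col in zip(*alignment)]


# Toy alignment (hypothetical): column 1 is invariant, column 2 is split evenly.
toy_profile = entropy_profile(["GA", "GV", "GA", "GV"])
```

An invariant column scores zero, and a column split evenly between two residues scores ln 2, so low values mark conserved positions.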
Hydrophobicity profile
A hydrophobicity profile plot shows that the mean hydrophobicity of the protein is below zero at most positions in all the species and only occasionally becomes positive. Maximum hydrophobicity is observed from positions 10 to 100 for
Phylogeny
The phylogenetic trees were constructed using the neighbour-joining (NJ) method (Figures 6 and 7). The trees show the different organisms on nodes branched on the basis of their Y-box proteins.
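For readers unfamiliar with neighbour joining, the core of the method can be sketched in a few lines. The sketch assumes that a symmetric matrix of pairwise distances has already been computed, returns only the unrooted topology as nested tuples, and omits branch lengths; the function name `neighbor_joining` and the toy distances are hypothetical, and this is not the MEGA implementation.

```python
def neighbor_joining(names, dist):
    """Return the NJ topology as nested tuples (branch lengths omitted).

    names: list of taxon labels; dist: symmetric matrix (list of lists)
    with zeros on the diagonal, in the same order as names.
    """
    nodes = list(names)
    d = {a: {b: dist[i][j] for j, b in enumerate(nodes)}
         for i, a in enumerate(nodes)}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]   # NJ selection criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = (a, b)                                    # new internal node
        d[u] = {u: 0.0}
        for k in nodes:
            if k != a and k != b:
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k != a and k != b] + [u]
    return tuple(nodes)


# Toy distances (hypothetical) that fit the tree ((A,B),(C,D)) exactly.
toy_names = ["A", "B", "C", "D"]
toy_dist = [[0, 3, 9, 10],
            [3, 0, 10, 11],
            [9, 10, 0, 7],
            [10, 11, 7, 0]]
toy_tree = neighbor_joining(toy_names, toy_dist)
```

At each step the pair with the smallest value of the criterion is joined and replaced by a new internal node, so the method tolerates unequal rates of change along different lineages.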
Although both trees (Figures 6 and 7) are broadly similar to the established taxonomy of the organisms, neither the tree based on the whole Y-box protein nor the tree based on the cold shock domain alone complies completely with the established evolutionary order. The inconsistency is greater for the whole Y-box protein than for the cold shock domain alone, perhaps because of the variable N-terminal and C-terminal domains in the protein. Using only the cold shock domain reduces the difference between the order of organisms in the tree inferred by the NJ method and the tree of established taxonomy, since the variable N-terminal and C-terminal domains are omitted and the inference rests only on the conserved cold shock domain. Molecular phylogenies of other families, such as the HSP 90 gene family [26] and the coronin gene family [27], have been inferred with the same inconsistencies; nevertheless, these studies, like the present one, give an idea of the evolutionary trend of the protein studied. Figure 7 also shows that within chordates (birds and mammals) there is a greater level of variance in the protein sequence, whereas non-chordates, mostly insects, have more conserved Y-box proteins. It appears that the cold shock proteins of prokaryotes evolved into the cold shock domains of eukaryotic proteins by acquiring N- and C-terminal domains that allow them to perform diverse functions. Changes that increased their efficiency in performing more functions were probably given a selective advantage during the course of evolution, which in turn probably increased variation in the amino acid sequences. These variations have become more prevalent among higher organisms with reference to Y-box proteins.
This study presents the first comparative proteomic study and evolutionary analysis of the Y-box family of proteins based on molecular phylogeny across different families of 17 eukaryotic organisms. The results provide insight into the evolutionary order of Y-box proteins. The study establishes an overall framework of information for the Y-box protein family, which may facilitate and stimulate the study of this gene family across all organisms.
Acknowledgements
AM is thankful to the University Grants Commission, New Delhi, for a research fellowship. The work has been supported by a DBT-BIF Grant to DKG under its BTISNet scheme.