bioinformatics, blast, cupin family, hypothetical protein 3m3i, leishmania major, nucleotide metabolism, string
J Watkins. Predicting The Function Of A Putative Uncharacterized Hypothetical Protein From Leishmania Major. The Internet Journal of Genomics and Proteomics. 2012 Volume 6 Number 2.
3M3I, a hypothetical protein from Leishmania major, was selected for functional studies. The Basic Local Alignment Search Tool (BLAST) was used to perform a similarity search through the National Center for Biotechnology Information (NCBI). Comparative analysis disclosed matches to the cupin family and to purine nucleoside permease. To further elucidate the identity of 3M3I, the STRING database was employed. STRING revealed predicted protein interactions with glyoxalase II and adenosine deaminase. Based on the BLAST and STRING identifications, 3M3I appears to be involved in nucleotide metabolism. In addition, the analysis revealed possible participation in ATP-dependent protease activity and in shuttling within metabolic pathways. These data lay a foundation for further studies linking 3M3I to the enzyme system responsible for the renewal of nucleic acids in Leishmania major.
Proteomics can be defined as the study of global protein expression in biological samples. This area of study employs qualitative and quantitative mass spectrometry approaches to examine global sample sets, which have provided a wealth of knowledge about cellular function [3-6]. One of the key outcomes of global proteomic analysis is the identification of new proteins with unknown function. These hypothetical proteins have a predicted existence but no experimental data to define their function. For a complete analysis and understanding of any proteome, we must unravel all pieces of the puzzle.
In this paper we address the functional prediction of a hypothetical protein, 3M3I, from Leishmania major. Elucidating the function and family classification of hypothetical proteins without experimental data involves the combined effort of a number of strategies [8,9]. These include (1) identification of sequence homology to proteins of known function, (2) phylogenetic analysis to show evolutionary relationships between biological species based upon similarities and differences in their DNA/protein sequences, and (3) comparative analysis for the identification of probable protein-protein interactions. A key step in identifying the function of a hypothetical protein is to identify the domain(s) it contains and their organization, so that the protein can be classified into a family.
Materials and Methods
A search for hypothetical proteins was performed in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb), and a hypothetical protein of unknown function from Leishmania major, 3M3I (225 residues in length), was selected.
The sequence of 3M3I was extracted from PDB in FASTA format. A similarity search was then performed through the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) using BLASTp, the Basic Local Alignment Search Tool program for protein-to-protein pairwise alignment.
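Hits from a BLASTp search of this kind are commonly screened by percent identity, as with the 70% Max ID cutoff applied in the Results. The following is a minimal sketch of such a screen, assuming the hits were exported in the BLAST+ default tabular layout (-outfmt 6); the study itself used the NCBI web interface, so this is an illustration rather than the author's procedure. The function name `filter_hits`, the subject identifiers, and every value in the example table are hypothetical and are not the hits reported in this paper.

```python
# Column order of BLAST+ tabular output (-outfmt 6) with default fields.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_identity=70.0):
    """Return hits with percent identity >= min_identity, highest identity first."""
    hits = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines (present in some tabular exports).
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        if hit["pident"] >= min_identity:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["pident"], reverse=True)


# Made-up rows for illustration only; NOT data from the paper.
EXAMPLE_TABLE = "\n".join([
    "query_3M3I\tsubject_A\t98.2\t225\t4\t0\t1\t225\t1\t225\t1e-120\t430",
    "query_3M3I\tsubject_B\t45.0\t200\t110\t3\t5\t204\t8\t207\t2e-30\t120",
    "query_3M3I\tsubject_C\t71.5\t220\t63\t1\t3\t222\t2\t221\t5e-90\t310",
])

if __name__ == "__main__":
    for hit in filter_hits(EXAMPLE_TABLE):
        print(hit["sseqid"], hit["pident"], hit["evalue"])
```

Sorting by identity rather than by E-value mirrors the Max ID criterion; a different ranking key could be substituted if E-value or bit score were the preferred measure.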
Predicted Protein-Protein Interaction
A search was carried out using the protein sequence of 3M3I in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING 9.0). The STRING 9.0 database allowed for the identification of predicted and known protein interactions, encompassing physical and functional associations.
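A comparable lookup can also be scripted. The sketch below only builds a request URL for the `interaction_partners` method of the STRING REST API and does not send it. It assumes the current public API rather than the STRING 9.0 web interface used in this study, so identifiers and results may differ between versions. The helper name `interaction_partners_url` is hypothetical, and the example identifier is the top STRING match named in the Results.

```python
from urllib.parse import urlencode

# Base address of the public STRING API (an assumption about the current service,
# not the STRING 9.0 interface used in the paper).
STRING_API = "https://string-db.org/api"


def interaction_partners_url(identifier, species=None, limit=10, output_format="tsv"):
    """Build a URL asking STRING for the predicted partners of one protein.

    species is an optional NCBI taxonomy ID; it is omitted from the query
    when not supplied.
    """
    params = {"identifiers": identifier, "limit": limit}
    if species is not None:
        params["species"] = species
    return f"{STRING_API}/{output_format}/interaction_partners?{urlencode(params)}"


if __name__ == "__main__":
    # Identifier taken from the Results; it may not resolve in current STRING releases.
    print(interaction_partners_url("LinJ35.1130", limit=4))
```

Building the URL separately from sending the request keeps the example runnable offline; the response could then be fetched with any HTTP client and parsed as tab-separated text.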
Results and Discussion
The BLASTp tool at NCBI identified 100 sequences that shared similarity with hypothetical protein 3M3I. A Max ID cutoff of 70% was set, identifying the top six proteins (Table I), each representing a hypothetical protein with unknown function found amongst
Based on the search results in Table II, it was clear that many residues identical to 3M3I differed substantially. However, NCBI BLAST searches revealed matches to a cupin family domain-containing protein and to a purine nucleoside permease. A PDB search of the 3M3I sequence revealed a match to protein 1YUD_A, the chain A X-ray crystal structure of protein So0799 from
The cupin family of proteins was named for its conserved beta-barrel fold, and the cupins span archaea, bacteria and eukaryotes. Although this family is conserved across these groups, there is little sequence homology apart from short, partially conserved regions whose length and residue sequence vary. The cupins are functionally diverse and consist of enzymatic and non-enzymatic members, which may have one or two cupin domains. Some members have been characterized at the biochemical level; however, many are recognized only from gene cloning or sequencing projects.
Leishmania, Trypanosoma and Plasmodia are parasitic protozoa. All three of these protozoa are incapable of making purines de novo and are completely dependent on their surrogate hosts for all preformed purines. This is a key biochemical distinction between the vertebrate host and the parasite, as all vertebrates have an established purine biosynthesis mechanism. To overcome this evolutionary gap, the parasites have devised a salvage system encompassing transporters that allow the uptake of an array of purine nucleosides and nucleobases, which are then converted to essential purine metabolites. The Leishmania nucleoside and nucleobase transporters are considered to be high-affinity, concentrative, proton-coupled permeases, which allows the parasite to compete effectively with the host for components of purine biosynthesis [14, 15, 16].
The NCBI search identified matches to purine nucleoside permease (PNP). This suggests that 3M3I may be a cell membrane protein that transports components of purine metabolism into or out of the cell. To gather additional evidence that this protein may be a PNP, the proteomics software tool STRING 9.0 (http://string-db.org/), found at the bioinformatics resource portal ExPASy, was utilized. STRING is a database of predicted and known protein interactions, encompassing physical and functional associations. These predictions originate from high-throughput experiments, genomic context, (conserved) co-expression, and PubMed text mining. STRING searched for proteins similar to the 3M3I sequence and identified a total of 20 sequences (Table III).
The top match was to hypothetical protein LinJ35.1130 from Leishmania infantum, with an E-value of 1e-122. This match was then used to identify functional partners. In total, four partners were identified (Figure 1). These data implicate 3M3I as a participant in nucleotide metabolism and metabolic shuttling. The 3M3I protein has a predicted interaction with the enzyme glyoxalase II, which is involved in initiating excision repair at various sites of damaged or improper bases in DNA. In addition, a possible involvement in nucleotide metabolism is revealed through a predicted interaction with adenosine deaminase. A predicted interaction with NADH dehydrogenase suggests that 3M3I may participate in shuttling within metabolic pathways. Finally, 3M3I may interact with the HslVU complex proteolytic subunit-like protein, revealing a possible connection to ATP-dependent protease activity.
To determine how conserved these sequence hits were within and outside the species, an occurrence overview was carried out (Figure 2). It is important to note that all four partners are conserved within Leishmania, Trypanosoma and Plasmodia. However, the LinJ35.1130 protein, which was the closest match to 3M3I in the STRING search, is found only in Leishmania, suggesting the need for a specialized transporter within the nucleotide metabolism pathway that is unique to this genus. The second hit, adenosine deaminase (ADA), is involved in nucleotide metabolism, specifically purine metabolism. ADA irreversibly deaminates adenosine, converting it to the related nucleoside inosine by replacing the amino group with a hydroxyl group within the purine metabolism pathway.
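The deamination step described above can be written as an overall reaction. This is the standard stoichiometry for adenosine deaminase, given here for orientation rather than as a result of this study; the `\xrightarrow` macro assumes the amsmath package.

```latex
\[
\text{adenosine} + \mathrm{H_2O}
\;\xrightarrow{\ \text{ADA}\ }\;
\text{inosine} + \mathrm{NH_3}
\]
```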
These data provide evidence that 3M3I is possibly involved in nucleotide metabolism; however, the evidence is superficial at best and will have to be confirmed. Further studies within Leishmania major will be required to confirm whether and how 3M3I is involved in the continuous cellular renewal of nucleic acids. Data will have to be produced on the participating enzyme systems and on the inclusion of 3M3I in nucleotide renewal. These studies should focus on elucidating the presence, concentration and interaction of primary and intermediate products acting together with 3M3I in nucleotide metabolism.
I would like to thank Mr. Jerry Watkins and Dr. Damon Love for critical review of the manuscript.