Assessing the precision of high-throughput computational approaches for the genome-wide subcellular localization of putative proteins from Vibrio cholerae
P Somvanshi, V Singh, P Seth
Keywords
cholera, drugs, pathogenesis, subcellular localization
Citation
P Somvanshi, V Singh, P Seth. Assessing the precision of high-throughput computational approaches for the genome-wide subcellular localization of putative proteins from Vibrio cholerae. The Internet Journal of Genomics and Proteomics. 2007 Volume 3 Number 2.
Abstract
Introduction
Worldwide, 1.3 billion cases of acute diarrhea occur annually in children below 5 years of age, of which more than 3 million are fatal, and 80 per cent of these deaths are in children below 2 years of age (Sur and Bhattacharya 2003). Acute diarrhea is caused by a variety of bacterial, viral or parasitic agents. The most important bacterial agents causing outbreaks of acute diarrhea are
The computational prediction of the subcellular localization of bacterial proteins is an important step in genome annotation and in the search for novel vaccine or drug targets.
Computational subcellular localization (SCL) analysis of the growing number of complete bacterial genomes or individual proteins allows researchers to screen for vaccine/drug candidates, automatically annotate gene products or select proteins for further study. The pathogenicity of
Materials And Methods
Collection of sequences
The complete nucleotide and protein sequences were retrieved from the National Center for Biotechnology Information (NCBI) database, available at http://www.ncbi.nlm.nih.gov
Analysis of physico-chemical properties
The physico-chemical properties of the proteins, viz. total number of amino acids, molecular weight and isoelectric point, were analyzed with the Generunner, DNASTAR and ExPASy tools.
Prediction of subcellular localization of proteins
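As a rough illustration of how two of these properties can be derived directly from a sequence, the sketch below counts residues and sums standard average residue masses, adding one water molecule for the free termini. It is not the procedure used by the tools named above; the function name and the example peptide are hypothetical, and the mass table is rounded, so values may differ slightly from those reported by any particular tool. The isoelectric point is left out because it depends on the pKa set each tool adopts.

```python
# Standard average residue masses in daltons (amino acid minus one water).
# Rounded values; treat them as an assumption of this sketch.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain accounts for the free N- and C-termini


def molecular_weight(sequence):
    """Return the average molecular weight (Da) of an unmodified protein chain."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(sorted(unknown)))
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    example = "MKTLLVAGS"  # hypothetical peptide, not taken from the study
    print(len(example), "residues;", round(molecular_weight(example), 2), "Da")
```

Non-standard or ambiguous residue codes (for example X or B) are rejected rather than guessed, which keeps the result interpretable.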
A total of 1302 bacterial proteins were used to develop the PSLpred tool, covering five localizations (248 cytoplasmic, 268 inner membrane, 244 periplasmic, 352 outer membrane and 190 extracellular). A machine-learning technique, the support vector machine (SVM), was used to predict the subcellular localization of prokaryotic proteins, which is a multi-class classification problem. The performance of the SVM modules was evaluated by 5-fold cross-validation. In this technique, the dataset is partitioned randomly into five equal-sized sets. Training and testing are carried out five times, each time using one distinct set for testing and the remaining four sets for training. To assess predictive performance, accuracy and the Matthews correlation coefficient (MCC) (Matthews, 1975) were calculated (Bhasin et al 2005).
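The evaluation scheme described above can be sketched in a few lines. The example below is a minimal illustration only, not the PSLpred implementation: the function names are hypothetical, accuracy is taken here as the overall fraction of correct predictions, and MCC is computed per localization class in a one-vs-rest fashion, which is an assumption about how the multi-class problem is scored.

```python
import math
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Randomly partition range(n_items) into k near-equal, disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_items, k=5, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    folds = k_fold_indices(n_items, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(true_labels, predicted_labels):
    """Overall fraction of proteins assigned to the correct localization."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def mcc(true_labels, predicted_labels, target):
    """Matthews correlation coefficient for one class, scored one-vs-rest."""
    tp = tn = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == target and p == target:
            tp += 1
        elif t != target and p != target:
            tn += 1
        elif p == target:
            fp += 1
        else:
            fn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally report 0 when a row or column of the confusion table is empty.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0
```

A plain random partition is used here; a stratified split that preserves the class proportions in every fold is a common alternative when classes are unevenly sized.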
Results And Discussion
In this study we selected fifty-two putative proteins of V. cholerae O1 and analyzed their physico-chemical nature theoretically. The molecular weight and isoelectric point of each protein were deduced (Table 1).
Figure 1
It indicates protein stability at a particular isoelectric point (pI). The online PSLpred server was used to predict protein subcellular localization, whether within the bacterium or targeted to the host. We investigated the putative proteome of
In previous studies thirty-nine putative proteins of
In conclusion, we report the predicted subcellular localization for most of the putative proteins of the strain of