Clustering and Classification of Anopheline Spacer Sequences using Self Organizing Maps

Amit Kumar Banerjee; Neelima Arora; Upadhyayula Suryanarayana Murty

Keywords

classification, clustering, ITS2, mosquito, secondary structure, self organizing map (SOM)

Citation

A Banerjee, N Arora, U Murty. Clustering and Classification of Anopheline Spacer Sequences using Self Organizing Maps. The Internet Journal of Genomics and Proteomics. 2008 Volume 4 Number 1.

Abstract

ITS2, a well-known phylogenetic marker, is widely used in taxonomic studies. This study uses a novel approach to classify and cluster Anopheline species on the basis of their spacer (ITS2) sequences. As the secondary structure of ITS2 is crucial for its function, parameters derived from the secondary structure, along with sequence composition, were considered. A Self Organizing Map (SOM), a neural network approach, was adopted for classification and clustering of the Anopheline sequences. This data mining approach to clustering and classification will aid in unveiling the inherent relationships among the various parameters contributing to ITS2 structural stability.

Introduction

Malaria is the most devastating parasitic disease of humans, exacting an estimated toll of 300–500 million new infections and 1.5–3.0 million deaths annually (World Malaria Report 2005). About 41% of the world's population, in 107 countries, lives in endemic regions under constant threat of malaria (World Malaria Report 2005). The completion of the triad of mosquito, parasite and human genomes fuelled efforts to see malaria in a new light and provided much-needed impetus to studies at the molecular level (Aultman et al. 2002). For millennia, the reprehensible mosquito has mined the riches of the human bloodstream; with the availability of the Anopheles genome, it is now the research community's turn to mine the molecular riches of the mosquito (Jasny et al. 2002). The quest to answer myriad questions on Anopheles taxonomy necessitated the search for reliable molecular markers to resolve phylogeny and existing ambiguities. However, the sheer complexity, magnitude and pace of the molecular data being generated in this post-genomic era are often overwhelming and perplexing to molecular entomologists.

In eukaryotic organisms, nuclear ribosomal RNA genes (rDNA) are organized in clusters containing the 18S, 5.8S and 28S subunits. Two internal transcribed spacers (ITS) occur within each cluster: ITS1, separating the 18S and 5.8S genes, and ITS2, lying between the 5.8S and 28S genes (Fedoroff 1979). By virtue of their fast evolution, these spacer sequences are used extensively as reliable markers for taxonomic classification across taxa and for phylogenetic reconstruction (Coleman 2003, Alvarez and Wendel 2003).

ITS2 has become commonplace in taxonomic studies, more so in the case of mosquito genera. The ITS2 region has been exploited extensively for differentiating among closely related mosquito species (Crabtree et al. 1995, Collins & Paskewitz 1996, Miller et al. 1996, Marinucci et al. 1999, Hackett et al. 2000, Manonmani et al. 2001, Garros et al. 2004). ITS2 sequences are often used to resolve phylogenetic relationships among sibling species of mosquitoes (Collins & Paskewitz 1996, Xu & Qu 1997, Porter and Collins 1991, Walton et al. 1999).

Spacer sequences, often projected as the most efficient weapon in the arsenal for resolving phylogenetic relationships at different divergence levels (Hillis and Dixon 1991), do suffer from certain shortcomings. Although ITS2 sequences are used to resolve phylogenies at the intra-individual, population and interspecies levels, their high variability often restricts their use to closely related species, and they find little use in broader phylogenetic reconstruction. Divergence in ITS sequences arising from recombinant and pseudogenic variants often leads to misleading results; hence, reliance on ITS sequence alone can prove costly, making this marker a double-edged sword. The role of the secondary conformations of the ITS regions in defining the cleavage sites that release the mature ribosomal RNAs during processing is well known. The secondary structure of an RNA is at least as important as its sequence for its function. Prediction of ITS2 secondary structure furnishes additional information for phylogenetic inference and helps differentiate functional from pseudogenic ITSs (Wesson et al. 1992).

Although the tertiary structure of a functional RNA molecule is a crucial determinant of its function, predicting its three-dimensional structure from sequence is difficult and cumbersome. However, secondary structure is known to be conserved in functional RNAs and to be important for their function. Secondary structure models can be used to improve alignments at higher systematic levels, even for strongly divergent regions such as the ITS, and the framework dictated by the secondary structure is considered a tool for extending preliminary molecular phylogenies. Hence, secondary structure is usually considered a sufficient approximation of tertiary structure, and several methods for predicting secondary structure have been developed and implemented.

As the ITS2 secondary structures of numerous eukaryotes have been elucidated in the recent past (Joseph et al. 1999), a common overall secondary structure across eukaryotic groups cannot be ruled out (Coleman and Vacquier 2002, Mai and Coleman 1997, Michot et al. 1999). Among the various parameters known to stabilize RNA secondary structure, structural energy holds prime importance. This study attempts to cluster the sequences on the basis of this inherent information and to visualize the interrelationships among the sequence and secondary structure features of the ITS2 of Anopheline species.

Materials And Methods

Data collection and data set preparation

ITS2 sequences of Anopheline species were retrieved from NCBI, checked for redundancy and filtered. The final curated input dataset contains a total of 123 sequences.

Of the parameters known to contribute to RNA secondary structure stability, those considered in this study are listed in Table 1:

Figure 1

Secondary structure prediction

RNA secondary structure consists of stems and loops. Five main types of loops occur in RNA secondary structure: interior, hairpin, exterior, multi- and bulge loops. For in-depth analysis, calculating the secondary structure and determining its structural conservation are essential.
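To make the five loop types concrete, the sketch below classifies the loops of a structure written in dot-bracket notation using the standard loop decomposition: a base pair enclosing no other pair closes a hairpin; one enclosed pair with unpaired bases on one side only forms a bulge, and on both sides an interior loop; two or more enclosed pairs form a multiloop. This is a hypothetical helper (`pair_table`, `count_loops`), not the Sribo code, and it assumes pseudoknot-free structures. How Sribo counts exterior loops is not stated here, so the sketch assumes each maximal run of unpaired bases outside all pairs counts as one exterior loop.

```python
def pair_table(db: str) -> list[int]:
    """Return, for each position, the index of its pairing partner (-1 if unpaired)."""
    stack: list[int] = []
    pt = [-1] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pt


def count_loops(db: str) -> dict[str, int]:
    """Count loop types in a pseudoknot-free dot-bracket structure."""
    pt = pair_table(db)
    n = len(db)
    counts = {"hairpin": 0, "interior": 0, "bulge": 0, "multi": 0, "exterior": 0}
    for i in range(n):
        j = pt[i]
        if j <= i:  # unpaired, or the 3' end of a pair already handled
            continue
        # Collect the base pairs directly enclosed by (i, j), skipping their interiors.
        inner = []
        k = i + 1
        while k < j:
            if pt[k] > k:
                inner.append((k, pt[k]))
                k = pt[k] + 1
            else:
                k += 1
        if not inner:
            counts["hairpin"] += 1
        elif len(inner) == 1:
            p, q = inner[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left and right:
                counts["interior"] += 1
            elif left or right:
                counts["bulge"] += 1
            # left == right == 0 is a stacked pair, not a loop
        else:
            counts["multi"] += 1
    # Assumed convention: each maximal run of unpaired bases outside all pairs
    # counts as one exterior loop (a structure with none scores 0).
    k, in_run = 0, False
    while k < n:
        if pt[k] > k:
            k = pt[k] + 1
            in_run = False
        else:
            if not in_run:
                counts["exterior"] += 1
                in_run = True
            k += 1
    return counts
```

For example, `count_loops(".((...)(...)).")` reports two hairpins, one multiloop and two exterior runs. Loop counts produced by a folding program may follow different conventions, so this is only an illustration of the loop types.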

RNA folds analysis

Probable target accessibility (loops) was determined with the Sribo program of Sfold (Statistical Folding and Rational Design of Nucleic Acids), which is based on a statistical sample of the Boltzmann ensemble of secondary structures. The different loops generated by Sribo from the ITS2 data were then counted (Fig. 1).
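The idea of a statistical sample of the Boltzmann ensemble can be illustrated with a toy example: each structure s is drawn with probability exp(−ΔG_s/RT)/Z, so lower-energy structures are sampled more often. The sketch below assumes a small explicit list of candidate structures with hypothetical free energies in kcal/mol. As far as we understand, Sfold instead samples from the partition function of the full ensemble without enumerating structures, so this only shows the weighting, not Sfold's algorithm. `boltzmann_probabilities` and `sample_structures` are illustrative names.

```python
import math
import random

R_KCAL = 1.98717e-3  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temp_c=37.0):
    """P(s) = exp(-dG_s / RT) / Z over a list of structure free energies (kcal/mol)."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)
    # Shift by the minimum energy for numerical stability; the shift cancels in P(s).
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def sample_structures(structures, energies, k=1000, seed=0):
    """Draw k structures with replacement, in proportion to their Boltzmann weight."""
    rng = random.Random(seed)
    return rng.choices(structures, weights=boltzmann_probabilities(energies), k=k)
```

With toy energies of −12, −10 and −8 kcal/mol at 37 °C, the first structure is about exp(2/0.616) ≈ 26 times more probable than the second, which is why sampled ensembles are dominated by low-energy structures.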

Structural energy calculation

Structural energy appears to be the most important factor influencing structural stability. The secondary structure with the lowest possible free energy, the minimum free energy (MFE) structure, is predicted to be the most stable secondary structure for the strand. Among the sub-optimal structures calculated by Sribo, the stable structures with the lowest energy were retained and used in the data mining analysis to interpret the influence of different factors on secondary structure stabilization.
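The selection step, keeping the lowest-energy structure for each sequence among several sub-optimal candidates, can be sketched as below. The record layout `(seq_id, dot_bracket, energy)` is a hypothetical one chosen for this example; parsing the folding program's actual output is not shown.

```python
def lowest_energy_per_sequence(records):
    """Keep, for each sequence ID, the structure with the lowest free energy.

    records: iterable of (seq_id, dot_bracket, energy_kcal_mol) tuples
    (hypothetical layout for illustration).
    Returns {seq_id: (dot_bracket, energy)}.
    """
    best = {}
    for seq_id, structure, energy in records:
        # A more negative free energy means a more stable structure.
        if seq_id not in best or energy < best[seq_id][1]:
            best[seq_id] = (structure, energy)
    return best
```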

GC content calculation

GC content is known to influence structural energy. GC percentage was determined using the GC calculator (http://www.genomicsplace.com/gc_calc.html); all non-DNA characters except N were stripped before computing.

Besides the parameters above, other features such as the total number of bases were calculated manually.
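The three sequence-composition parameters can be computed directly from each sequence. The sketch below rests on several assumptions that the text does not settle: U is treated as T; every character other than A, C, G, T and N is stripped, mirroring the stripping rule above; N is kept and counted in the GC% denominator; and "G/C ratio" is read as count(G)/count(C). `sequence_features` is a hypothetical helper, not the GC calculator's code.

```python
import re


def sequence_features(seq: str) -> dict:
    """Composition features for one ITS2 sequence (assumptions noted above)."""
    # Normalize case, treat RNA 'U' as 'T', then strip all non-DNA characters except N.
    s = re.sub(r"[^ACGTN]", "", seq.upper().replace("U", "T"))
    g, c = s.count("G"), s.count("C")
    total = len(s)
    return {
        "total_bases": total,
        "gc_percent": 100.0 * (g + c) / total if total else 0.0,
        "g_c_ratio": g / c if c else float("nan"),  # undefined when no C is present
    }
```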

Data mining analysis

We live in a data-rich but information-poor world, where the volume of data generated by high-throughput methods is overwhelming; data mining opens a new window of opportunity in this arena. In the present study, a data mining approach was used to uncover information concealed in the sequence that ultimately affects structural stabilization.

Self organizing Maps (SOM): An artificial neural network (ANN) is an abstract simulation of a real nervous system, consisting of a collection of neuron units that communicate with each other via axon connections.

In a SOM, neurons compete with each other to earn the right to represent the input data (Kohonen 2001). As a result, data in the multidimensional attribute space can be abstracted to a much smaller number of latent dimensions organized according to a predefined geometry in a lower-dimensional space, usually a regular two-dimensional array of neurons. In this way the structure embedded in the input data can be revealed: the map is placed in the input space and spans the distribution of the inputs. Using a SOM network, it is possible to obtain a map of the input space in which closeness between units or clusters on the map represents closeness of the input data. Each processing unit in the SOM lattice is associated with a weight vector of the same dimension as the input data. Using the weights of each unit as a set of coordinates, the lattice can be positioned in the input space. During the learning stage the weights of the units change their position and “move” towards the input points. This “movement” gradually slows and, at the end of the learning stage, the network is “frozen” in the input space. After learning, each input can be associated with the nearest network unit.

When the map is visualized, the inputs can be associated with each cell on the map. One or more cells that clearly contain similar objects can be considered a cluster on the map. These clusters are generated during the learning phase without any other information; it is not necessary to supply the network with cluster prototypes or examples. SOMs cluster the data in a manner similar to cluster analysis, but have the additional benefit of ordering the clusters and enabling the visualization of large numbers of clusters. The clusters are arranged in a low-dimensional topology, usually a grid, that preserves the neighborhood relations of the high-dimensional data (Kohonen 1982, Nurnberger and Detyniecki 2002, Cuadros-Vargas et al. 2003).
What distinguishes the SOM from other classification algorithms is that not only are similar inputs associated with the same cell, but neighboring cells also contain similar inputs. This property, together with easy visualization, makes the SOM a useful tool for visualizing and clustering large data sets.
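The learning procedure described above can be sketched as an online SOM in NumPy. It covers competition for a best-matching unit, cooperation through a neighborhood on the grid, and weights that move towards the inputs and gradually freeze. This is a generic textbook-style implementation under assumed settings: random initialization from the inputs, a Gaussian neighborhood, and a linearly decaying learning rate and radius. The software, schedules and parameter values used in this study are not stated, so `train_som`, `map_inputs` and all their defaults are hypothetical.

```python
import numpy as np


def train_som(data, rows=2, cols=2, n_iter=10_000, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rectangular SOM by online (sample-by-sample) competitive learning.

    data: array of shape (n_samples, n_features), ideally min-max normalized.
    Returns (weights, grid): weights has shape (rows*cols, n_features) and
    grid holds the (row, col) coordinates of each unit.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize each unit's weight vector with a distinct, randomly chosen input.
    weights = data[rng.choice(n, size=rows * cols, replace=False)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighborhood radius shrinks
        x = data[rng.integers(n)]
        # Competition: the best-matching unit (BMU) is the closest weight vector.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Cooperation: Gaussian neighborhood measured on the grid, not in input space.
        h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
        # Adaptation: move the BMU and its grid neighbors towards the input.
        weights += lr * h[:, None] * (x - weights)
    return weights, grid


def map_inputs(data, weights, grid):
    """Assign each input to its nearest unit and return its (row, col) cell."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return [tuple(int(v) for v in grid[k]) for k in np.argmin(d2, axis=1)]
```

A 2 × 2 map, as used in this study, would be trained on the matrix of normalized parameters (one row per sequence), after which `map_inputs` yields the cell of each sequence. Because every update is a convex step towards an input, the weights always stay inside the range of the data.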

Parameters identified for SOM:

Structural parameters (hairpin loop, internal loop, bulge loop, multiloop, external loop and energy) and inherent sequence parameters (total bases, G/C ratio and GC content %) were considered in this study.

Data Normalization:

The summarized data were normalized linearly (min-max scaling) so that the minimum value of each parameter is 0 and the maximum is 1. This ensures that all parameters are given equal weight during clustering.
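The linear normalization described above is x' = (x − min)/(max − min), applied to each parameter (column) separately. Below is a minimal NumPy sketch; how a parameter with identical values in every sequence was handled is not stated, so mapping such a column to 0 is an assumption, and `min_max_normalize` is a hypothetical name.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column (parameter) linearly to [0, 1].

    Columns with a constant value are mapped to 0 to avoid division by zero
    (an assumption; not specified in the original text).
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    span[span == 0] = 1.0
    return (X - col_min) / span
```

Without this step, a parameter with a large numeric range, such as structural energy in kcal/mol, would dominate the Euclidean distances used to find the winning neuron.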

Figure 2

Results And Discussion

Figure 3

Figure 1: Distribution of various loops in sequences

In short:

Total number of sequences selected for study = 123
Total number of input parameters = 9
Total iterations per sequence to form a neuron = 100,000
Total iterations to form the 2 × 2 grid (4 neurons) = 12,300,000
Successful (winning) neurons = 4
Unsuccessful neurons = 0
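The two iteration counts above are mutually consistent if "iterations per sequence" is read as the number of times each sequence is presented to the network (our interpretation):

```latex
123 \ \text{sequences} \times 100\,000 \ \text{iterations per sequence} = 12\,300\,000 \ \text{iterations}
```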

Figure 4

Figure 2: 2X2 grid obtained using SOM

Figure 5

Figure 3: Cluster-wise distribution of sequences; each cluster is followed by its number of sequences and the percentage of sequences

Cluster (1, 1): This cluster contains 4 sequences and is characterized by moderate values for all parameters. External loop showed the least variation, while bulge loop showed the most.

Figure 6

Cluster (1, 2): This cluster comprises 69 sequences in total. Maximum variation was observed in internal loop, followed by hairpin, bulge, multi and external loop. Structural energy is high in all sequences except An. homunculus and An. cruzii. The sibling species of An. crucians showed more variation in loops while maintaining similar GC content, number of bases and structural energy. The same trend was observed in the sibling species of An. annulipes, An. fluviatilis and An. rangeli.

Figure 7

Cluster (2, 1): The sequences in this cluster show uniformly high energies and similarly high G/C ratios, while their GC content (%) is quite low, contrary to the popular belief that GC content is the most important parameter determining structural energy. The highest variation is observed in internal loop, followed by hairpin, multi, bulge and external loop. Anopheles bancrofti genotype D, present in this cluster, shows a very high value of external loop. Although the two sequences belonging to Anopheles lesteri differ in all sequence and structural features, their structural energies are nearly the same.

Figure 8

Cluster (2, 2): A total of 28 sequences fall in this cluster. The cluster is characterized by variation in structural energies, which is reflected in the gradient. Unlike the other clusters, its sequences differ in the number of bases. External loop showed the lowest variation, followed by multiloop. Structural energy varied among the different Anopheles farauti sequences, whereas their sequence features did not. Variation in structural features possibly has a profound influence in these cases. The sibling species Anopheles dirus A and An. dirus D showed similar values for all parameters except hairpin loop and multiloop.

Figure 9
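The cluster descriptions above report, for each map cell, how many sequences it holds and which parameters vary most. A minimal sketch of such a summary is shown below. It assumes that "variation" means the variance of the min-max normalized parameter within the cluster (the text does not name the measure) and that each sequence has already been assigned to a (row, col) cell. `summarize_clusters` is a hypothetical helper.

```python
from collections import Counter

import numpy as np


def summarize_clusters(cells, X, feature_names):
    """Per SOM cell: member count, percentage, and parameters ranked by variance.

    cells: one (row, col) tuple per sequence.
    X: normalized parameter matrix of shape (n_sequences, n_parameters).
    """
    X = np.asarray(X, dtype=float)
    n = len(cells)
    summary = {}
    for cell, count in sorted(Counter(cells).items()):
        mask = np.array([c == cell for c in cells])
        var = X[mask].var(axis=0)       # within-cluster variance of each parameter
        order = np.argsort(var)[::-1]   # highest variance first
        summary[cell] = {
            "n": count,
            "percent": 100.0 * count / n,
            "by_variance": [feature_names[i] for i in order],
        }
    return summary
```

Ranking on normalized values keeps parameters with very different units, such as loop counts and energies, comparable.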

Discussion

Molecular taxonomists are often overwhelmed by the sheer volume and complexity of sequence information, owing to the number and sibling status of Anopheline species. The dearth of reliable molecular markers has led to a continuing search for new ones and to the use of existing knowledge at different levels to unveil new patterns and phylogenetic inferences. Among these markers, the ITS2 sequence draws special attention as a well-trusted marker among molecular entomologists, but the sheer complexity of phylogeny often limits its use in reaching meaningful conclusions, and its application in resolving phylogeny across several taxa is debatable (Banerjee et al. 2007). Data mining approaches can be used to harness the secondary structure information of the numerous Anopheline ITS2 sequences that remain undeciphered in public domain repositories. Derived secondary structural information is valuable and reliable in this context.

Since its inception by McCulloch and Pitts in 1943, the ANN has come a long way and now encompasses a wide range of fields. Applications of neural networks in the medical domain, for clinical diagnosis, image analysis and interpretation, and drug development, have been reviewed previously. The SOM belongs to the class of unsupervised neural networks based on competitive learning. The SOM approach is useful for extracting implicit, valuable and interesting information from vast quantities of data. In this approach, neurons compete with each other to earn the right to represent the input data (Oja and Kaski 1999, Kohonen 2001). As a result, data in the multidimensional attribute space can be abstracted to a much smaller number of latent dimensions organized according to a predefined geometry in a lower-dimensional space, usually a regular two-dimensional array of neurons. Using this approach, the patterns embedded in the input data can be revealed. SOMs cluster the data in a manner similar to cluster analysis, but have the additional benefit of ordering the clusters and enabling the visualization of large numbers of clusters (Bock 2004). The technique is particularly useful for analyzing large datasets in which similarity matching plays an important role. A SOM compresses information while preserving the most important topological and metric relationships of the primary data items (Kirk and Zurada 1999). SOMs have been applied successfully to the classification of DNA sequences based on codon usage (Kanaya et al. 2001, Supek and Vlahovicek 2004), nucleotide frequencies (Abe et al. 2003) and virtual potentials (Aires-de-Sousa and Aires-de-Sousa 2003), to protein sequence analysis (Ferran and Ferrara 1992, Ferran et al. 1994), to the clustering of microarray data (Toronen et al. 1999) and to various epidemiological studies (Valkonen et al. 2002, Wang et al. 2002).

In the data set considered, GC content ranges from 44.6% to 70.8%, with Anopheles engarensis showing the lowest and Anopheles farauti the highest GC content. The G/C ratio also varied considerably, from 0.76 to 1.45, with Anopheles annulipes and Anopheles vaneedeni showing the minimum and maximum G/C ratio, respectively. Structural energy, the major factor stabilizing ITS2 secondary structure, varied from −368 kcal/mol to −51.8 kcal/mol. The number of loops of each type also varied considerably. In terms of the number of loops generated, internal loops showed the highest variation, followed by multiloops, bulge loops and hairpin loops, while exterior loops showed the least. The largest group of sequences (47.9% of the total) showed no exterior loops, 39.85% showed only one, and a single sequence showed 9 exterior loops. Only one sequence lacked internal loops, while the maximum number of internal loops was observed in Anopheles epiroticus. The lowest number of bulge loops was observed in An. sinensis and the highest in An. dirus D. Anopheles crucians B showed 13 hairpin loops, whereas only 2 hairpin loops were detected in An. annulipes E. Multiloops could not be detected in 3 sequences, and the highest number observed was 33.
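The ranges and percentages above are simple descriptive statistics over the per-sequence parameter table. Below is a hypothetical sketch of how they could be computed, assuming parallel lists of species names and parameter values; the toy values in the usage check are illustrative, not the study's data.

```python
def extremes(names, values):
    """Return (min_value, min_name, max_value, max_name) for one parameter."""
    i_min = min(range(len(values)), key=values.__getitem__)
    i_max = max(range(len(values)), key=values.__getitem__)
    return values[i_min], names[i_min], values[i_max], names[i_max]


def percent_with(values, count):
    """Percentage of sequences whose loop count equals `count`."""
    return 100.0 * sum(v == count for v in values) / len(values)
```

For instance, `percent_with(exterior_counts, 0)` would give the share of sequences without exterior loops, where `exterior_counts` is a hypothetical list of per-sequence exterior-loop counts.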

Clustering and visualization of sequence data with a SOM according to their inherent features enables efficient interpretation and analysis. The relationship of structural energy with sequence composition features and structural parameters can be explored using this technique. A SOM reduces the complexity of multidimensional data and hence can be used effectively to find explicit relationships in such cases.

Concluding Remarks

RNA secondary structure is crucial to three-dimensional structure, but determining the correct structure and folding pattern of ITS2 is cumbersome. It is practically unfeasible to determine the effect of the parameters influencing the structural energy of an RNA structure by conventional experimental approaches. With the exponential increase in sequences, deriving interpretations and inferences from the accumulated data will pose an ever-growing challenge. Data mining approaches can streamline the elucidation of information hidden in such data and help reveal not-so-obvious interrelationships. RNA folding algorithms also take structural energy into account as the major determinant in furnishing RNA secondary structure models and conformations. Clustering and visualization of such data should add meaningful dimensions to our understanding of the relationships among the sequence features and structural parameters that determine structural energy. This approach can be further fine-tuned to resolve ambiguities in the identification of sibling species complexes using differences at the RNA structural level.

Acknowledgement

The authors are grateful to the Director, Indian Institute of Chemical Technology, Hyderabad for his continuous support and encouragement.

Correspondence to

Dr. Upadhyayula Suryanarayana Murty Scientist “F”/ Deputy Director Head, Biology Division, Indian Institute of Chemical Technology, Hyderabad-500007, India. E-mail: murty_usn@yahoo.com Phone: +91 40 27193134; Fax: +91 40 27193227

References

Abe, T., Kanaya, S., Kinouchi, M., Ichiba, Y., Kozuki, T. and Ikemura, T. (2003): Informatics for unveiling hidden genome signatures. Genome Res., 13: 693–702.
Aires-de-Sousa, J. and Aires-de-Sousa, L. (2003): Representation of DNA sequences with virtual potentials and their processing by (seqrep) Kohonen self-organizing maps. Bioinformatics, 19: 30–36.
Alvarez, I. and Wendel, J.F. (2003): Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol., 29: 417–434.
Aultman, K.S., Gottlieb, M., Giovanni, M.Y. and Fauci, A.S. (2002): Anopheles gambiae genome: completing the malaria triad. Science, 298(5591): 13.
Banerjee, A.K., Arora, N. and Murty, U.S.N. (2007): How far is ITS2 reliable as a phylogenetic marker for the mosquito genera? Electronic Journal of Biology, 3(3): 61–68.
Bock, T. (2004): A new approach for exploring multivariate data: self-organizing maps. International Journal of Market Research, 46(2): 189–203.
Coleman, A.W. (2003): ITS2 is a double-edged tool for eukaryote evolutionary comparisons. Trends in Genetics, 19: 370–375.
Coleman, A.W. and Vacquier, V.D. (2002): Exploring the phylogenetic utility of ITS sequences for animals: a test case for abalone (Haliotis). J. Mol. Evol., 54: 246–257.
Collins, F.H. and Paskewitz, S.M. (1996): A review of the use of ribosomal DNA (rDNA) to differentiate among cryptic Anopheles species. Insect Mol. Biol., 5: 1–9.
Crabtree, M.B., Savage, H.M. and Miller, B.R. (1995): Development of a species-diagnostic polymerase chain reaction assay for the identification of Culex vectors of St. Louis encephalitis virus based on sequence variation in ribosomal DNA spacers. American Journal of Tropical Medicine and Hygiene, 53: 105–109.
Cuadros-Vargas, E., Romero, R. and Obermayer, K. (2003): Speeding up algorithms of SOM family for large and high dimensional databases. In Yamakawa, T. (ed.), Proceedings of the WSOM, 167–172.
Fedoroff, N.V. (1979): On spacers. Cell, 16: 697–710.
Ferran, E.A. and Ferrara, P. (1992): Clustering proteins into families using artificial neural networks. Comput. Appl. Biosci., 8: 39–44.
Ferran, E.A., Pflugfelder, B. and Ferrara, P. (1994): Self-organized neural maps of human protein sequences. Protein Sci., 3: 507–521.
Garros, C., Harbach, R.E. and Manguin, S. (2005): Morphological assessment and molecular phylogenetics of the Funestus and Minimus groups of Anopheles (Cellia). J. Med. Entomol., 42: 522–536.
Hackett, B.J., Gimnig, J., Guelbeogo, W., Costantini, C., Koekemoer, L.L., Coetzee, M., Collins, F.H. and Besansky, N.J. (2000): Ribosomal DNA internal transcribed spacer (ITS2) sequences differentiate Anopheles funestus and An. rivulorum, and uncover a cryptic taxon. Insect Mol. Biol., 9: 369–374.
Hillis, D.M. and Dixon, M.T. (1991): Ribosomal DNA: molecular evolution and phylogenetic inference. Q. Rev. Biol., 66: 411–453.
Jasny, B.R., Smith, O.M., Roberts, L. and Enserink, M. (2002): Taking a genomic bite of the malaria mosquito. Science, 298(5591): 77–78.
Joseph, N., Krauskopf, E., Vera, M.I. and Michot, B. (1999): Ribosomal internal transcribed spacer 2 (ITS2) exhibits a common core of secondary structure in vertebrates and yeast. Nucleic Acids Res., 27: 4533–4540.
Kanaya, S., Kinouchi, M., Abe, T., Kudo, Y., Yamada, Y., Nishi, T., Mori, H. and Ikemura, T. (2001): Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome. Gene, 276: 89–99.
Kirk, J.S. and Zurada, J.M. (2000): A two-stage algorithm for improved topography … Conference on Systems, Man, and Cybernetics, 4(8–11): 2527–2532.
Kohonen, T. (1982): Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43: 59–69.
Kohonen, T. (1998): Self-organization of very large document collections: state of the art. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 1998), Skovde, Sweden, September 2–4.
Kohonen, T. (2001): Self-Organizing Maps. Springer Series in Information Sciences, Vol. 30. Springer, Berlin, Heidelberg, New York.
Mai, J.C. and Coleman, A.W. (1997): The internal transcribed spacer 2 exhibits a common secondary structure in green algae and flowering plants. J. Mol. Evol., 44: 258–271.
Manonmani, A., Townson, H., Adeniran, T., Jambulingam, P., Sahu, S. and Vijayakumar, T. (2000): rDNA-ITS2 polymerase chain reaction assay for the sibling species of Anopheles fluviatilis. Acta Tropica, 78: 3–9.
Marinucci, M., Romi, R., Mancini, P., Di Luca, M. and Severini, C. (1999): Phylogenetic relationships of seven palearctic members of the maculipennis complex inferred from ITS2 sequence analysis. Insect Molecular Biology, 8: 469–480.
Michot, B., Joseph, N., Mazan, S. and Bachellerie, J.P. (1999): Evolutionarily conserved structural features in the ITS2 of mammalian pre-rRNAs and potential interactions with the snoRNA U8 detected by comparative analysis of new mouse sequences. Nucleic Acids Res., 27: 2271–2282.
Miller, B.R., Crabtree, M.B. and Savage, H.M. (1996): Phylogeny of fourteen Culex mosquito species, including the Culex pipiens complex, inferred from the internal transcribed spacers of ribosomal DNA. Insect Mol. Biol., 5(2): 93–107.
Nurnberger, A. and Detyniecki, M. (2002): Visualizing changes in data collections using growing self-organizing maps. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2002), 2: 1912–1917.
Oja, E. and Kaski, S. (eds.) (1999): Kohonen Maps. Elsevier Science, Amsterdam.
Porter, C.H. and Collins, F.H. (1991): Species-diagnostic differences in a ribosomal DNA internal transcribed spacer from the sibling species Anopheles freeborni and An. hermsi (Diptera: Culicidae). Am. J. Trop. Med. Hyg., 45: 271–279.
Supek, F. and Vlahovicek, K. (2004): Inca: synonymous codon usage analysis and clustering by means of self-organizing map. Bioinformatics, 20: 2329–2330.
Toronen, P. et al. (1999): Analysis of gene expression data using self-organizing maps. FEBS Lett., 451: 142–146.
Valkonen, V.-P., Kolehmainen, M., Lakka, H.-M., et al. (2002): Insulin resistance syndrome revisited: application of self-organizing maps. International Journal of Epidemiology, 31: 864–871.
Walton, C., Sharpe, R.G., Pritchard, S.J., Thelwell, N.J. and Butlin, R.K. (1999): Molecular identification of mosquito species. Biol. J. Linnean Soc., 68: 241–256.
Wang, J., Delabie, J., Aasheim, H.C., Smeland, E. and Myklebost, O. (2002): Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study. BMC Bioinformatics, 3: 36.
Wesson, D.M., Porter, C.H. and Collins, F.H. (1992): Sequence and secondary structure comparisons of ITS rDNA in mosquitoes (Diptera: Culicidae). Mol. Phylogenet. Evol., 1: 253–269.
World Malaria Report (2005): http://rbm.who.int/wmr2005/html/exsummary_en.htm
Xu, J.N. and Qu, F.Y. (1997): Ribosomal DNA difference between species A and D of the Anopheles dirus complex of mosquitoes from China. Med. Vet. Entomol., 11: 134–138.

ISPUB.com

Internet
Scientific
Publications
