ArrayShine: An Excel Program for Transforming Gene Expression Data into Color-Coded Molecular Signatures or Fingerprints
H Khan
Keywords
cancer, diagnosis, gene signatures, microarray, prognosis, software
Citation
H Khan. ArrayShine: An Excel Program for Transforming Gene Expression Data into Color-Coded Molecular Signatures or Fingerprints. The Internet Journal of Oncology. 2003 Volume 2 Number 1.
Abstract
The advent of microarray technology has accelerated research into the complex pathophysiology of cancer and the development of novel diagnostic and prognostic markers. Microarray gene expression profiling can be used to define molecular 'signatures' or 'fingerprints', which are expected to become powerful tools for cancer management in the near future. However, the analysis and interpretation of microarray data are tedious, time-consuming and error-prone tasks. Moreover, the continuous growth in gene expression data has hampered the application of microarrays in routine clinical practice. Graphics are a powerful means of simplifying data presentation and could help reduce the complexity of tabular expression data. The aim of this study was to develop a procedure for transforming numeric expression data into color-coded graphical output for better visualization and easier comparison. The software,
Introduction
The development of cDNA microarray technology for rapid expression profiling of thousands of genes in a single hybridization step has encouraged researchers to use this technique to investigate genetic diseases. One of the major areas in which microarrays have been extensively used is cancer (1). The pattern of expressed genes on a microarray shows a typical profile in relation to cancer type or disease severity. These unique sets of genes, which define a specific pathophysiology, are regarded as 'molecular signatures' or 'fingerprints'. Tumors with closely related genetic lesions will have similar signatures and are also expected to show similar clinical behavior (2). The information encoded in gene signatures can provide valuable insights for cancer diagnosis and prognosis (3,4,5,6,7).
In recent years, numerous software tools have been developed to reduce the complexity of microarray data and to extract meaningful interpretations from the results (8,9,10,11,12,13). However, only a few attempts have been made to explore the potential of graphical presentation of expression data (14,15). Commercial software for this purpose is costly and often beyond the reach of small laboratories in developing countries, whereas simple and inexpensive software could help bring the benefits of microarray-based clinical research to many less-developed centers. Realizing that the use of tabular expression data in routine clinical practice might be complex and tedious, a computer program,
Methods
Software design
The
Data entry window
The data entry window (Fig. 1) contains columns for data entry, two option buttons for choosing the data type, four option buttons for selecting the output type, two input boxes for specifying the target range (start and end points) within the expression data, and a command button to which a macro has been assigned to execute the program. There are four Excel worksheet columns for data entry. Column 1 holds the serial number of the gene and column 2 the gene name (or accession number). Columns 3 and 4 are designated for gene expression data. Depending on the option chosen, the program either uses the data in column 3 alone or computes the ratio of the value in column 3 (always the numerator) to the value in column 4 in order to generate the array sets. The single-column option is normally used for normalized data. The two-column option should be selected when ratios of gene probes to a housekeeping gene (standard), or ratios of sample to control, are intended.
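The two data-type options can be illustrated with a short Python sketch. It is not the program's macro code, which is not reproduced in this article; the function name `expression_values` and the treatment of blank or zero denominators as missing data are assumptions made purely for illustration.

```python
from typing import List, Optional, Sequence

Number = Optional[float]  # None stands for a blank (missing) cell


def expression_values(col3: Sequence[Number],
                      col4: Optional[Sequence[Number]] = None) -> List[Number]:
    """Return the values used to build the array.

    Single-column option: column 3 is used as entered.
    Two-column option: column 3 (numerator) is divided by column 4.
    """
    if col4 is None:
        return list(col3)
    if len(col3) != len(col4):
        raise ValueError("columns 3 and 4 must have the same number of rows")
    values: List[Number] = []
    for numerator, denominator in zip(col3, col4):
        # Assumption: a blank cell or a zero denominator yields missing data.
        if numerator is None or denominator is None or denominator == 0:
            values.append(None)
        else:
            values.append(numerator / denominator)
    return values
```

Calling `expression_values(col3)` corresponds to the single-column option, and `expression_values(col3, col4)` to the two-column (ratio) option.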
Report window
The color-coded gene expression profiles are displayed on a new Excel worksheet. There are four types of array output: (i) expression of all the selected genes, in their original order; (ii) genes sorted in ascending order of expression; (iii) filtering of differentially expressed genes, in which only the filtered genes are displayed in the array; and (iv) visual comparison of gene signatures, in which multiple clusters are displayed side by side. The graphic output of gene expression data is a collection of color-coded squares, running horizontally from left to right (10 squares in each row) and extending vertically downwards. The gene expression ratio is classified into seven categories with different color codes: three for down-regulation (light to dark blue), three for up-regulation (light to dark red) and one for unchanged (normal) expression (gray). Yellow is used to identify missing data. The report also shows a table of the genes in the array, the number of genes in each expression category, and the color-coding scale (Fig. 2). There are two buttons (Next and Reset) on the report window; the former displays the data entry window and the latter clears the report window (array signature mode).
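The color-coding and layout described above can be sketched in Python as follows. The cut-off values, the intermediate shade names and all function names are hypothetical; the scale actually used by the program is the one displayed in its report (Fig. 2). Only the overall scheme (seven categories, yellow for missing data, 10 squares per row) is taken from the description.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical ratio cut-offs separating the seven categories.
CUTOFFS = [0.25, 0.5, 0.8, 1.25, 2.0, 4.0]
COLORS = ["dark blue", "medium blue", "light blue",  # down-regulation
          "gray",                                    # unchanged expression
          "light red", "medium red", "dark red"]     # up-regulation
MISSING_COLOR = "yellow"


def color_for(value):
    """Map one expression ratio (or None for missing data) to a color name."""
    if value is None:
        return MISSING_COLOR
    # bisect_right counts how many cut-offs are <= value, which is the category index.
    return COLORS[bisect_right(CUTOFFS, value)]


def layout(values, per_row=10):
    """Arrange the color codes left to right, 10 per row, growing downwards."""
    colors = [color_for(v) for v in values]
    return [colors[i:i + per_row] for i in range(0, len(colors), per_row)]


def category_counts(values):
    """Count the genes falling in each color category, as in the report table."""
    return Counter(color_for(v) for v in values)
```

Sorting the values before calling `layout` would give the ascending-order output, and dropping the values that map to gray would approximate the filtered output, again under the assumed cut-offs.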
Procedure for creating visual arrays
The steps involved in creating a color-coded array of gene expression data are also shown in Fig. 1. Briefly, enter the data in the respective columns of the data entry sheet. Large expression data sets can be conveniently copied and pasted from another source file. Then choose the array type by selecting one of the option buttons, specify the data range by entering the start and end points, and run the program by clicking the 'OK' button. Each execution processes the expression profile of one sample, so the procedure has to be repeated for multiple samples. It is convenient to keep the original data file open for transferring (copy-paste of entire column is most suitable) data to
Software validation
Three different types of data sets were selected from the published studies (16,17,18) to validate the applications of
Figure 2
Figure 3
To validate the creation of multiple array signatures (option 4, for visual comparison), expression data for 30 differentially expressed genes in ovarian cancer samples were used (18). The average signal ratios obtained with control probes were entered and the program was run to create an array signature. The 'Next' button was then clicked, the signal ratios obtained with tumor probes were entered, and the program was run again. The difference between the two sets can be clearly observed in the resulting output (Fig. 4).
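The comparison above is made visually. As a numeric companion to it, the following Python sketch lists the genes whose ratio differs between two signatures by at least a chosen fold change. The function name `changed_genes`, the two-fold default and the decision to skip missing or non-positive values are all assumptions for illustration and are not features of the program.

```python
import math


def changed_genes(genes, control, tumor, min_fold=2.0):
    """Return the genes whose tumor/control ratio differs by at least min_fold."""
    if not (len(genes) == len(control) == len(tumor)):
        raise ValueError("all three lists must have the same length")
    threshold = math.log2(min_fold)
    changed = []
    for gene, c, t in zip(genes, control, tumor):
        # Assumption: missing or non-positive ratios cannot be compared and are skipped.
        if c is None or t is None or c <= 0 or t <= 0:
            continue
        # Working on the log2 scale treats up- and down-regulation symmetrically.
        if abs(math.log2(t / c)) >= threshold:
            changed.append(gene)
    return changed
```

For instance, with placeholder gene labels, `changed_genes(["A", "B", "C"], [1.0, 1.0, 2.0], [2.5, 1.1, 0.5])` keeps "A" and "C" and drops "B".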
Discussion
High-density microarrays are unmatched for preliminary screening of differentially expressed genes, which can then be used to design a simpler and less expensive diagnostic chip for rapid molecular characterization of cancers (2). Since only selected genes constitute molecular signatures, they can also be analyzed by conventional RT-PCR, without the sophistication of microarray technology, which is still beyond the reach of many third world laboratories. Bull et al. (17) have suggested that a smaller number of potential markers could be assessed more conveniently in biopsy samples using RT-PCR. Recently published data (17,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37) on cancer genetics clearly indicate that only a small fraction of the total genes on a microarray show differential expression (Table 1). In these studies, the number of genes with differential expression ranged between 3 and 176 (mean, 59 ± 13.19), compared with the total number of genes on the microarrays (range, 425-25000; mean, 4924 ± 1367.33). Microarray expression profiling can nevertheless be used efficiently to shortlist genes that could be helpful in the molecular diagnosis of cancers (17). Thus, the design of macro-arrays or RT-PCR methods might be a more realistic approach to generating useful data for accurately classifying the tumor type and improving therapeutic decisions (4,17,38).
Figure 5
The major steps involved in configuring molecular signatures include the extraction of useful information from microarrays and its subsequent transformation into a format suitable for routine application. The results of our software validation clearly show that
The selection of Microsoft Excel spreadsheet for the development of
Availability of Software
The
Acknowledgments
The author expresses sincere thanks to Dr. John M. Mariadason, Albert Einstein Cancer Center, New York (USA), Dr. J.H. Bull, CMC International, Cheshire (UK), and Dr. Kai Wang, Chiroscience R&D, Inc., Bothell (USA), whose valuable data have been used to validate the functionality of
Correspondence to
Haseeb Ahmad Khan, PhD, MRSC (UK), Research Center, Armed Forces Hospital, T-835, P.O. Box 7897, Riyadh 11159, Kingdom of Saudi Arabia. E-mail: khan_haseeb@yahoo.com