The Study of Biological Systems through Proteomics: A Quantitative Proteomic Analysis of Non-small Cell Lung Cancer (NSCLC)
J Watkins, E Horlick, D Love
Keywords
education, itraq, lung cancer, mass spectrometry, proteomics
Citation
J Watkins, E Horlick, D Love. The Study of Biological Systems through Proteomics: A Quantitative Proteomic Analysis of Non-small Cell Lung Cancer (NSCLC). The Internet Journal of Genomics and Proteomics. 2013 Volume 6 Number 2.
Abstract
In the post-genomic era, proteomics has emerged as one of the most important areas of research, particularly with respect to differential proteomics, the comparative analysis of the normal versus disease states of distinct proteomes. The best-established tool for proteomic analysis is 2-dimensional gel electrophoresis (2-DE), which offers a simple method to visually identify changes in protein abundance. 2-DE has been a staple in the examination of proteomes and has undergone numerous technological advances over the past several decades. Isobaric tags for relative and absolute quantitation (iTRAQ) is a non-gel-based alternative to 2-DE as a tool for proteomic analysis. Over the last decade or more, iTRAQ coupled with mass spectrometry has become a common alternative to 2-DE technology. The global use and evolution of proteomics warrants incorporation of the recent advances and uses of the technology into the undergraduate laboratory curriculum to prepare students for the proteomic era. In the data presented herein, we utilize iTRAQ coupled with mass spectrometry as a novel approach to teach students about the identification, quantification, statistical analysis and roadblocks associated with the ever-advancing field of comparative proteomics.
Introduction
Since the first prokaryotic and eukaryotic genomes were released in 1995 and 1996, respectively [1,2], the field of genomics has exploded into the 21st century, providing an enormous amount of information, from the complete sequencing of the human genome to global gene expression in cells, and uncovering a vast, fertile landscape for bioinformatics exploration. Despite the success of these efforts, coupling gene sequence with function remains an ongoing effort. Alongside these technologies, another discipline, proteomics, has been expanding. Proteomics describes the methodical identification and quantification of proteins expressed in biological systems. Its application allows a researcher to obtain information on protein identity, expression level, variants, and post-translational modifications. Researchers attempted to get a head start on this type of analysis twenty years before the first genome sequences were presented [3,4]. It began with the introduction of 2-dimensional electrophoresis (2-DE) and the separation of complex mixtures to compare expression patterns. This generated much excitement that insights into human health, agriculture, the environment and biotechnology would be uncovered.
For decades, 2-DE has been the most favored technology for investigating global qualitative and quantitative changes in proteomes [5-9]. 2-DE offers a low-cost method of protein analysis. However, it has several notable shortcomings, particularly with respect to variation between repeated gel runs, which ultimately makes protein analysis a very labor-intensive and error-prone task [10]. Modifications in methodology have helped to alleviate such issues [11], making it a reliable technology for protein profiling. Despite the global use of 2-DE in proteomic research, proteomics education is under-represented in the undergraduate laboratory curriculum that prepares the future workforce. The Rochester Institute of Technology is one of the few colleges that has successfully introduced it into the science curriculum [12]. There are most likely several reasons, but cost, reproducibility and implementation within a semester course seem to be the most likely barriers to more widespread adoption in undergraduate science curricula.
With advances in liquid chromatography (LC), mass spectrometry (MS) and bioinformatics, newer comparative and quantitative studies are possible. LC-MS/MS is a technique that couples high-pressure liquid chromatography with tandem mass spectrometry to identify the proteins in a mixture [13, 14]. When coupled with chemical tagging, LC-MS/MS allows for quantitative comparison of protein mixtures. Two methods of chemically labeling peptides are isotope-coded affinity tags (ICAT) and iTRAQ. In ICAT, proteins from two experimental populations are labeled at cysteine residues with light (for example, 12C isotopic label) and heavy (for example, 13C isotopic label) tags carrying a biotin moiety. Mass spectrometric analysis after separation of the labeled peptides reveals peaks corresponding to the same peptide as doublets in the mass spectra, due to the mass difference between the light and heavy isotopes. The peak intensities of the peptides relate to the relative abundance of the proteins in the control versus the experimental sample. The primary limitation of this chemical labeling method is its dependence on cysteine residues: proteins lacking cysteine can be missed entirely and proteins with few cysteines can be misidentified.
A relatively new technology, iTRAQ, labels the amino-terminus of peptides generated by trypsin cleavage at lysine and arginine residues and allows for simultaneous comparison of up to four specimens. With this method, proteins from two related samples can be digested separately with trypsin and the resulting peptides labeled with two amine-specific reagents carrying distinguishable mass tags. The labeled samples are then combined and analyzed by LC-MS/MS. The mass tags are cleaved readily under collision-induced dissociation to liberate reporter ions of 114 to 117 Da, leaving a compensatory mass on the peptide. For any one peptide, both labeled forms contribute to the y- and b-ion fragmentation pattern, and the relative intensity of the cleaved tags reflects the relative abundance of the two initial peptides. The iTRAQ method increases confidence in identification and quantification by labeling multiple peptides per protein. The technique is distinctive in that four isobaric tags are available, so four comparable experiments can be combined into one proteomic analysis.
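The reporter-ion readout described above can be illustrated with a minimal Python sketch. The intensities, the choice of the 114 and 117 tags, and the function name reporter_ratio are hypothetical and chosen only for illustration; the text does not state which tag was assigned to which sample.

```python
def reporter_ratio(intensities, numerator, denominator):
    """Return the ratio of two reporter-ion intensities for one peptide."""
    den = intensities[denominator]
    if den <= 0:
        raise ValueError("denominator reporter intensity must be positive")
    return intensities[numerator] / den


# Hypothetical reporter-ion intensities (arbitrary units), keyed by nominal mass.
# Only two of the four available tags are used in a two-sample comparison.
peptide_reporters = {114: 1200.0, 115: 0.0, 116: 0.0, 117: 2400.0}

# Relative abundance of the peptide in the 117-labeled sample versus the 114-labeled one.
ratio_117_to_114 = reporter_ratio(peptide_reporters, 117, 114)
```

In this made-up case the peptide is twice as abundant in the sample labeled with the 117 tag as in the sample labeled with the 114 tag.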
Here we outline an undergraduate protocol for the comparative analysis of total protein lysate from human normal lung and lung squamous cell carcinoma tissue using iTRAQ rather than 2-DE technology. The aim of this paper is to provide an approach for teaching students about the identification, quantification, statistical analysis and roadblocks associated with the ever-advancing field of comparative proteomics. To limit the time spent on sample collection and preparation, total cell lysate samples were acquired from BioChain Institute, Inc. (Newark, CA), allowing for immediate protein precipitation, proteolysis, iTRAQ labeling, and peptide analysis.
Materials and Methods
Proteolysis and iTRAQ™ reagent labeling of total cell and mitochondrial fraction proteins.
Total protein lysate samples for human adult normal lung tissue (male, age 24) and human adult lung tumor tissue (male, age 72) were purchased from BioChain Institute, Inc. Samples of 100 μg of protein from human adult normal lung tissue and human adult lung tumor tissue were precipitated with acetone for 1 h at -20 °C and collected by centrifugation at 13,400 rpm. The protein pellet from each sample was then resuspended in 30 µl of iTRAQ™ dissolution buffer (Applied Biosystems, Grand Island, NY) with 0.05% SDS final concentration. Each 100 μg sample of protein was reduced, alkylated, digested with trypsin and labeled with the isobaric reagents according to the manufacturer’s protocol.
LC/MS/MS analysis
iTRAQ-labeled peptides were separated by OFFGEL based on pI [15], followed by nanoLC-LTQ Orbitrap analysis.
Protein identification and quantification
The experiment comparing human adult normal lung tissue and human adult lung tumor tissue was performed in triplicate. Mass spectrometer data were imported into an Excel file. Each individual peptide was linked to an independent mass tag ratio. The ratios of the peptides that identified a single protein were averaged to give a mass tag ratio for that protein (SD < ±0.14). Complete data sets from the independent experiments were then compared with one another, and a final average ratio was calculated for each protein identified in the data set. For each identified protein, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) was used to manually collate information on all proteins with Gene Ontology (GO) annotations.
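For students who prefer a script to a spreadsheet, the two averaging steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the protein names and ratios are invented placeholders, the helper names are hypothetical, and a simple arithmetic mean is assumed at both levels, since the text does not specify the type of average used.

```python
from collections import defaultdict
from statistics import mean


def protein_ratios(peptide_rows):
    """Average peptide-level mass tag ratios into one ratio per protein."""
    by_protein = defaultdict(list)
    for protein, ratio in peptide_rows:
        by_protein[protein].append(ratio)
    return {protein: mean(ratios) for protein, ratios in by_protein.items()}


def average_across_experiments(experiments):
    """Average each protein's ratio over the experiments in which it was seen."""
    collected = defaultdict(list)
    for peptide_rows in experiments:
        for protein, ratio in protein_ratios(peptide_rows).items():
            collected[protein].append(ratio)
    return {protein: mean(ratios) for protein, ratios in collected.items()}


# Hypothetical (protein, peptide ratio) rows for three replicate experiments.
replicates = [
    [("protein_A", 1.0), ("protein_A", 2.0), ("protein_B", 0.5)],
    [("protein_A", 1.5), ("protein_B", 0.5)],
    [("protein_A", 1.5), ("protein_B", 0.5)],
]
final_ratios = average_across_experiments(replicates)
```

Averaging within each experiment first, and only then across experiments, keeps a replicate with many peptide observations from dominating the final ratio.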
Statistical analysis
Each protein was characterized by a ratio of mass tag signal intensities reflecting the relative abundance of the protein in normal lung tissue relative to cancerous lung tissue. These abundance ratios were converted to a log10 scale to overcome the lack of symmetry in raw protein abundance ratios and to calculate a mean, μ, and standard deviation, σ, of the log protein abundance. The observed mean of the relative protein abundance ratio was 1, so the mean on the log scale was μ = 0; the standard deviation of the log relative protein abundance was σ = 0.1. To determine the biological significance of the relative protein expression data, we formulated the null hypothesis H0 that all protein abundance data fit a normal distribution characterized by the population mean μ. For a given protein, the null hypothesis is that its log10 abundance ratio X = μ, and the alternative hypothesis is that X ≠ μ. Using a two-tailed t test, we defined criteria to determine whether the abundance of a protein was significantly distinct from the normal distribution characterized by the population mean μ. We considered a protein to be significantly up- or down-regulated if the p-value of its abundance ratio relative to the mean was ≤ 0.05.
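The decision rule can be sketched in a few lines of Python. Note the assumption: this sketch swaps the two-tailed t test named above for a normal approximation (a z-score against the reported μ = 0 and σ = 0.1 on the log10 scale), because the degrees of freedom used in the original analysis are not given. The function names are hypothetical.

```python
import math

MU = 0.0      # mean of the log10 abundance ratios, as reported in the text
SIGMA = 0.1   # standard deviation of the log10 abundance ratios, as reported
ALPHA = 0.05  # significance threshold used in the text


def two_tailed_p(ratio, mu=MU, sigma=SIGMA):
    """Two-tailed p-value for one protein's log10 ratio under N(mu, sigma).

    Assumption: normal (z) approximation rather than a t distribution.
    """
    z = (math.log10(ratio) - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(ratio, alpha=ALPHA):
    """True if the protein's abundance ratio differs from the mean at the alpha level."""
    return two_tailed_p(ratio) <= alpha
```

Under these assumptions the cutoff corresponds to an absolute log10 ratio of about 0.196, that is, raw ratios above roughly 1.57 or below roughly 0.64. Working on the log scale makes a two-fold increase and a two-fold decrease equally distant from the mean.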
Results and Discussion
Proteomic analysis of total cell lysates
Total cell protein was prepared from human adult normal lung tissue (male, age 24) and human adult lung tumor tissue (male, age 72) for quantitative LC-MS/MS analysis of isobaric tagged peptides as described in Materials and Methods. We collected quantitative data on multiple peptides for 91 proteins. All results obtained are representative of experiments performed in triplicate. In approximately 80% of the identifications, multiple qualifying peptides were observed in each experiment (Table 1).
Table 1
Approximately 33% of the 91 proteins identified were differentially expressed at the p ≤ 0.05 level, with 12 up-regulated and 18 down-regulated (Figure 1). This result indicates that the cancerous state of the lung entails a reorganization of cell structure and function. The proteins listed in Table 1 were grouped according to gene ontology classifications to reveal a number of categories that showed significant disparities in the numbers of up- and down-regulated proteins, as shown in Table 2. The most consistent patterns of alteration in protein abundance were decreases within the tissue in proteins involved in anti-inflammatory/immuno-modulatory response, cell cycle/differentiation, proteolysis, and translation/protein biosynthesis. Conversely, there was an increase in proteins involved in hemoglobin binding, metabolism, and protein inhibition. Although these data show some alterations in the proteome of the cancerous state versus normal lung tissue, the protein coverage is clearly insufficient to support any conclusive statements. However, this analysis brings up key drawbacks for class discussion on how to expand and improve the protein coverage in GO categories in order to discover possible biomarkers for disease or central pathways involved in the onset of disease.
Figure 1
Table 2
The 2-DE and iTRAQ approaches to proteome analysis each have advantages and disadvantages in the analysis of proteome changes in cells and tissues. The iTRAQ technology avoids the hurdles that typically affect gel-based protein analysis, such as poor solubility of membrane-bound proteins, limited loading capacities (as seen with pH strips), restriction to proteins of 10-120 kilodaltons (kD) with neutral to acidic isoelectric points, difficulty in resolving very basic proteins, and poor intra-experimental reproducibility. Both the iTRAQ and 2-DE methods of protein analysis are limited with respect to the dynamic range of analysis. As with 2-DE, iTRAQ identifies only highly abundant proteins from complex mixtures, leaving low-abundance proteins such as signaling proteins challenging to identify. To overcome this drawback, fractionation may be used to simplify the protein mixture and allow for detection at lower abundance levels. Fractionation is carried out prior to LC-MS/MS by liquid chromatography steps, such as strong cation exchange chromatography, creating distinct pools of protein to introduce for mass spectrometry analysis. This additional step provides greater coverage of the proteome, especially for proteins at low concentration. Although the dynamic range can be improved by fractionation, another factor that decreases the ability to detect low-abundance proteins is the presence of highly abundant proteins such as human serum albumin (HSA) and immunoglobulin gamma (IgG), which together make up approximately seventy-five percent of total serum protein content. The data provided herein suggest that a fractionation step may have provided a more in-depth look at the proteins in each GO category and solidified key changes associated with cancerous lung tissue: serum albumin was identified with the largest number of peptides, 38, whereas each of the remaining proteins was identified by 1-17 peptides. To increase identifications in association with fractionation, affinity-based depletion strategies are used to selectively remove HSA and IgG and so facilitate a more comprehensive analysis of proteins from high to low abundance.
This experiment provides a step-by-step method for the assessment of comparative samples using iTRAQ technology. The data show that changes in protein abundance can be detected and demonstrate the ability of mass spectrometry to identify and quantitate proteins. The experiment is simplified by using total cell lysates already prepared by a commercial company, which dramatically decreases the time needed for sample acquisition, preparation and collection. By acquiring the comparative samples, students can concentrate on the chemistry required to reduce, alkylate, and digest proteins with trypsin, followed by isobaric labeling of the amine terminus of peptides. The proper formalities needed for the collection of patient samples, or the growth and maintenance of tissue culture cells, can be discussed in detail as part of laboratory introductions and the discussion sections of lab reports.
The cost of iTRAQ analysis can be equivalent to that of 2-DE, provided that access to LC-MS/MS equipment is available. Moreover, the costs of LC-MS/MS are going down and can be negotiated with willing institutions. Data turnaround is relatively quick, at about two to five days depending on the workload of the proteomics service center. The format of the returned data allows for easy import into Excel files to organize GO categories, perform quantitative and statistical analysis, and create charts and graphs of representative data. The entire protocol can be performed in just six two-and-a-half-hour lab periods (barring complications in data acquisition from the proteomics facility), with students working diligently in groups of two to four under close instructor supervision. A simplified schedule of events can be seen in Table 3.
Acknowledgments
We would like to thank Cristian Ruse and Cexiong Fu of the Cold Spring Harbor Laboratory proteomics shared resource.