Key research outputs include an understanding of tumor preanalytical variables, such as ischemic time, which helps refine tumor collection protocols optimized for proteomic analyses. The investigators were not surprised by this discordance between RNA and protein levels, because many regulatory controls lie between RNA and protein expression. Second, most of the focal amplifications (increased amounts of certain chromosome segments) observed in the earlier genomic analyses of the same tumors did not result in corresponding elevations in protein level.
Proteomic analyses identified a few amplifications that had dramatic effects on protein levels and may represent potentially important targets for diagnosis or therapeutic intervention. Third, proteomics identified five colon cancer subtypes, including classifications that could not be derived from genomic data. Protein expression signatures for one of the subtypes indicated molecular characteristics associated with highly aggressive tumors with poor clinical outcome.
The effort produced a broad overview of the landscape of the proteome (all the detectable proteins found in a cell) and the phosphoproteome (the sites at which proteins are tagged by phosphorylation, a chemical modification that drives communication in the cell) across a set of breast cancer tumors that had been genomically characterized in the TCGA project.
Some mutations are found within very large DNA regions that are deleted or present in extra copies, so winnowing the list of candidate genes by studying the activity of their protein products can help identify therapeutic targets. This analysis uncovered new protein markers and signaling pathways for breast cancer subtypes and for tumors carrying frequent mutations, such as those in PIK3CA and TP53. The study also correlated copy number alterations (extra or missing DNA) in some genes with protein levels, identifying 10 new candidate regulators.
Deep proteomic characterization of TCGA ovarian tumors yielded a number of insights, such as how different copy-number alterations influence the proteome, which proteins are associated with chromosomal instability, which sets of signaling pathways diverse genome rearrangements converge on, and which of these are most associated with short overall survival.
Specific protein acetylations associated with homologous recombination deficiency suggested a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provided a view of how the somatic genome drives the cancer proteome, and of the associations between protein and post-translational modification (PTM) levels, such as phosphorylation, and clinical outcomes in high-grade serous carcinomas.
SNVs and SVs found in coding regions may impact protein sequence, while those in non-coding regions likely affect gene expression and splicing processes (Figure 2) [ 19 ]. The coding and non-coding portions of the genome, as well as the types of variants present within it, have undergone careful nomenclature standardization to allow harmonized scientific communication.
Working groups such as the Human Genome Organization gene nomenclature committee [ 20 ] or the Vertebrate and Genome Annotation projects [ 21 ] provide curation and updates on the nomenclature and symbols of coding and non-coding loci, whereas the standardized reference to properly code genetic variations is curated by the Human Genome Variation Society [ 22 ]. WES allows the screening of all variants (including rare ones) in the coding region, with a direct relation to protein-affecting mutations; WGS allows the identification of all rare coding and non-coding variants [ 19 , 25 ].
The study of the genome relies on the availability of a reference sequence and on knowledge of the distribution of the common variants across the genome. This is important to (i) map newly generated sequences to a reference sequence and (ii) refer to population-specific genetic architecture for the interpretation of studies such as genome-wide association studies (GWAS) [ 26 ].
The human genome was sequenced through two independent projects and released in the early 2000s by the public Human Genome Project (HGP) and a private endeavour led by J. Craig Venter; as a result, the human reference sequence was constructed and over 3 million SNPs were identified [ 4 , 14 ]. The reference genome is paired with a genome-wide map of common variability, thanks to the International HapMap Project (Figure 1) [ 3 ]. Importantly, the HapMap project complemented the HGP with additional information such as that of haplotype blocks, based on the concept of linkage disequilibrium (LD, see glossary), the foundation of GWAS [ 15 ].
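To make the linkage disequilibrium concept concrete, the following is a minimal sketch of the widely used r² statistic between two loci. It assumes phased haplotypes at two biallelic loci coded as 0/1; the function name `ld_r_squared` and the input layout are illustrative assumptions and do not correspond to any HapMap tool.

```python
def ld_r_squared(hap_a, hap_b):
    """Return r^2 between two biallelic loci.

    hap_a and hap_b are equal-length lists with one entry per phased
    haplotype: 1 if the haplotype carries the alternate allele, else 0.
    (Hypothetical input layout, chosen for illustration only.)
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # alternate allele frequency, locus A
    p_b = sum(hap_b) / n                      # alternate allele frequency, locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

Values close to 1 indicate that the two loci are almost always inherited together, which is why a genotyped SNP can act as a proxy for ungenotyped neighbours in the same haplotype block.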
A typical GWAS design involves using a microarray to genotype a cohort of interest and to identify variants associating with a particular trait in a hypothesis-free discovery study. GWAS identify risk loci but, because of LD, not necessarily the prime variants or genes responsible for a given association, nor their function.
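As an illustration of what "associating with a trait" can mean for a single SNP in a case-control design, here is a minimal sketch of a basic allelic chi-square test on a 2x2 table of allele counts. It is one simple test among many and is not the implementation used by any specific GWAS package; the function name and the argument order are assumptions made for this example, and real analyses also handle covariates, population structure and multiple testing.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Arguments are allele counts (not genotype counts) for the alternate
    and reference allele in cases and controls.
    Returns (chi-square statistic, p-value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of a chi-square with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

In a genome-wide scan a test of this kind would be repeated for every genotyped variant, which is why stringent significance thresholds are applied.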
Replication and targeted re-sequencing approaches are required to better understand the association found in the discovery phase. Nevertheless, a GWAS suggests potential biological processes (BPs) associated with a trait to be further investigated in functional work [ 26 ]. These studies have both confirmed previous genetic knowledge e. Although most of the associating SNPs have a small effect size, they provide important clues on disease biology and may even suggest new treatment approaches e.
Another opportunity supported by GWAS is the possibility of comparing the genetic architecture between traits (LD score regression) [ 37 ]. Conversely, a common criticism is that significant SNPs still do not explain the entire genetic contribution to the trait i. Traditionally, GWAS has been performed through microarrays and, although NGS methods are becoming increasingly popular due to a reduction in the cost of the technology, the cost of WES and WGS is still around 1-2 orders of magnitude higher than that of a genome-wide microarray, making the latter still preferable, particularly for the genotyping of bigger cohorts.
However, a valuable option that is gaining momentum is that of combining the two techniques: NGS is, in fact, extremely helpful together with genotyping data within the same population to increase the resolution of population-specific haplotypes and the strength of imputation [ 40 ]. In summary, the choice between a microarray or NGS approach should be based on the scientific or medical question(s) under consideration, for which pertinent concepts can be found in [ 26 , 41 , 42 ].
Cancer Genomics and Proteomics
Many tools are available for handling genome-wide variant data, e.g. Plink [ 43 ], Snptest [ 44 ] and a variety of R packages, including the Bioconductor project [ 45 ], supporting the whole workflow from quality control (QC) of raw genotyping data to analysis, such as association, heritability, genetic risk scoring and burden analyses. NGS data undergo different QC steps with dedicated programs such as the Genome Analysis Toolkit to align the sequences with the reference genome, and to call and filter rare variants [ 46 ].
Of note, a comprehensive repository of all currently available genetic variations, including links to the original studies, is curated by EBI within the European Variation Archive [ 47 ]. ClinVar and Online Mendelian Inheritance in Man (also within NCBI) help in associating coding variants with traits and provide a comprehensive review of links between genetic variability and diseases, respectively.
Biomart within Ensembl allows for filtering and extracting information of interest for a particular gene or SNP. Furthermore, these repositories provide the opportunity to link and display genetic and transcript data together, e. In some cases, data are only available by contacting the groups or consortia generating the data. We have summarized critical considerations in Table 2, and all web resources included in this section are shown in Supplementary Table S1a.
Table 2. General critical considerations on applying bioinformatics to genomics.
Given the tailoring of ad hoc techniques and the growth of recent data on coding RNAs (mRNAs), these will be the main focus of this section. This information is fundamental for a better understanding of the dynamics of cellular and tissue metabolism, and to appreciate whether and how changes in the transcriptome profiles affect health and disease.
It is now possible to capture almost the totality of the transcriptome through similar strategies used for screening the DNA, i.
As mentioned in the previous section, the RNA-microarray approach is less costly than RNA-sequencing but has significant limitations, as the former is based on previously ascertained knowledge of the genome, while the latter allows broad discovery studies [ 53 ]. RNA-microarrays are robust and optimized for comprehensive coverage through continually updated pre-designed probes; however, transcripts not included in the probe set will not be detected. Of note, although complementary microarray options, such as the tiling array, allow the characterization of regions contiguous to known ones, supporting the discovery of de novo transcripts [ 54 ], RNA-sequencing is more comprehensive, as it enables capturing basically any form of RNA at a much higher coverage [ 55 ].
The workflow to generate raw transcriptome data, through either method, involves the following: (i) purifying high-quality RNA of interest; (ii) converting the RNA to complementary DNA (cDNA); (iii) chemically labelling and hybridizing the cDNA to probes on chip (RNA-microarray) or fragmenting the cDNA and building a library to sequence by synthesis (RNA-sequencing); (iv) running the microarray or sequence through the platform of choice; and (v) performing ad hoc QC [ 55 , 56 ]. The QC steps differ between microarray and sequencing data [ 56 ]: for the former, chips are scanned to quantify signals of probes representing individual transcripts, and reads are subsequently normalized; for the latter, the raw sequences are processed using applications such as FastQC, which read raw sequence data and perform a set of quality checks to assess the overall quality of a run.
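To give a sense of what one such quality check on raw sequence data can look like, the sketch below computes the mean base quality of each read in a FASTQ file and flags reads below a threshold. It is not FastQC's implementation: the function names, the threshold of 20 and the assumption of four-line records with Phred+33 quality encoding are all choices made for this illustration.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near line %d" % (i + 1))
        yield header[1:], seq, qual


def mean_phred(quality, offset=33):
    """Mean Phred score of one quality string (Phred+33 assumed)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def low_quality_reads(lines, min_mean=20.0):
    """Return the ids of reads whose mean Phred score is below min_mean."""
    return [read_id
            for read_id, _seq, qual in parse_fastq(lines)
            if mean_phred(qual) < min_mean]
```

A call such as `low_quality_reads(open("reads.fastq"))` (with a hypothetical file name) would list the reads that a downstream trimming or filtering step might discard.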
This step is then followed by alignment with a reference sequence (to evaluate coverage and distribution of reads), transcript assembly and normalization of expression levels [ 57 ]. As discussed in the previous section, GWAS hits i. It follows that eQTLs provide an important link between genetic variants and gene expression, and can thus be used to explore and better define the underlying molecular networks associated with a particular trait [ 58 ].
In comparison, trans-eQTLs affect genes located anywhere in the genome and have weaker effect sizes: both features make trans-eQTL analyses currently difficult. During the past decade, the number of studies focusing on eQTLs has grown exponentially, and eQTL maps in human tissues have been and are being generated through large-scale projects [ 59-62 ]. Studying eQTLs in the right context is particularly important as eQTLs are often only detected under specific physiological conditions and in selected cell types.
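The link between a genetic variant and gene expression described above is often quantified as the slope of a regression of expression on genotype. The following is a minimal sketch under simplifying assumptions: genotypes are coded as alternate-allele dosages (0, 1 or 2), expression values are already normalized, and no covariates are modelled. The function name `eqtl_effect` is hypothetical, and dedicated eQTL software does considerably more than this.

```python
def eqtl_effect(dosages, expression):
    """Least-squares fit of expression ~ intercept + slope * dosage.

    dosages: alternate-allele counts per individual (0, 1 or 2).
    expression: normalized expression of one gene, same individuals.
    Returns (slope, intercept); the slope is the per-allele effect size.
    """
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype has no variation in this sample")
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept
```

A weaker effect size, as described for trans-eQTLs, corresponds to a slope closer to zero, which requires larger sample sizes to distinguish from noise.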
In this view, the development of induced pluripotent stem cell models is likely to advance our detection of physiologically relevant and cell type-specific eQTLs that are difficult to obtain from living individuals. In addition, it is important to note that a limitation of eQTL analysis, i. Of note, RNA-sequencing alone provides a framework for unique analyses investigating novel transcript isoforms (isoform discovery), allele-specific expression (ASE) and gene fusions [ 56 ].
Another way to study the regulation of gene expression is achieved through the combined analysis of mRNA and microRNA levels.
MicroRNAs are short, non-coding RNA molecules that regulate gene expression post-transcriptionally by acting on mRNA; their profiling can also be captured through both array and sequencing techniques. It is therefore clear that not only mRNA levels, but also their regulation by microRNAs, are important for a more comprehensive overview of gene expression dynamics [ 64 ]. It is relevant to note that the specific microRNA content of a specimen might, per se, be predictive of a certain condition or trait and can therefore be immediately used in clinical diagnostics.
However, microRNA profiling can be integrated with mRNA expression data to study changes in the transcriptome profile, specifically identifying the mRNA transcripts that undergo regulation, therefore highlighting the potential molecular pathways underpinning a certain trait or condition. One problem here, however, is the need to identify the mRNA molecules regulated by each given microRNA sequence for accurate visualization of gene regulatory networks [ 65 ].
A more system-wide approach to assess gene expression is gained through gene co-expression analyses, including weighted gene co-expression network analysis (WGCNA) [ 71 ]. There is a plethora of solutions for data storage, sharing and analysis. Groups that generate data store it either on private servers or in public repositories. Thus, the end user who downloads data needs to possess, or develop, a pipeline for analysis: Bioconductor is again a valuable resource for this.
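For readers new to the co-expression analyses mentioned above, the sketch below illustrates the core idea behind a weighted, unsigned co-expression network: pairwise Pearson correlations between gene expression profiles are raised to a soft-thresholding power so that strong correlations dominate. This is a simplified illustration and not the WGCNA package itself; the function names, the dictionary input layout and the default power of 6 are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression profile")
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor|^beta for every gene pair.

    expr maps a gene name to its expression values across samples
    (hypothetical layout). Returns {(gene_1, gene_2): adjacency}
    with gene names in sorted order.
    """
    genes = sorted(expr)
    adjacency = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            adjacency[(g1, g2)] = abs(pearson(expr[g1], expr[g2])) ** beta
    return adjacency
```

Groups of genes with high mutual adjacency can then be clustered into modules, which is the step at which network-level interpretation usually begins.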
Other sites provide a framework for analysing data in an interactive and multi-layered fashion, such as NCBI, Ensembl and UCSC, or the Human Brain Atlas that allows verifying brain-specific expression patterns of genes of interest at different stages of life.
The Genotype-Tissue Expression portal is a catalogue of human gene expression, eQTL, sQTL (splicing quantitative trait loci) and ASE data that can be used interactively to verify gene expression and gene expression regulation patterns in a variety of different tissues [ 59 ], while Braineac is a comparable resource tailored for such studies in human brain [ 61 ]. We have summarized critical considerations in Table 3, and all web resources included in this section are shown in Supplementary Table S1b.
Table 3. General critical considerations on applying bioinformatics to transcriptomics.
Be aware of the possibility of contamination from different cell types in data originating from homogenates.
The proteome is the entire set of proteins in a given cell, tissue or biological sample, at a precise developmental or cellular phase. Proteinomics is the study of the proteome through a combination of approaches such as proteomics, structural proteomics and protein-protein interactions analysis. One important consideration, when moving from studying the genome and the transcriptome to the proteome, is the huge increase in potential complexity. The 4-nucleotide codes of DNA and mRNA are translated into a much more complex code of 20 amino acids, with primary sequence polypeptides of varying lengths folded into one of a startlingly large number of possible conformations and chemical modifications e.
Also, multiple isoforms of the same protein can be derived from alternative splicing (Figure 4).
Figure 4. Summary of protein structural features and methods to generate and analyse proteomics data.
These degrees of freedom in characterizing proteins contribute to the heterogeneity of the proteome in time and space, making the omics approach extremely challenging. In addition, techniques for protein studies are less scalable than those to study nucleic acids.
Researchers are encouraged to deposit data of proteomic experiments such as raw data, protein lists and associated metadata into public databases, e. As previously noted, the proteome is extremely dynamic and depends on the type of sample as well as conditions at sampling.
Even when omics techniques, such as cell-wide mass spectrometry (MS), are applied, elevated sample heterogeneity complicates the comparison of different studies e.