Large-scale RNA-seq series – EP.3: ‘Multi-omics’ improves the classification of 33 different cancer types using 10,000 tumor samples.

In 2018, The Cancer Genome Atlas (TCGA) published groundbreaking findings from the largest ‘multi-omics’ cancer study ever performed (Hoadley et al., 2018).

By applying mRNA-seq alongside four other genome-wide technologies to 10,000 tumor samples from 33 different cancers, the researchers found that molecular signatures classified tumors better than traditional clinical methods. Importantly, this molecular approach detected tumor groups containing tumors from many different organs, a feature that had previously been missed. This novel classification could ultimately inform treatment choices and improve patient survival.

In this article, we discuss the role of mRNA-seq in this discovery and suggest how novel 3’ mRNA-seq technologies could help future studies.


Tumor type is commonly determined by the tissue of origin and histopathology. This leads to broad classifications such as breast or lung cancer.

Previously, the TCGA consortium generated molecular signatures for 12 different cancer types (Hoadley et al., 2014). They showed that at least one in ten cancer patients might be classified differently when using molecular signatures rather than traditional histopathology-based classification. This misclassification may even result in patients receiving suboptimal treatment regimens (Hoadley et al., 2014).

The 2014 TCGA analysis included only 12 different cancer types. In 2018, the authors expanded their analysis to 33 different cancer types, with striking results (Hoadley et al., 2018).

The authors found that, in some cases, tumors from similar tissues or from sites in close proximity were classified in the same tumor group. This was unsurprising, as such tumors likely originate from, or contain, similar cell types.

Unexpectedly, the study also demonstrated that several tumor groups comprised tumors from seemingly unrelated organs.

One tumor group even contained 25 tumor types, with features linked to activation of the patient’s immune response. As immune therapies are increasingly used for multiple diseases, this could pave the way for repurposing these treatments for use in cancer.

This represents just one example of how molecular classification of tumors could improve patient outcomes.

Furthermore, the TCGA represents an unprecedented resource for other researchers. In the four years since its publication in 2018, this study has been cited over 1,300 times.

How RNA-seq was used:

The study integrated five different genome-wide technologies to investigate tumor RNA and DNA. These were mRNA-seq, using Illumina mRNA TruSeq library preparation followed by Illumina sequencing on the Genome Analyzer or HiSeq 2000 platforms; miRNA-seq; exome sequencing; DNA methylation analysis; and copy number analysis.

For tumor classification based on mRNA-seq alone, the authors used unsupervised consensus clustering (Wilkerson and Hayes, 2010). This method grouped samples by similarity in the expression of over 15,000 genes per sample, across 10,000 tumors representing 33 different cancer types.

These mRNA expression profiles identified 25 different tumor groups.

Similarly, when all five high-throughput technologies were combined in a single clustering approach, 28 different tumor groups were detected.

In both cases, classical tumor type was the primary feature of many groups. However, many other groups contained tumors from different organs.

This combined approach indicates that transcriptomic analyses are a key aspect of improved tumor classification.
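To make the consensus clustering step described above more concrete, the following Python sketch illustrates the general idea: samples are repeatedly subsampled, each subsample is clustered, and the fraction of times two samples land in the same cluster is recorded. This is a minimal illustration under stated assumptions, not the pipeline used by Hoadley et al. or the ConsensusClusterPlus implementation. The expression values are made up, and the function names, the 80% subsampling fraction, the Euclidean distance and the average-linkage clustering are all choices made here for illustration.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(points, k):
    """Merge the two closest clusters (mean pairwise distance) until k remain.

    Returns one cluster label per point.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def consensus_matrix(profiles, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(profiles)
    together = [[0] * n for _ in range(n)]  # times a pair shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times a pair was drawn together
    size = max(k, int(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = average_linkage([profiles[i] for i in idx], k)
        for (i, li), (j, lj) in combinations(zip(idx, labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if li == lj:
                together[i][j] += 1
                together[j][i] += 1
    return [
        [
            1.0 if i == j
            else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Made-up expression values for 8 samples x 4 genes: samples 0-3 form one
# hypothetical group, samples 4-7 another.
profiles = [
    (9.0, 8.5, 1.0, 0.5),
    (8.7, 9.1, 0.8, 1.2),
    (9.3, 8.8, 1.1, 0.9),
    (8.9, 9.0, 0.6, 1.0),
    (1.0, 0.7, 9.2, 8.8),
    (0.8, 1.1, 8.9, 9.1),
    (1.2, 0.9, 9.0, 8.6),
    (0.6, 1.0, 9.4, 9.0),
]

consensus = consensus_matrix(profiles, k=2)
for row in consensus:
    print(" ".join(f"{value:.2f}" for value in row))
```

With toy data this clearly separated, pairs of samples from the same group co-cluster in every resample and pairs from different groups never do. In real data the consensus values fall between these extremes, and how stable they are helps in judging how many tumor groups the data support.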

How the large number of samples contributed to the results:

Over 10,000 tumor samples were investigated by mRNA-seq. This unprecedented data set comprised 33 different cancer types, with breast tumors (over 1,000 samples) and glioblastomas (around 600 samples) contributing the most samples.

The increased statistical power in such a large data set allowed intricate molecular similarities and differences to be determined between diverse tumor types.

Not knowing where a cancer originates is a problem in around one to three percent of cancers. The resolution given by such large sample sizes has allowed molecular criteria to be defined for more accurate future classification of tumors in the clinic, possibly leading to better patient outcomes due to more appropriate treatment choices.

How novel higher-throughput transcriptomics could help in similar studies:

Novel mRNA-seq technologies have dramatically reduced the cost and hands-on time of transcriptomics, both key considerations for large consortia such as the TCGA.

One of these technologies is Bulk RNA Barcoding and sequencing (BRB-seq), in which hundreds of samples are barcoded and pooled early in the workflow, reducing the cost of library preparation (Alpern et al., 2019).
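The barcoding and pooling idea can be illustrated with a short Python sketch: each sample is tagged with its own barcode, the samples are pooled and processed as one library, and reads are assigned back to their samples by barcode after sequencing. The barcode sequences, their length and position at the start of the read, the sample names and the function name are all hypothetical and chosen for illustration. This is not the actual BRB-seq read layout or analysis software.

```python
from collections import Counter

# Hypothetical barcodes: one short, unique sequence per sample.
SAMPLE_BARCODES = {
    "ACGTAC": "tumor_01",
    "TGCATG": "tumor_02",
    "GATCGA": "tumor_03",
}
BARCODE_LENGTH = 6  # assumed: the barcode occupies the first 6 bases of a read


def demultiplex(reads):
    """Count reads per sample by matching the leading barcode exactly."""
    counts = Counter()
    for read in reads:
        barcode = read[:BARCODE_LENGTH]
        counts[SAMPLE_BARCODES.get(barcode, "unassigned")] += 1
    return counts


# Made-up reads from one pooled library: barcode followed by transcript sequence.
pooled_reads = [
    "ACGTAC" + "TTGACCGTAA",
    "TGCATG" + "GGCATTACGA",
    "ACGTAC" + "CCGTTAGACT",
    "GATCGA" + "ATTGCCGTAC",
    "NNNNNN" + "GGGTACCATG",  # unreadable barcode, cannot be assigned
]

read_counts = demultiplex(pooled_reads)
for sample, n_reads in sorted(read_counts.items()):
    print(sample, n_reads)
```

Because the samples are pooled after barcoding, the later library preparation steps are carried out once per pool rather than once per sample.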

These low-cost, genome-wide technologies could become commonplace for the molecular characterization of tumors in a future era of genomic medicine.

One BRB-seq sample costs around the same as profiling four genes by qRT-PCR but provides information on the whole transcriptome, a significant cost saving if used in the clinic.

To find out more about how RNA-seq or BRB-seq could help your study, please contact us at



  • Alpern, D., Gardeux, V., Russeil, J., Mangeat, B., Meireles-Filho, A.C., Breysse, R., Hacker, D. and Deplancke, B., 2019. BRB-seq: ultra-affordable high-throughput transcriptomics enabled by bulk RNA barcoding and sequencing. Genome Biology, 20(1), pp.1-15.
  • Hoadley, K.A., Yau, C., Wolf, D.M., Cherniack, A.D., Tamborero, D., Ng, S., Leiserson, M.D., Niu, B., McLellan, M.D., Uzunangelov, V. and Zhang, J., 2014. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell, 158(4), pp.929-944.
  • Hoadley, K.A., Yau, C., Hinoue, T., Wolf, D.M., Lazar, A.J., Drill, E., Shen, R., Taylor, A.M., Cherniack, A.D., Thorsson, V. and Akbani, R., 2018. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell, 173(2), pp.291-304.
  • Wilkerson, M.D. and Hayes, D.N., 2010. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics, 26(12), pp.1572-1573.