Blood transcriptomics: sequence deeper or sequence more samples?

 

Expression Quantitative Trait Loci (eQTLs) identified in large-scale RNA-seq studies are of great medical interest, as they can be used to understand the functional consequences of genetic variants and to personalize disease therapy and adjust it in real time.

However, the current high cost of RNA-seq imposes a hard trade-off between the number of patient samples that can be sequenced and the sequencing depth obtained for each.

Up until now, the common practice was to prioritize sequencing depth by obtaining between 20 million and 50 million reads for each sample.

However, recent work published by UCLA researchers demonstrates that, given a fixed budget, the power of eQTL discovery can actually be increased by lowering the sequencing depth per sample and increasing the number of individuals sequenced in the assay.

The researchers, led by Prof. Pasaniuc, performed bulk RNA-seq of whole blood samples (PAXgene tubes) from 1490 individuals at low coverage (~6 million reads/sample). They showed that the effective power, expressed in terms of discovered associations between variants and expressed genes, is higher than that of an RNA-seq study covering a subset of 570 of those individuals at high coverage (~14 million reads/sample).
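To make the trade-off concrete, here is a rough back-of-the-envelope sketch of the two designs in terms of total reads. It uses only the sample counts and approximate depths quoted above; the function names are ours, and the simplification that cost scales with reads alone (ignoring per-sample costs such as extraction and library preparation) is an assumption made purely for illustration.

```python
# Back-of-the-envelope comparison of the two study designs by total reads.
# ASSUMPTION: sequencing cost scales with total reads only; per-sample costs
# (extraction, library prep) are ignored in this illustration.

def total_reads(n_samples: int, reads_per_sample: int) -> int:
    """Total reads needed to sequence n_samples at the given depth."""
    return n_samples * reads_per_sample


def samples_for_budget(read_budget: int, reads_per_sample: int) -> int:
    """How many samples a fixed read budget covers at the given depth."""
    return read_budget // reads_per_sample


# Approximate figures quoted in the text above.
low_cov = total_reads(1490, 6_000_000)    # low-coverage design
high_cov = total_reads(570, 14_000_000)   # high-coverage design

print(f"Low-coverage design:  {low_cov / 1e9:.2f} billion reads")
print(f"High-coverage design: {high_cov / 1e9:.2f} billion reads")

# Samples that the high-coverage read budget would cover at ~6M reads each.
print(samples_for_budget(high_cov, 6_000_000), "samples at ~6M reads/sample")
```

Under this simplified view, the read budget of the 570-sample high-coverage design would stretch to more than twice as many individuals at the lower depth.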

To explain this somewhat counterintuitive finding, the authors go on to use other datasets derived from real RNA-seq data to explore the interplay between coverage and the number of individuals in eQTL studies. What they end up showing is that a 10-fold reduction in coverage leads to only a 2.5-fold reduction in statistical power.
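One way to get a feel for how flat this relationship is: if we assume, purely for illustration, that power scales with coverage as a simple power law (power proportional to depth raised to some exponent), the 10-fold and 2.5-fold figures above pin down that exponent. The power-law form and the function names below are our assumptions, not a model taken from the study.

```python
import math

# ASSUMPTION (illustrative only): power ~ depth ** alpha.
# The study reports the 10-fold / 2.5-fold figures; the power-law shape
# used to interpolate between them is our simplification.

def implied_exponent(depth_fold: float, power_fold: float) -> float:
    """Exponent alpha such that depth_fold ** alpha == power_fold."""
    return math.log(power_fold) / math.log(depth_fold)


def relative_power(depth_ratio: float, alpha: float) -> float:
    """Power relative to baseline when depth is scaled by depth_ratio."""
    return depth_ratio ** alpha


alpha = implied_exponent(10, 2.5)
print(f"Implied exponent: {alpha:.3f}")

# Sanity check: a 10-fold drop in depth gives back the 2.5-fold drop in power.
print(f"Relative power at 1/10 depth: {relative_power(0.1, alpha):.2f}")
```

An exponent well below 1 is just another way of saying that each additional read per sample buys less than the one before it.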

What this actually means is that, yes, sequencing deeper and deeper keeps adding more “information content”, but it does so only to a marginal extent.

It also means that the diversity across different samples and individuals provides a much richer source of information and of new genotype-phenotype associations.

Luckily, new technologies are emerging (ehm, ehm… wink, wink… BRB-seq) that are specifically designed to enable ever larger sample sizes, keep sequencing costs down and, as a result, maximize the usefulness of genomics data.

 

Please do not hesitate to contact us if you would like to know more about blood transcriptomics, large biomarker discovery studies and RNA sequencing.