Benchmarking BRB-seq, QuantSeq-Pool and NEBNext Ultra II

In the original publication describing the BRB-seq protocol, we compared it with the de facto reference method for bulk transcriptomics: “Illumina TruSeq Stranded mRNA”1. We concluded that, at the same depth, both protocols were almost equivalent in terms of the number of detected genes (Figure 1A), the number of detected differentially expressed genes (Figure 1B), and the correlation between replicates (Figure 1C). 

Figure 1. Comparison of BRB-seq and TruSeq, downsampled to 1M reads. A. Comparison of the number of detected genes, depending on three thresholds; B. Comparison of the number of detected differentially expressed genes. The true positives are called from a gold-standard TruSeq dataset at 30M depth; C. Correlation between BRB-seq replicates (left) and between BRB-seq and TruSeq samples (right). 

Since then, we have continually improved the BRB-seq protocol to lower the detection limit, increase the purity of the demultiplexing (less cross-contamination), and enable the study of more tissues and organisms. In particular, we recently developed a globin depletion kit that makes it possible to sequence blood samples. 

Given these improvements, we set out to re-evaluate BRB-seq against current state-of-the-art technologies. We designed a benchmark comparing BRB-seq with the most widespread commercial stranded, multiplexed 3’ RNA-seq technology, from Lexogen: QuantSeq-Pool (LEX). This time, we also wanted to compare multiplexed methods with standard non-multiplexed methods, to show that multiplexed methods such as BRB-seq can now reach similar quality. We therefore also included an established non-multiplexed method from New England Biolabs: the NEBNext Ultra II Directional RNA (NEB). Importantly, to compare the three protocols and their capacity to support biological interpretation, we sampled three biological replicates under two conditions: Non-Differentiated Adipose Stromal Cells (ND) and Differentiated Adipocytes (D) (see Figure 2). With this experimental design, we can assess not only the sensitivity of the protocols, but also the number of differentially expressed (DE) genes found between the two conditions, which is the most prevalent type of analysis in transcriptomics. 

 

Figure 2. Samples used for benchmarking BRB-seq, QuantSeq-Pool and NEBNext Ultra II. We used three replicates of ASPCs (Adipose Stromal Population Cells) that we differentiated into mature adipocytes, so that we could compare the genes differentially expressed between the two transcriptional states. 

After sequencing these six samples (3 ND + 3 D replicates) with each of the three technologies, we first compared the demultiplexing efficiency of BRB-seq and LEX. We found that demultiplexing was more efficient for BRB-seq (99%) than for LEX (92%) (see Figure 3). 

 

 

Figure 3. Demultiplexing results for the two multiplexed strategies (BRB-seq and LEX). The figure shows the total number of reads that we were able to demultiplex (green + yellow) across all samples. 

Next, we compared the quality of the sequenced reads across all three protocols, measured by their mappability to the reference genome (here Mus musculus). It was immediately clear that the BRB-seq results were much closer to NEB than to LEX (see Figure 4). Both LEX and BRB-seq have lower mapping rates than NEB (92% uniquely mapped reads), but the LEX rate (61% uniquely mapped reads) is clearly lower than that of BRB-seq (77% uniquely mapped reads). 

 

Figure 4. Results of alignment to the Mus musculus mm10 reference genome. The figure details how reads mapped to the mouse reference genome. Certain pipelines filter out multi-mapped reads, i.e. reads that map to several regions of the genome. In that case, the uniquely mapped reads are the ones carried over to the next step, which involves counting reads per gene feature. 
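
As a minimal sketch of how the percentages in such a mapping summary are derived, the snippet below computes the share of uniquely mapped, multi-mapped and unmapped reads from three read counts. The function name `mapping_summary` and the toy numbers are hypothetical and chosen only for illustration; they are not taken from the benchmark, and the aligner used to produce Figure 4 is not assumed here.

```python
def mapping_summary(unique, multi, unmapped):
    """Return the percentage of reads in each alignment category.

    unique   -- reads aligned to exactly one locus
    multi    -- reads aligned to several loci (multi-mapped)
    unmapped -- reads that did not align
    """
    total = unique + multi + unmapped
    if total == 0:
        raise ValueError("no reads to summarise")
    return {
        "uniquely_mapped_pct": 100 * unique / total,
        "multi_mapped_pct": 100 * multi / total,
        "unmapped_pct": 100 * unmapped / total,
    }

# Toy read counts (hypothetical, not benchmark data).
summary = mapping_summary(unique=800, multi=150, unmapped=50)

# If a pipeline discards multi-mapped reads, only the uniquely mapped
# fraction is passed on to gene counting.
reads_for_counting_pct = summary["uniquely_mapped_pct"]
```

When multi-mapped reads are filtered out, the uniquely mapped percentage is therefore the figure that determines how much of the sequencing depth is actually usable for quantification.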

Interestingly, while all three methods detect a similar number of mid- and highly expressed genes at the same depth, LEX seems to detect a higher number of lowly expressed genes, whereas NEB and BRB-seq obtain very similar results (Figure 5). NEB, which should be much more sensitive, does not find these “extra” genes, which is highly suspicious. Taken together with the markedly lower mapping rate of LEX, this points towards spuriously mapped reads rather than genuinely higher sensitivity. 

Figure 5. Number of detected genes per protocol. We separated the Non-Differentiated (ND) and Differentiated (D) conditions to show that the studied biological signal has no major effect on this number. We also separated the genes into three categories: lowly expressed (left, 0 < cpm < 10), mid expressed (middle, 10 < cpm < 100), and highly expressed (right, cpm > 100).
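
The binning described in this caption can be sketched as follows. This is an illustrative example only: it assumes the standard definition of counts per million (raw count divided by the sample's total counts, times one million), and it assumes that values of exactly 10 or 100 cpm fall into the higher category, since the caption does not say how boundaries are handled. The gene names and counts are hypothetical.

```python
from collections import Counter

def cpm(counts):
    """Convert raw read counts for one sample into counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

def expression_bin(value):
    """Assign a CPM value to an expression category.

    Assumption: exactly 10 or 100 cpm goes to the higher bin.
    """
    if value <= 0:
        return "undetected"
    if value < 10:
        return "low"
    if value < 100:
        return "mid"
    return "high"

def detected_genes_per_bin(counts):
    """Count the genes falling into each expression category."""
    return Counter(expression_bin(v) for v in cpm(counts).values())

# Toy counts summing to one million reads (hypothetical, not benchmark data).
toy_counts = {"geneA": 0, "geneB": 5, "geneC": 50, "geneD": 945, "geneE": 999_000}
bins = detected_genes_per_bin(toy_counts)
```

Because CPM depends on the total number of counted reads, such a comparison is only meaningful when all protocols are evaluated at the same depth.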

The read distribution, on the other hand, showed more reads mapping to 3’UTRs for BRB-seq and LEX than for NEB. This is expected, since BRB-seq and LEX are both 3’-based protocols, whereas NEB is a full-length protocol (Figure 6). 

Figure 6. Read coverage on genes and other genomic loci. A. This panel shows the read coverage along the gene body, from 0% (TSS, 5’) to 100% (TES, 3’), normalized across all genes. It shows that NEB, as a full-length method, indeed covers the full transcripts from 5’ to 3’, while the two other methods mainly cover the 3’ end, as expected. B. This panel shows the read distribution across the different regions of the exome (TES, TSS, 3’UTR, 5’UTR, Exons and Introns). 

Finally, when comparing the two biological conditions, it was very clear that LEX detected fewer differentially expressed genes than the two other methods, which in turn were very similar to each other (Figure 7). In particular, at the 5% FDR threshold, BRB-seq and NEB each detected almost twice as many differentially expressed genes as LEX, which is a substantial difference. 

Figure 7. Number of differentially expressed genes. This figure shows the number of differentially expressed genes between the two conditions (ND vs D) uncovered by each protocol. The x-axis shows the results at different FDR cutoffs. The 5% FDR cutoff is marked with a vertical dotted line, since it is the default in most bioinformatics analyses. 
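
To illustrate how the number of DE genes varies with the FDR cutoff, here is a minimal sketch that applies the Benjamini-Hochberg adjustment to a list of p-values and counts how many fall below each cutoff. It is an assumption that the FDR values in this benchmark are Benjamini-Hochberg adjusted p-values (the usual default in DE tools); the text does not name the tool or the correction used. The p-values below are toy numbers, not benchmark results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def de_counts_by_cutoff(pvalues, cutoffs):
    """Count genes whose adjusted p-value is below each FDR cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return {c: sum(q < c for q in adjusted) for c in cutoffs}

# Toy p-values for six hypothetical genes.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6]
de_counts = de_counts_by_cutoff(toy_pvalues, [0.01, 0.05, 0.1])
```

Running the same counting over a range of cutoffs for each protocol gives one curve per protocol, which is the kind of comparison the figure presents.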

In conclusion, BRB-seq libraries clearly outperformed QuantSeq-Pool in almost all the performance metrics we evaluated: demultiplexing rate (99% vs 92%), mapping rate (77% vs 61%), and number of detected differentially expressed genes (almost two-fold more). The only exception is the number of detected lowly expressed genes, which was aberrantly higher for QuantSeq-Pool than even for NEBNext Ultra II, suggesting potential issues with spurious read mapping along the exome. 

On the other hand, BRB-seq achieved very similar results to NEBNext Ultra II in almost all performance metrics. The main difference is the alignment rate, which is still lower for BRB-seq (77%) than for NEBNext Ultra II (92%). This is a well-known characteristic of 3’-based protocols, but 1) the effect is smaller for BRB-seq than for other protocols, and 2) once aligned, the reads are of the same quality as those of NEBNext Ultra II, since they detect a similar number of genes. Moreover, the two methods are on par for differential expression analysis, which is usually the focus of transcriptomics studies. 

1. Alpern, D., Gardeux, V., Russeil, J. et al. BRB-seq: ultra-affordable high-throughput transcriptomics enabled by bulk RNA barcoding and sequencing. Genome Biol 20, 71 (2019). https://doi.org/10.1186/s13059-019-1671-x