What does ‘high-throughput’ mean in sequencing?

‘High-throughput’ in sequencing refers to the number of DNA molecules read at the same time. Current technologies can sequence many fragments of DNA in parallel. This enables scientists to read hundreds of millions of DNA fragments and generate more data, in less time and at lower cost than ever before.

In this article, we discuss the evolution of genome sequencing to understand what ‘high-throughput’ means in sequencing methods today.

The low-throughput era

In 2001, researchers completed the first draft of the human genome (Lander et al., 2001; Venter et al., 2001). It was a watershed moment in the history of biology. However, the coordinated research effort cost an estimated half a billion to one billion US dollars and took over ten years (Schloss et al., 2020).

The problem was throughput.

The Human Genome Project relied on Sanger sequencing, a technology first introduced by Frederick Sanger in 1977 (Sanger, Nicklen and Coulson, 1977). This method revolutionized the reading of DNA. During in vitro DNA replication, DNA polymerase incorporates radioactively or fluorescently labeled, chain-terminating nucleotides, and the resulting labeled fragments are read by a sequencing machine.

Sanger sequencing determines the sequences of relatively small DNA fragments, less than 900 bases long. Researchers used these fragments to assemble larger DNA fragments and, eventually, entire chromosomes to create their draft of the human genome. This technique is low-throughput because only one stretch of DNA is read at a time.
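
To put the throughput problem in numbers, the rough calculation below estimates how many 900-base reads are needed to cover a genome once. It is a back-of-the-envelope sketch, not a description of how the Human Genome Project was planned: the genome size of roughly 3 billion bases is an assumed round figure, `reads_needed` is a hypothetical helper, and the result ignores the overlapping coverage that assembly requires, so it is a lower bound.

```python
import math


def reads_needed(genome_size: int, read_length: int, coverage: float = 1.0) -> int:
    """Smallest number of reads whose total length is coverage * genome_size."""
    if genome_size <= 0 or read_length <= 0 or coverage <= 0:
        raise ValueError("genome_size, read_length and coverage must be positive")
    return math.ceil(genome_size * coverage / read_length)


# Assumption: a human genome of about 3 billion bases (round figure).
GENOME_SIZE = 3_000_000_000
# From the text: Sanger reads are less than 900 bases long.
SANGER_READ_LENGTH = 900

single_pass = reads_needed(GENOME_SIZE, SANGER_READ_LENGTH)
print(f"Reads needed for one pass over the genome: {single_pass:,}")
```

With these assumed figures, a single pass already requires more than 3.3 million separate reads, each produced one stretch at a time.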

The high-throughput genomic era

Today, we are firmly in the high-throughput era of sequencing.

Researchers can sequence a human genome for as little as $1,000 in only one day. One team recently sequenced a human genome in only five hours and two minutes (Gorzynski et al., 2022). This achievement set the world record for the fastest DNA sequencing technique.

The solution was throughput.

These advances are largely due to the throughput of today’s sequencing technologies. For example, short-read sequencing commonly uses the ‘sequencing-by-synthesis’ technique. The DNA is first cut into smaller pieces, from which reads of around 150 bases are generated. This allows massively parallel sequencing of millions of DNA molecules simultaneously. Computational analyses then piece all the sequenced short stretches of DNA back together using a standard human genome as a reference.
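
The last step, placing short reads on a reference, can be illustrated with a toy example. This is a minimal sketch assuming exact matches only and a short made-up reference string; the function names (`build_index`, `map_read`) and the seed length `k` are hypothetical, and real read aligners additionally tolerate mismatches, insertions, and deletions.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Record every start position of each k-base substring of the reference."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Return reference positions where the whole read matches exactly."""
    if len(read) < k:
        return []
    hits = []
    # Use the first k bases of the read as a seed, then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Made-up reference and reads, for illustration only.
reference = "ACGTTGCAAGGCTTACGGATCCA"
k = 4
index = build_index(reference, k)

reads = ["TTGCAAGG", "GGATCC", "TTTTTTTT"]
placements = {read: map_read(read, reference, index, k) for read in reads}
for read, positions in placements.items():
    print(read, positions)
```

Because each read is placed independently of the others, this step scales naturally to the millions of reads produced in parallel.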

Another technology takes a different approach to throughput. Long-read sequencing technologies such as Nanopore can now generate individual reads of up to around 500,000 bases. This generates large data sets in a very short time. One downside is that fewer DNA molecules can be sequenced simultaneously, thus limiting options for multiplexing.

Sequencing of RNA (RNA-seq) takes a similar approach when researchers are interested in the expression of all genes in the genome. RNA-seq allows for even higher throughput than genome sequencing. For example, researchers can now multiplex and sequence hundreds or thousands of RNA samples at the same time thanks to novel library preparation techniques.

The ultra-high-throughput genomic era

Novel RNA-seq approaches, such as Bulk RNA Barcoding and Sequencing (BRB-seq) from Alithea Genomics, now allow researchers to multiplex and sequence thousands of samples simultaneously, with little loss in data quality (Alpern et al., 2019). This technique tags the 3’ end of mRNA molecules with a sample barcode used for sample identification after sequencing.

Because only the 3’ end of mRNA is sequenced, less sequencing depth is required, whilst the results remain highly accurate. This dramatically increases the throughput of sequencing experiments and paves the way for the ‘ultra-high-throughput’ era of genomics.
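
The role of the sample barcode can be sketched as a simple sorting step after sequencing. The example below is an illustration under stated assumptions, not the BRB-seq pipeline itself: the read layout (a tag read that starts with the barcode, paired with a cDNA read), the 6-base barcode length, the barcode sequences, and the `demultiplex` helper are all hypothetical.

```python
from collections import defaultdict


def demultiplex(read_pairs, barcode_to_sample, barcode_length):
    """Group cDNA reads by the sample whose barcode starts the paired tag read."""
    by_sample = defaultdict(list)
    unassigned = []
    for tag_read, cdna_read in read_pairs:
        barcode = tag_read[:barcode_length]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            # Barcode not in the sample sheet (for example, a sequencing error).
            unassigned.append(cdna_read)
        else:
            by_sample[sample].append(cdna_read)
    return dict(by_sample), unassigned


# Hypothetical sample sheet and pooled reads, for illustration only.
barcode_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
read_pairs = [
    ("ACGTACTTTT", "GGCTAAGC"),
    ("TGCATGTTTT", "CCATGGAA"),
    ("ACGTACTTTT", "TTAGGCCA"),
    ("NNNNNNTTTT", "GATTACAG"),
]

by_sample, unassigned = demultiplex(read_pairs, barcode_to_sample, barcode_length=6)
for sample, cdna_reads in by_sample.items():
    print(sample, cdna_reads)
print("unassigned", unassigned)
```

Since every read carries its sample of origin, many samples can be pooled into one sequencing run and separated again computationally.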

References

Alpern, D., Gardeux, V., Russeil, J., Mangeat, B., Meireles-Filho, A.C., Breysse, R., Hacker, D. and Deplancke, B., 2019. BRB-seq: ultra-affordable high-throughput transcriptomics enabled by bulk RNA barcoding and sequencing. Genome Biology, 20(1), pp.1-15.

Gorzynski, J.E., Goenka, S.D., Shafin, K., Jensen, T.D., Fisk, D.G., Grove, M.E., Spiteri, E., Pesout, T., Monlong, J., Baid, G. and Bernstein, J.A., 2022. Ultrarapid nanopore genome sequencing in a critical care setting. New England Journal of Medicine, 386(7), pp.700-702.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al.; International Human Genome Sequencing Consortium, 2001. Initial sequencing and analysis of the human genome. Nature, 409, pp.860-921.

Sanger, F., Nicklen, S. and Coulson, A.R., 1977. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12), pp.5463-5467.

Schloss, J.A., Gibbs, R.A., Makhijani, V.B. and Marziali, A., 2020. Cultivating DNA sequencing technology after the human genome project. Annual Review of Genomics and Human Genetics, 21, pp.117-138.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al., 2001. The sequence of the human genome. Science, 291, pp.1304-1351.