Building long-read high-fidelity sequencing resources to support Bioscience Research in the UK

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

The genomics revolution is entering a new phase, with new sequencing technologies that can read long DNA molecules with unprecedented accuracy and at low cost. These high-fidelity long DNA sequence reads can be used to generate high-quality gapless reference genomes, to fully characterise transcriptomes, to describe methylomes, and to characterise microorganisms in complex microbial mixtures such as the gut microbiome or environmental samples of soil or water. The PacBio Revio System is the latest long-read DNA sequencing platform and can generate extremely large amounts of long-read data at an affordable cost. A key feature of the Revio is that it generates highly accurate long reads by reading a DNA molecule multiple times to correct errors. This grant will acquire the Revio platform, the first in Scotland, embed it within Edinburgh Genomics, and allow us to offer low-cost, high-quality long-read sequencing in collaboration with researchers in the biological and biotechnology research community. We will use the investment to support researchers who want to generate genomic data for diverse topics such as agriculture, biotechnology, genetics, immunology and synthetic biology. We will also offer advanced genomics data training to support the uptake of the Revio within the wider user community.

Technical Summary

The Revio system represents the evolution of PacBio single-molecule sequencing technology, with much improved single-molecule real-time (SMRT) cell capacity, shorter sequencing times and higher read quality than its predecessor, the Sequel IIe. This platform will allow Edinburgh Genomics to generate larger numbers of very high accuracy reads of long DNA and RNA molecules. Long-read technology has benefits over short-read technologies (e.g. Illumina), where assembly or mapping often results in ambiguities or errors. High complexity regions of genomes (DNA) and transcriptomes (RNA) are best studied with longer reads. When assembling genomes at the chromosome level, long reads help to resolve issues such as repetitive regions (e.g. centromeres) and large structural variants. The advantages of the Revio are its greatly increased throughput, due to having 25 million zero-mode waveguides (ZMWs) per SMRT cell (vs 8 million in the Sequel II), and its much improved read quality, due to the implementation of deep learning algorithms. The increased throughput and cheaper reagents result in a much lower cost per base (3-4 fold less than the Sequel II). This makes research more cost effective, allowing greater sequencing depth or a larger number of samples. For example, a single SMRT cell will yield the equivalent of 30X coverage of a 3 Gbase genome (e.g. the cattle genome), which is sufficient to produce a full de novo genome assembly. In addition, the high fidelity (or 'HiFi') reads generated by PacBio reach an unprecedented level of quality on the Revio (>90% of reads above Q30) thanks to deep learning algorithms (i.e. DeepConsensus), a quality previously achieved only by short-read sequencers. This improved accuracy will allow the detection of single nucleotide variants, the resolution of fully phased genome and transcriptome assemblies, and the accurate sequencing of hard-to-sequence regions at scale.
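
The coverage and quality figures quoted above can be checked with simple arithmetic. The following is a minimal Python sketch, not part of the funded work: the function names are hypothetical, and the 15,000-base read length in the usage section is an assumed illustrative value that does not appear in the summary. It uses the standard definitions of sequencing coverage (total bases sequenced divided by genome size) and of the Phred quality score (Q = -10 log10 of the per-base error probability).

```python
import math


def bases_needed(genome_size_bases: float, coverage: float) -> float:
    """Total sequenced bases required to reach a given mean coverage."""
    return genome_size_bases * coverage


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)


def expected_errors(read_length_bases: int, q: float) -> float:
    """Expected number of erroneous bases in a read of uniform quality Q."""
    return read_length_bases * phred_to_error_probability(q)


if __name__ == "__main__":
    # 30X coverage of a 3 Gbase genome, as quoted in the summary.
    total = bases_needed(3e9, 30)
    print(f"Bases needed for 30X of 3 Gbases: {total / 1e9:.0f} Gbases")

    # Q30 corresponds to one error per 1,000 bases (99.9% accuracy).
    p = phred_to_error_probability(30)
    print(f"Q30 error probability: {p:.4f}")

    # ASSUMPTION: 15,000 bases is an illustrative read length only.
    print(f"Expected errors in a 15,000-base Q30 read: {expected_errors(15_000, 30):.0f}")
```

On these definitions, 30X coverage of a 3 Gbase genome corresponds to roughly 90 Gbases of sequence from a single SMRT cell, and a read at Q30 carries on average one error per 1,000 bases.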
