Development and applications for long read sequencing in the Midlands

Lead Research Organisation: University of Nottingham
Department Name: School of Life Sciences

Abstract

Since the advent of sequencing, driven by Frederick Sanger, knowledge of the underlying genetic code of an organism has been a key feature in establishing the mechanisms by which inheritance influences phenotype. The rapid explosion in next generation sequencing technology in the past decade has taken us from huge consortia trying to determine the sequence of a single human genome to individuals being able to sequence their own genomes for close to $1,000. Alongside the sequencing of the genomic DNA it is also possible to sequence the transcriptome (genes expressed in a cell or tissue) or investigate modifications to the bases of DNA that alter the expression of genes.

Although the UoN has a range of short read sequencing technologies available and has been at the forefront of developing long read sequencing on the MinION platform, there is a clear gap at the level of high throughput long read generation that is addressed by this proposal. Current short read technologies available at UoN allow sequencing microbial communities, exon sequencing of human scale genomes and resequencing of individual human scale genomes. For long read sequencing, Nottingham has led the development of MinION sequencing, and has the capacity to sequence bacterial genomes on single flow cells. DeepSeq, the University of Nottingham sequencing facility, led by this proposal's PI (Dr Matt Loose), contributed to the sequencing of a reference human genome on multiple MinIONs at a number of locations around the world. We were also the first to demonstrate the ability to selectively sequence molecules using the nanopore platform, a technique which we hope others will be able to exploit on a larger scale in the future. DeepSeq has encouraged others to apply MinION sequencing to their projects, providing training both locally and via the 'porecamp' series of workshops nationally. DeepSeq has also contributed to the development of analysis tools such as minoTour for real time analysis of MinION data. We now wish to extend our knowledge and expertise to significantly higher throughput problems solvable with the increased scale of the PromethION platform.

We seek to purchase an Oxford Nanopore PromethION sequencer, alongside compute support and significant data storage. The sequencer will be capable of sequencing up to 48 different samples simultaneously, sequencing single projects at extremely high coverage, or any combination thereof. The flexibility of the PromethION platform is one of its key advantages, enabling sequencing at scale or across large numbers of samples. We, and others, have shown that read length is limited only by the quality of the input DNA. The platform also enables, in principle, the detection of methylation and other modifications to DNA with relatively simple library preparations.

NGS technologies are applicable to a wide range of research questions. The long read NGS technologies outlined here will greatly enhance research at the UoN and, by embedding within the regional Midlands Sequencing Consortium, in the Midlands as a whole. The projects cover a broad range of fields including: Bioscience for Health, Food, Nutrition and Health, Agriculture and Food Security, Stem Cell and Developmental Biology, Data Driven Biology, Exploiting New Ways of Working and Synthetic Biology.

The development of an enhanced long read NGS facility at the UoN embedded within the regional Midlands Sequencing Consortium will enable the UoN and collaborators to address key biological questions in the above areas which will have a significant impact in science, the economy and society.

Technical Summary

Our goal is to provide the University of Nottingham (UoN) with a high throughput long read nanopore sequencing platform supporting both vertebrate and plant scale projects and massively parallel microbial and clinical infection sequencing. We will invest in the PromethION nanopore sequencer, a platform with whose underlying principles we have extensive experience, alongside a significant storage investment to handle the data generation rates of the sequencer. The PromethION represents the latest generation of Nanopore technology, providing 144,000 sequencing channels at full capacity with theoretical maximum throughput in the Tb range per day. This equipment will enable UoN to support the broadest possible range of long read NGS applications, serving all domains of life from bacteria to plants and vertebrates. Further, the platform can be used to capture DNA modifications alongside basic sequence data and enables novel applications including selective sequencing, whereby unwanted DNA molecules, such as human DNA in an infection sample, can be removed from a population.
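The "Tb range per day" figure can be sanity-checked with simple arithmetic. The following is a minimal sketch only: the channel count comes from the text above, but the per-channel translocation speed of 450 bases per second and the `occupancy` parameter are assumptions introduced for illustration, not figures from this proposal. "Tb" is read here as terabases.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def theoretical_bases_per_day(channels: int,
                              bases_per_second: float,
                              occupancy: float = 1.0) -> float:
    """Upper bound on bases sequenced per day.

    Assumes every channel reads continuously at a fixed speed, scaled by
    the fraction of time a channel is actually sequencing (occupancy).
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be between 0 and 1")
    return channels * bases_per_second * occupancy * SECONDS_PER_DAY


if __name__ == "__main__":
    channels = 144_000   # stated in the Technical Summary
    speed = 450.0        # ASSUMPTION: bases per second per channel
    total = theoretical_bases_per_day(channels, speed)
    print(f"{total / 1e12:.2f} terabases per day at full occupancy")
```

Under these assumptions the upper bound is a few terabases per day, which is consistent with the range quoted above; real runs would be expected to fall well below this bound because no channel sequences continuously.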

We will purchase local data storage sufficient to store and process an entire theoretical maximum throughput run of the device, sufficient to handle the yield over the lifetime of the projects proposed here. The compute associated with the PromethION device will be sufficient to enable selective sequencing runs on portions of the device, taking advantage of the flexibility of the platform.

The association with the Midlands Sequencing Consortium will extend these approaches to other Universities. DeepSeq's well established track record in supporting sequencing projects within the UK and internationally will ensure the equipment achieves maximal usage. Compatibility of technologies and approaches we are developing with the wider community ensures that projects and tools can be shared with established sequencing centres such as the Earlham Institute.

Planned Impact

Over the last ten years the scale of sequencing achievable within individual laboratories has shifted from single plasmids to whole genomes. The establishment of long read sequencing at scale is now opening up new possibilities for research that previously available technologies could not address. For example, long read sequencing can be used to resolve structural features in DNA such as long repetitive elements, investigate structural rearrangements and can dramatically reduce the complexity of assembling genomes. Due to the multidisciplinary research areas to which DeepSeq contributes, the main objective for the facility with this new equipment will be to have an impact on research, researchers and industry from the following disciplines:
(i) Combatting antimicrobial resistance: by improving the ability of researchers to identify bacteria in a sample at species level, long read sequencing will support a variety of research projects in this key priority area, which is relevant both to human health and to food security.
(ii) Bioscience for Health: the research proposed will have a significant impact in a wide range of industrial and academic sectors where understanding the impact of individual mutations and the interplay between.
(iii) Food Security: long read sequencing is essential for characterizing variation occurring within diverse populations of plants and animals.
(iv) Stem Cell and Developmental Biology: the new technology at DeepSeq will enable genome-, transcriptome- and DNA-modification-level understanding of the development of specialized cells from stem cells and broader issues associated with vertebrate development.
(v) Data Driven Biology and Bioinformatics/Exploiting New Ways of Working: the impact of long read sequencing data on systems biology is dramatic. The research proposed here will address methods to extract meaning from the volume of data generated.
(vi) Synthetic Biology: long read sequencing opens up new possibilities to understand processes that cannot be studied with the more widely available short read sequencing methods, and to develop new resources for cell biology.

DeepSeq will also have an impact on:
a. Researchers: Through the acquisition of formalized training in NGS technologies. These will include researchers within the UoN, members of the wider Midlands Sequencing Consortium and those further afield.
b. The University of Nottingham: Having NextSeq/automated robotics will attract new collaborations to Nottingham especially in areas requiring massively parallel sequencing or human genome scale sequencing.
c. The Midlands: Shared resource with compatible sequencing platforms across multiple Midlands Universities including Sheffield, Leicester and Birmingham alongside the M5 grouping.
d. The international relationships between Nottingham University and researchers outside the UK through collaborations between Nottingham researchers and the international research community including international companies.
The wider public will also benefit in the longer term from the research conducted at DeepSeq through the increased ability of different industrial sectors to respond to their customers' needs, from the environment to agriculture and health.

The research achievements from DeepSeq and the Midlands Sequencing Consortium will be communicated to a range of audiences via presentations, discussions and workshops with industry contacts, publications in journals targeting a wide range of audiences, and conferences. The research from DeepSeq and its potential will also be communicated to the general public through the annual University of Nottingham public outreach event 'Wonder' and through the 'Nottingham Potential' widening participation activities.
 
Description In the project, we have made a number of advances. Nanopore platforms are developing quickly, with longer reads and more rapid sequencing; we remain responsive to these advances and can leverage them to our advantage. In particular, we predict that the benefits from "Read Until" adaptive sampling approaches will be greater than we originally expected. Specific work has included a complete rebuild of our minoTour software application - the control system for Nanopore machines - which is now in the late phases of testing. The new system will enable rapid tracking of both real-time and base-called data from Oxford Nanopore Technologies' (ONT) MinION and GridION instruments and, for now, monitoring of data from PromethION instruments in close to real time for a single flow cell. We are adapting our software to the new Application Programming Interface recently released by ONT and will soon test this with a single-chromosome selection on human material, for which the cells providing input DNA are currently growing. We have built and are about to release a "bulk file viewer", which enables visual inspection of the raw signal data for an entire channel, in order to see the effects of Read Until on specific channels and to check whether reads have been successfully rejected from a single pore. Finally, we have detected that reads in ONT's MinKNOW instrument control software are often split incorrectly, falsely subdividing a DNA molecule into more than one read; our bulk file viewer will allow users to detect and repair these errors.
Exploitation Route We are constantly sharing and releasing data associated with our experiences in long read sequencing. We also provide regular training camps for the wider community and host visiting scientists to assist in uptake, spread and usage of long read technologies.
Sectors Agriculture, Food and Drink,Education,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology
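The falsely split reads mentioned in the description above could, in principle, be shortlisted from read timing alone before the raw signal is inspected. The following is a minimal sketch of such a heuristic and is not the method used by the bulk file viewer: the `Read` record, the `candidate_split_reads` function and the 0.5 second `max_gap` threshold are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical per-read summary; field names are illustrative only."""
    read_id: str
    channel: int
    start: float     # seconds since the start of the run
    duration: float  # seconds


def candidate_split_reads(reads: List[Read],
                          max_gap: float = 0.5) -> List[Tuple[str, str]]:
    """Return pairs of consecutive reads on the same channel whose gap is
    at most max_gap seconds. Such pairs are only candidates for one
    molecule that was split in two; they still need manual confirmation.
    """
    ordered = sorted(reads, key=lambda r: (r.channel, r.start))
    pairs = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prev.channel != curr.channel:
            continue  # neighbours in sort order, but from different pores
        gap = curr.start - (prev.start + prev.duration)
        if gap <= max_gap:
            pairs.append((prev.read_id, curr.read_id))
    return pairs


if __name__ == "__main__":
    reads = [
        Read("a", 1, 0.0, 10.0),
        Read("b", 1, 10.1, 5.0),   # starts almost immediately after "a"
        Read("c", 1, 30.0, 2.0),   # long gap after "b"
        Read("d", 2, 15.2, 1.0),   # different channel
    ]
    print(candidate_split_reads(reads))
```

A short gap alone does not prove a split, since a new molecule can enter a pore quickly; flagged pairs would still need checking against the raw signal for the channel or against a reference alignment.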

 
Description Advisor to UK BioBank Genomics Policy
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Description A New Durable Read EXtension Method for Very, Very Long Reads
Amount £798,242 (GBP)
Funding ID 212965/Z/18/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2019 
End 01/2022
 
Description BBSRC iCASE
Amount £94,431 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2017 
End 09/2020
 
Description From Comparative Genomics to Comparative Genetics - What is Required for Life Without DNA Replication Origins?
Amount £495,280 (GBP)
Funding ID BB/R007543/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2018 
End 07/2021
 
Description Wellcome Prime Scholarship
Amount £45,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2017 
End 08/2019
 
Title BulkVIS 
Description BulkVIS is a tool for detailed analysis of raw signal data during Nanopore sequencing. This tool enables identification of longer reads than have previously been reported and more detailed understanding of how nanopore sequencing occurs. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? Yes  
Impact The identification of the longest molecule sequenced to date. https://www.bbc.co.uk/news/science-environment-46046024 
URL https://github.com/LooseLab/bulkvis
 
Title Minotour Client 
Description This is a python tool to upload data into our minoTour application. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact This is feeding in to many of our existing projects. 
URL https://github.com/LooseLab/minotourcli
 
Description The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. 
Organisation National Institutes of Health (NIH)
Department National Human Genome Research Institute (NHGRI)
Country United States 
Sector Public 
PI Contribution I have been contributing expertise, time and sequencing data to the activities of the Telomere-to-Telomere consortium. The goal of this consortium is to sequence the first human genome from telomere to telomere. Our expertise through the Long Read Club has been exploited to enable this goal.
Collaborator Contribution Other partners have generated sequencing data, analysed and assembled reads and presented this work.
Impact No outputs to date.
Start Year 2019
 
Description The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. 
Organisation University of California, Santa Cruz (UCSC)
Country United States 
Sector Academic/University 
PI Contribution I have been contributing expertise, time and sequencing data to the activities of the Telomere-to-Telomere consortium. The goal of this consortium is to sequence the first human genome from telomere to telomere. Our expertise through the Long Read Club has been exploited to enable this goal.
Collaborator Contribution Other partners have generated sequencing data, analysed and assembled reads and presented this work.
Impact No outputs to date.
Start Year 2019
 
Title minotour v 1 
Description Minotour is a real time set of tools for analysis of nanopore data. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This is being used across a number of our projects. 
URL http://minotour.nottingham.ac.uk
 
Description Grand Challenges in Genomics - Invited Panel Speaker - Joint meeting of the NHGRI/Wellcome Trust, London, Feb 2019 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Grand Challenges in Genomics was a meeting to discuss the next ten years of Genomics and the ways in which both NHGRI and the Wellcome Trust should target investment and funding in the future.
Year(s) Of Engagement Activity 2019
 
Description Long Read Club 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Long Read Club is an informal grouping of users interested in exploring long read sequencing technologies in all their guises. We are raising awareness of methods, best practice and experience through a website, a Twitter account and a YouTube channel. Over 900 people have signed up to the email list, nearly 700 follow the Twitter account and over 130 have subscribed to the YouTube channel.
Year(s) Of Engagement Activity 2019
URL http://youtube.com/c/longreadclub
 
Description Singapore Genome Centre - Porecamp Singapore Training Course - Lead Instructor and Keynote - Sept (2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Porecamp is an instructional course for using nanopore sequencing in the lab and the field. It is open to all and serves to increase the uptake of nanopore sequencing globally.
Year(s) Of Engagement Activity 2018
 
Description University of British Columbia - Porecamp Training Course - Lead Instructor and Keynote - May (2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Porecamp is a training course to encourage uptake of Nanopore sequencing in the field and laboratory.
Year(s) Of Engagement Activity 2018