Adaptive sampling ('Read Until') methods in optimised nanopore sequencing technologies

Lead Research Organisation: University of Nottingham
Department Name: School of Life Sciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

We propose to develop algorithms to enable adaptive sampling of DNA in real time by exploiting a unique property of nanopore sequencers: data are streamed from the nanopores, and the Oxford Nanopore Technologies MinION device allows a specific molecule to be ejected from a nanopore at any time, regardless of how completely it has been read. For this, two linked but distinct problems must be solved: the DNA molecule (represented by changes in current flow) must be mapped rapidly to a reference, and an accept/reject decision must be made based on accumulated previous mapping events. We will address both of these problems using five model cases of direct relevance to BBSRC science:
1. Rapid even coverage in bacterial genome sequencing (e.g. pathogen identification in food-borne disease)
2. Even coverage in diploid genome resequencing (e.g. marker and variant discovery in livestock welfare and breeding)
3. Sequencing of genomic regions of interest that are recalcitrant to conventional sequencing (e.g. in crop plant genomics)
4. Maximising discovery and quantification of low-abundance transcripts (e.g. in fish pathogen response transcriptomics)
5. Coordination of multi-sample sequencing in complex mixtures (e.g. in comparative metagenomics studies)
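The accept/reject decision "based on accumulated previous mapping events" can be illustrated for the even-coverage cases with a minimal Python sketch. It is not the project's implementation: the class name `CoverageBalancer`, its methods, the depth threshold and the choice to reject unmapped reads are all assumptions made for illustration.

```python
from collections import defaultdict


class CoverageBalancer:
    """Hypothetical helper: accept reads from under-covered targets, reject the rest."""

    def __init__(self, target_depth):
        self.target_depth = target_depth        # desired mean depth per target
        self.target_lengths = {}                # target name -> length in bases
        self.accepted_bases = defaultdict(int)  # target name -> bases kept so far

    def add_target(self, name, length):
        self.target_lengths[name] = length

    def depth(self, name):
        """Mean depth accumulated so far for one target."""
        return self.accepted_bases[name] / self.target_lengths[name]

    def decide(self, name):
        """Return 'accept' or 'reject' for a read whose start mapped to `name`."""
        if name not in self.target_lengths:
            # Assumption: unmapped or off-target reads are ejected.
            return "reject"
        return "accept" if self.depth(name) < self.target_depth else "reject"

    def record(self, name, read_length):
        """Update the accumulated state once an accepted read has finished."""
        self.accepted_bases[name] += read_length


balancer = CoverageBalancer(target_depth=2.0)
balancer.add_target("genome_A", 1000)
balancer.add_target("genome_B", 1000)
balancer.record("genome_A", 2500)      # genome_A now sits at 2.5x mean depth
print(balancer.decide("genome_A"))     # over target depth, so the read is ejected
print(balancer.decide("genome_B"))     # still under target depth, so it is kept
```

The state is kept per target so that the same logic could, in principle, serve both the single-genome and the multi-sample cases; a real system would also need to handle mapping ambiguity and decision latency.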
To achieve rapid matching of early read data to reference sequence we will explore several indexing/pre-computing strategies, including Fast Fourier Transform of streamed data; wavelet transform of the stream followed by indexing; and discretisation of the signal followed by suffix tree or FM-index processing. The matching tool would run on the laptop local to the sequencer. In contrast, the logic for accepting or rejecting specific reads will be managed by an external server system running appropriate pipelines on the minoTour MinION analysis platform. Templates will be generated for minoTour, allowing experienced users to generate pipelines for further specific use cases.
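As an illustration of the third strategy, the following Python sketch discretises a current trace into a small alphabet and looks up the resulting string in a pre-computed index. It uses a naive suffix array in place of a suffix tree or FM-index, and the function names, the fixed current range (60 to 120) and the four-level alphabet are assumptions chosen for the example. Real signal also varies in dwell time and noise, which this sketch ignores.

```python
def discretise(signal, low, high, n_levels=4):
    """Map each current sample to a letter by equal-width binning on [low, high]."""
    width = (high - low) / n_levels
    symbols = []
    for sample in signal:
        level = int((sample - low) / width)
        level = max(0, min(n_levels - 1, level))  # clamp out-of-range samples
        symbols.append(chr(ord("a") + level))
    return "".join(symbols)


def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return the sorted start positions of `pattern` in `text` by binary search."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(suffix_array) and text.startswith(pattern, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


# Hypothetical expected signal for a reference, indexed once in advance.
reference_signal = [62.0, 95.5, 118.2, 71.3, 88.0, 109.9, 64.1, 101.7]
reference_text = discretise(reference_signal, 60.0, 120.0)
index = build_suffix_array(reference_text)

# Hypothetical early read data, slightly noisy relative to the reference.
query_text = discretise([72.0, 87.1, 110.5], 60.0, 120.0)
print(reference_text, query_text, find_all(reference_text, index, query_text))
```

Both traces must be binned against the same fixed range; normalising each trace by its own minimum and maximum would assign different letters to the same current level.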

Planned Impact

The application of sequencing technologies underpins much of biological research today. Our approach, adaptive sampling in nanopore-based sequencing, serves to eliminate coverage bias and focus resolving power and thus has numerous beneficiaries. Within the broad UK and global academic and applied science communities these methods will benefit both those already using, and those yet to use, sequencing methods.

The direct impacts of our work will be delivered as an enabling software technology that allows broad use of adaptive sampling. During the project we will specifically demonstrate the technology in five areas of biological research and application, each of which represents a challenge area for current sequencing approaches. These are the rapid sequencing of bacterial pathogens for identification, typing and resistance profiling purposes (even coverage in bacterial genome sequencing), marker and variant discovery in livestock resequencing (even coverage in diploid genome sequencing), access to regions that are difficult to sequence in higher plants, particularly the crop species (targeted genomic region sequencing), pathogen response transcriptome characterisation and profiling in farmed fish species (low-abundance transcript sequencing) and comparative metagenomics (coverage/focus control in multi-sample sequencing). We expect direct impact on groups of researchers who use sequencing approaches in these areas, including, but not limited to, those who have expressed support for the project (see letters of support).

Through the capacity to eliminate coverage bias, sequencing costs will be reduced, making sequencing available to areas of research and application for which cost remains prohibitive (such as deep population biology of crops, the discovery of low frequency variant alleles for livestock breeding programmes and the profiling of expression in non-model species). Through the ability to focus on defined regions, adaptive sampling will bring powerful methods to areas such as ecology and biodiversity (barcoding, whole-ecosystem analysis, occurrences and abundance), environmental sensing (water safety, environmental health, sentinel markers for pollution and climate change), food chain control (food species/breed/line validation, forensic tracking), border and trade control (invasive species, illegal trade in controlled species), bioenergy (investigation of new species, yield improvement), public health (environmental and zoonotic pathogen sinks, epidemiology of anti-microbial drug resistance) and animal health (surveillance, outbreak detection, transmission control).

The UK has long been established at the forefront of sequencing technology and the application of adaptive sampling methods to nanopore technologies will serve to continue this trend.
 
Title LED display for sequencing. 
Description Nanopore sequencing is often visualised as an array of channels, each of a different colour. In this display we develop an interface to show the dynamics of sequencing within an LED matrix.
Type Of Art Artefact (including digital) 
Year Produced 2016 
Impact This was developed as an interactive illustration to demonstrate sequencing to undergraduate and school students.
URL https://github.com/mattloose/512array_Nanolights
 
Description In the project, we have made a number of advances. Nanopore platforms are developing quickly, with longer reads and more rapid sequencing; we remain responsive to these advances and can leverage them to our advantage. In particular, we predict that the benefits from "Read Until" adaptive sampling approaches will be greater than we originally expected. Specific work has included a complete rebuild of our minoTour software application, the control system for Nanopore machines, which is now in the late phases of testing. The new system will enable rapid tracking of both real-time and base-called data from Oxford Nanopore Technologies' (ONT) MinION and GridION instruments and, for now, monitoring of data from PromethION instruments in close to real time for a single flow cell. We are adapting our software to the new Application Programming Interface recently released by ONT and will soon test this on a single-chromosome selection on human material, for which the cells providing input DNA are currently growing. We have built and are about to release a "bulk file viewer", which enables visual inspection of the raw signal data for an entire channel, in order to see the effects of Read Until on specific channels and to check whether reads have been successfully rejected from a single pore. Finally, we have found that reads in ONT's MinKNOW instrument control software are often split incorrectly, falsely subdividing a single DNA molecule into more than one read; our bulk file viewer will allow users to detect and repair these errors.
Exploitation Route We expect impacts of value to the UK and international bioscience community, through the delivery of software components that enable and empower those using nanopore sequencing. To date, we have built a number of software components, such as the re-written minoTour and the rebuilt file viewer, that will soon reach the public domain. These advances, along with performance improvements in the platform itself, will allow us to advance our impacts to specific communities through our five challenge "exemplars" addressing specific applications and communities, the first of which is currently being initiated. Through our communications with the user community, we have also identified new areas of challenge, such as field sequencing of viral samples for rapid identification in compute-limited contexts.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Education; Healthcare; Manufacturing, including Industrial Biotechnology

URL https://github.com/looselab/readfish
 
Description The software that we developed has been integrated into Oxford Nanopore Technologies platforms, including releases and improvements to APIs, methods and processes. In addition there are now numerous papers, grants and new applications being delivered around our existing tool chain.
First Year Of Impact 2020
Sector Healthcare; Manufacturing, including Industrial Biotechnology
Impact Types Economic

 
Description A New Durable Read EXtension Method for Very, Very Long Reads
Amount £798,242 (GBP)
Funding ID 212965/Z/18/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2019 
End 01/2022
 
Description BBSRC iCASE
Amount £94,431 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2017 
End 09/2020
 
Description From Comparative Genomics to Comparative Genetics - What is Required for Life Without DNA Replication Origins?
Amount £495,280 (GBP)
Funding ID BB/R007543/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2018 
End 07/2021
 
Description Tool to identify pathogens in metagenomic long-read sequence data in real time
Amount £47,290 (GBP)
Organisation Defence Science & Technology Laboratory (DSTL) 
Sector Public
Country United Kingdom
Start 03/2018 
End 03/2019
 
Description Wellcome Prime Scholarship
Amount £45,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2017 
End 08/2019
 
Title Adaptive Sampling Integration into MinKNOW 
Description Our method for applying adaptive sampling was co-developed by a PhD student working on our adaptive sampling grant and written into Oxford Nanopore's own implementation of adaptive sampling, which is now shipping in MinKNOW. In essence, this allows a limited subset of functionality from our ReadFish research tools to be used relatively simply by anyone in MinKNOW, Oxford Nanopore's own GUI for controlling nanopore sequencing.
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact These tools have been used in a number of papers of note to date and have enabled broad uptake of a new sequencing method in the community. 
URL https://github.com/nanoporetech/read_until_api/releases
 
Title BulkVIS 
Description BulkVIS is a tool for detailed analysis of raw signal data during Nanopore sequencing. This tool enables identification of longer reads than have previously been reported and more detailed understanding of how nanopore sequencing occurs. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? Yes  
Impact The identification of the longest molecule sequenced to date. https://www.bbc.co.uk/news/science-environment-46046024 
URL https://github.com/LooseLab/bulkvis
 
Title DSTL Screening 
Description We have been invited to implement a standalone version of the minoTour tool for use by specific individuals in the real-time identification of pathogens. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? No  
Impact This is an ongoing project with expected completion in 2019. 
 
Title MinoTour version 1 
Description MinoTour is a complete laboratory information management system for Nanopore sequencing. It also includes customisable real time analysis. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact This is a revision of a previously available tool and feeds into several of our other projects.
URL https://github.com/looselab/minotourapp
 
Title Minotour Client 
Description This is a python tool to upload data into our minoTour application. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact This feeds into many of our existing projects.
URL https://github.com/LooseLab/minotourcli
 
Title Read Until API updates 
Description We have overhauled the Oxford Nanopore Read Until API 
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact This tool will be partially integrated into Oxford Nanopore Technologies' own tools.
URL https://www.github.com/looselab/read_until_api_v2
 
Title Read Until Scripts 
Description This tool implements various methods for adaptive sequencing using a mix of our own tools and those provided by Oxford Nanopore. 
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact These tools will be partially integrated into Oxford Nanopore's own toolchain.
URL https://www.github.com/looselab/ru
 
Title SwordFish Adaptive Sampling 
Description This tool enables adaptive sampling from our Nanopore monitoring tool, MinoTour. It enables genuine adaptive sampling in a range of contexts, including adaptive sampling for SARS-CoV-2 and human genomes.
Type Of Material Technology assay or reagent 
Year Produced 2021 
Provided To Others? Yes  
Impact This tool is new, but is likely to have significant impact on the wider application of adaptive sampling.
URL https://github.com/LooseLab/swordfish/
 
Description Read Until EBI 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution We have developed a website and interface for the analysis of MinION data (minoTour). We have also developed the first implementation of Read Until, that is, selective sequencing on the MinION sequencer.
Collaborator Contribution The EBI are world leaders in algorithm and storage development.
Impact Grant Submission to the BBSRC
Start Year 2016
 
Description The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. 
Organisation National Institutes of Health (NIH)
Department National Human Genome Research Institute (NHGRI)
Country United States 
Sector Public 
PI Contribution I have been contributing expertise, time and sequencing data to the activities of the Telomere-to-Telomere consortium. The goal of this consortium is to sequence the first human genome from telomere to telomere. Our expertise through the Long Read Club has been exploited to enable this goal.
Collaborator Contribution Other partners have generated sequencing data, analysed and assembled reads and presented this work.
Impact No outputs to date.
Start Year 2019
 
Description The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. 
Organisation University of California, Santa Cruz
Country United States 
Sector Academic/University 
PI Contribution I have been contributing expertise, time and sequencing data to the activities of the Telomere-to-Telomere consortium. The goal of this consortium is to sequence the first human genome from telomere to telomere. Our expertise through the Long Read Club has been exploited to enable this goal.
Collaborator Contribution Other partners have generated sequencing data, analysed and assembled reads and presented this work.
Impact No outputs to date.
Start Year 2019
 
Title ReadFish Adaptive Sampling Toolkit 
Description This suite of tools interacts with Nanopore sequencers to enable adaptive sampling of molecules in real time via direct base calling.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact These tools have been used by many groups to investigate rare disease, amongst other applications.
URL https://github.com/nanoporetech/read_until_api/releases
 
Title minotour v 1 
Description Minotour is a real time set of tools for analysis of nanopore data. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This is being used across a number of our projects. 
URL http://minotour.nottingham.ac.uk
 
Description Grand Challenges in Genomics - Invited Panel Speaker - Joint meeting of the NHGRI/Wellcome Trust, London, Feb 2019 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Grand Challenges in Genomics was a meeting to discuss the next ten years of Genomics and the ways in which both NHGRI and the Wellcome Trust should target investment and funding in the future.
Year(s) Of Engagement Activity 2019
 
Description Long Read Club 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Long Read Club is an informal grouping of users interested in exploring long-read sequencing technologies in all their guises. We are raising awareness of methods, best practice and experience through a website, a Twitter account and a YouTube channel. Over 900 people have signed up to the email list, nearly 700 follow the Twitter account and over 130 have subscribed to the YouTube channel.
Year(s) Of Engagement Activity 2019
URL http://youtube.com/c/longreadclub
 
Description Oxford Nanopore - Basecalling Consensus Hackathon - Invited Contributor - July (2018)
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An invitation only hackathon to investigate questions around base calling and sequence consensus.
Year(s) Of Engagement Activity 2018
 
Description PoreCamp 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact PoreCamp is a training initiative established to teach the basics of nanopore sequencing to both academic and industrial users of sequencing. It is held approximately every six months and to date has run in Birmingham, Exeter and Australia. Future PoreCamps are planned in Texas, USA and the East Midlands, UK.
Year(s) Of Engagement Activity 2016,2017
URL https://porecamp.github.io
 
Description PoreCamp Birmingham 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact PoreCamp is a world-recognised nanopore training camp. This week-long activity provides comprehensive training and instruction in all aspects of nanopore sequencing, from library preparation through to sequencing and analysis. I am a founder and lead instructor on this course. In Birmingham we produced a public information film describing our activities and interests in this area.
Year(s) Of Engagement Activity 2017
 
Description PoreCamp Texas 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact PoreCamp is a world-recognised nanopore training camp. This week-long activity provides comprehensive training and instruction in all aspects of nanopore sequencing, from library preparation through to sequencing and analysis. I am a founder and lead instructor on this course.
Year(s) Of Engagement Activity 2017
 
Description Singapore Genome Centre - Porecamp Singapore Training Course - Lead Instructor and Keynote - Sept (2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Porecamp is an instructional course for using nanopore sequencing in the lab and the field. It is open to all and serves to increase the uptake of nanopore sequencing globally.
Year(s) Of Engagement Activity 2018
 
Description University of British Columbia - Porecamp Training Course - Lead Instructor and Keynote - May (2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Porecamp is a training course to encourage uptake of Nanopore sequencing in the field and laboratory.
Year(s) Of Engagement Activity 2018