minoTour: A real time analysis and data management platform for Oxford Nanopore minION reads.

Lead Research Organisation: University of Nottingham
Department Name: School of Life Sciences

Abstract

The last 15 years have seen a dramatic explosion in the impact that DNA sequencing has had on our lives. From the announcement of the first draft human genome sequence in 2000 at an estimated cost of $2.7 billion to the announcement of the $1000 genome in 2014, the rate of growth in sequencing technology has been exponential. The main driver behind the dramatic lowering in cost is the development of massively parallel short read sequencing technologies from companies such as Illumina. These approaches generate short fragments of sequence, typically 75-300 base pairs in length, from the material being sequenced, which are subsequently combined computationally to reassemble the original sequence. These approaches have generated vast quantities of data and revealed much that we did not know about biology. The establishment of long read sequencing platforms, first developed by PacBio, once again holds the potential for a step change in sequencing technologies. In particular, the portable nanopore based sequencer, minION, developed by Oxford Nanopore Technologies represents a significant new technology. In contrast with previous technologies, sequence reads are generated in almost real-time. As these reads can be very long, they can provide information about the material being sequenced almost immediately. Currently no tools exist which truly exploit these real-time features. We are participating in the minION access program and have developed the first real-time analysis platform, as a proof of principle, called minoTour. This suite of tools aims to rapidly identify DNA sequences as they are generated and report the results back to the experimentalist via a simple web-based interface. Key components include sending notifications to an experimentalist in real-time as events take place, for example the detection of a sequence from a specific pathogen or the achievement of enough sequence coverage to recognise key events from sequence data.
The key advantage of a web-based interface to such a platform is ease of use for an experimentalist in a wide variety of environments. Currently the minION platform requires an always-on connection to the web for generation of the sequence data. In the future it is anticipated that this technology will be available on the local machine. The tools we are developing will also be able to run without a network connection, allowing experimentalists to sequence and analyse data in the field. A broader extension of our approach focusses on de novo assembly of sequence data as it is generated. Early results suggest that de novo assembly from minION data is possible, and this would allow for another broad range of applications both in the laboratory and the field. In the future, tools such as the minION will be developed as basic sensing devices used in a number of different environments. Having real-time access to the data they generate, in a manner similar to that proposed here, would allow for the rapid detection of contaminants in a wide range of environments. Similarly, it is easy to imagine the lab scientist wishing to sequence a DNA construct having their results within 30 minutes, representing significant savings in time and effort for a large number of projects.

Technical Summary

Amongst the many benefits of nanopore based sequencing approaches is the real-time generation of sequence data. This can be exploited in many significant ways, for example detection of pathogens, monitoring sequence breadth and depth, or even simply sequencing until a specific event is detected. I have developed a suite of tools as proof-of-principle for such approaches, called minoTour, which already demonstrates many of the benefits of real-time analysis. These tools require a database to store key read metrics (currently MySQL based for speed of development) and a web-based front end for data summaries and analysis. We propose developing this tool into a complete set of applications which can be used by individual users of minIONs or by centralised facilities wishing to support multiple minION users. We will also explore the provision of a centralised platform for real-time minION analysis and archival of data in conjunction with a service such as DNANexus. The system would be modular in design, allowing key steps within it to be changed depending on the application, for example choosing a specific alignment tool for a given application. A number of groups are in the process of developing tools for the minION and long read technology in general. We seek to develop a standard for the archiving of minION data such that it can be readily manipulated by a range of tools as they are developed by the wider community. This standardisation will also allow for the development of compression standards for minION data. minoTour is already being exploited by members of the minION access program and its development is being observed by Oxford Nanopore Technologies. We will use the feedback generated from this in directing our work over the course of the grant.
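
The idea of storing read metrics in a database and sequencing until a specific event is detected can be illustrated with a minimal sketch. It is not taken from the minoTour code base: it assumes an in-memory SQLite database standing in for the MySQL store, a hypothetical `reads` table holding only a read identifier and length, and a simple event defined as mean depth (total bases divided by genome size) reaching a target. All function names here are illustrative.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for the MySQL store (an assumption for this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reads (read_id TEXT PRIMARY KEY, length INTEGER)")
    return conn


def record_read(conn, read_id, length):
    # Store one key metric per read: its length in bases.
    conn.execute(
        "INSERT INTO reads (read_id, length) VALUES (?, ?)", (read_id, length)
    )


def mean_depth(conn, genome_size):
    # Mean depth = total sequenced bases / size of the target genome.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(length), 0) FROM reads"
    ).fetchone()
    return total / genome_size


def sequence_until(conn, read_stream, genome_size, target_depth, notify):
    # Consume reads as they arrive; notify and stop once the target depth is met.
    for n, (read_id, length) in enumerate(read_stream, start=1):
        record_read(conn, read_id, length)
        if mean_depth(conn, genome_size) >= target_depth:
            notify(f"target depth {target_depth}x reached after {n} reads")
            return n
    return None  # stream ended before the target was reached


if __name__ == "__main__":
    conn = make_db()
    reads = [("r1", 4000), ("r2", 6000), ("r3", 5000)]
    sequence_until(conn, reads, genome_size=5000, target_depth=2.0, notify=print)
```

In a real deployment the notification callback might send an email or update the web front end, and the event could equally be the detection of a pathogen sequence rather than a depth threshold.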

Planned Impact

Over the last ten years the scale of sequencing achievable within individual laboratories has shifted from single plasmids to whole genomes. The establishment of next generation sequencing is key to answering many of the questions driving biology today. The recent rapid developments in the field of portable nanopore sequencing devices open a vast range of opportunities. The tools that we develop for this technology are likely to impact research, researchers and industry in a wide range of disciplines. Rapid real-time analysis of single molecule sequencing will reduce consumables costs, enable rapid sequence identification and ultimately allow a whole new class of environmental sensing. We anticipate significant rapid benefits in fields such as microbiology and the identification of pathogens, experiments focussed on phasing, de novo assembly of small genomes and other approaches requiring long reads. We also anticipate developments which will translate into wider industry through our interactions with Oxford Nanopore Technologies themselves and other interested parties. Other sequencing companies are working towards long read sequence technology and are expressing interest in the approaches we are presenting.

The development of minoTour will also have an impact on:
a. Researchers: Through the acquisition of formalized training in sequence analysis and real-time data processing. These will include researchers within the UoN, partners collaborating on this proposal, members of the wider Midlands Sequencing Consortium, of which Deep Seq is a founding member, and researchers further afield.
b. The University of Nottingham: Developing novel sequencing tools at Nottingham will further support the centre in making a unique contribution to sequencing activities in the Midlands and beyond.
c. The Midlands: Deep Seq is a member of the Midlands Sequencing Consortium, which exists to share resources with compatible sequencing platforms across multiple Midlands Universities including Sheffield and Leicester alongside the M5 grouping. The development of tools supporting novel sequencing approaches will be shared amongst this grouping.
d. The international relationships between Nottingham University and researchers outside the UK through collaborations between Nottingham researchers and the international research community including international companies.
The wider public will also benefit in the longer term from the research conducted at DeepSeq through the increased ability of different industrial sectors, from the environment to agriculture and health, to respond to their customers' needs.
The research achievements from DeepSeq and the Midlands Sequencing Consortium will be communicated to a range of audiences via presentations through to discussions and workshops with industry contacts, publications in journals targeting a wide range of audiences and conferences. The research from DeepSeq and its potential will also be communicated to the general public through the yearly University of Nottingham May Fest and through the 'Nottingham Potential' outreach activities.
 
Title LED display for sequencing. 
Description Nanopore sequencing is often visualised as an array of channels, each of a different colour. In this display we develop an interface to show the dynamics of sequencing within an LED matrix.
Type Of Art Artefact (including digital) 
Year Produced 2016 
Impact This is really developed as an interactive illustration to demonstrate sequencing to undergraduate/school students. 
URL https://github.com/mattloose/512array_Nanolights
 
Description Reading the letters in a molecule of DNA (sequencing) has been greatly accelerated by the production of sequencing machines which read short pieces of DNA and assemble them together. Now, a new generation of sequencers is returning to sequencing long reads. The Oxford Nanopore minION provides long read sequencing in a hand-held package. We have developed tools which allow the sequencer to choose which molecules of DNA to read. In effect, this method, 'Read Until', allows the sequencer to selectively sequence DNA molecules in real time.
Exploitation Route Read Until can be used in many and diverse areas such as viral sampling or targeted sequencing of regions of genomes. We are already developing a program to use this for sequencing the Zika virus.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Environment,Healthcare

URL http://biorxiv.org/content/early/2016/02/03/038760.full.pdf+html?
 
Description An extension of our software has been commissioned for further development as a stand alone application by DSTL. This work was recently completed (March 2019) and further work has been commissioned.
First Year Of Impact 2017
Sector Aerospace, Defence and Marine
Impact Types Policy & public services

 
Description A New Durable Read EXtension Method for Very, Very Long Reads
Amount £798,242 (GBP)
Funding ID 212965/Z/18/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2019 
End 01/2022
 
Description AWS Research Credits
Amount $10,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 01/2016 
End 01/2017
 
Description BBSRC iCASE
Amount £94,431 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2017 
End 09/2020
 
Description Tool to identify pathogens in metagenomic long-read sequence data in real time
Amount £47,290 (GBP)
Organisation Defence Science & Technology Laboratory (DSTL) 
Sector Public
Country United Kingdom
Start 03/2018 
End 03/2019
 
Description Wellcome Prime Scholarship
Amount £45,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2017 
End 08/2019
 
Title Adaptive Sampling Integration into MinKNOW 
Description Our method for applying Adaptive Sampling was co-developed by a PhD student working on our adaptive sampling grant and written into Oxford Nanopore's own implementation of Adaptive Sampling that is now shipping in MinKNOW. In essence, this allows a limited subset of functionality from our ReadFish research tools to be used by anyone relatively simply in MinKNOW, Oxford Nanopore's own GUI for controlling Nanopore sequencing.
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact These tools have been used in a number of papers of note to date and have enabled broad uptake of a new sequencing method in the community. 
URL https://github.com/nanoporetech/read_until_api/releases
 
Title BulkVIS 
Description BulkVIS is a tool for detailed analysis of raw signal data during Nanopore sequencing. This tool enables identification of longer reads than have previously been reported and more detailed understanding of how nanopore sequencing occurs. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? Yes  
Impact The identification of the longest molecule sequenced to date. https://www.bbc.co.uk/news/science-environment-46046024 
URL https://github.com/LooseLab/bulkvis
 
Title DSTL Screening 
Description We have been invited to implement a standalone version of the minoTour tool for use by specific individuals in the real-time identification of pathogens. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? No  
Impact This is an ongoing project with expected completion in 2019. 
 
Title MinoTour version 1 
Description MinoTour is a complete laboratory information management system for Nanopore sequencing. It also includes customisable real time analysis. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact This is a revision of a previously available tool and feeds in to several of our other projects. 
URL https://github.com/looselab/minotourapp
 
Title Minotour Client 
Description This is a python tool to upload data into our minoTour application. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact This is feeding in to many of our existing projects. 
URL https://github.com/LooseLab/minotourcli
 
Title Read Until 
Description Read Until is the ability to selectively sequence DNA molecules on a sequencer in real time. We have implemented the first method for doing this using Dynamic Time Warping. 
Type Of Material Technology assay or reagent 
Year Produced 2016 
Provided To Others? Yes  
Impact This tool will be applied to the sequencing of Zika virus in Brazil to ensure the entire genome is sequenced at uniform coverage depth.
URL https://github.com/mattloose/RUscripts
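
As an illustration of the Dynamic Time Warping technique named in the Read Until description above, the following is a minimal textbook sketch, not the implementation in RUscripts. It assumes that a read's signal and an expected reference signal are available as plain lists of numbers, uses absolute difference as the local cost, and introduces a hypothetical `keep_read` helper with an arbitrary threshold to show how a distance could drive an accept or reject decision.

```python
def dtw_distance(query, reference):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query sample repeats against same reference point
                cost[i][j - 1],      # reference sample repeats against same query point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def keep_read(query, reference, threshold):
    # Hypothetical decision rule: continue sequencing only if the signal is close enough.
    return dtw_distance(query, reference) <= threshold


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]
    stretched = [1.0, 2.0, 2.0, 3.0]  # same shape, different speed
    unrelated = [4.0, 5.0, 6.0]
    print(dtw_distance(target, stretched))
    print(keep_read(unrelated, target, threshold=1.0))
```

The warping step is what makes the measure tolerant of a molecule moving through the pore at a variable speed: a stretched copy of the same signal still aligns at low cost, whereas an unrelated signal does not.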
 
Title Read Until API updates 
Description We have overhauled the Oxford Nanopore Read Until API 
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact This tool will be partially integrated into Oxford Nanopore Technologies' own tools.
URL https://www.github.com/looselab/read_until_api_v2
 
Title Read Until Scripts 
Description This tool implements various methods for adaptive sequencing using a mix of our own tools and those provided by Oxford Nanopore. 
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact These tools will be partially integrated into Oxford Nanopore's own toolchain.
URL https://www.github.com/looselab/ru
 
Title minoTour 
Description minoTour is a suite of web based applications for analysing and cataloging minION data. 
Type Of Material Improvements to research infrastructure 
Year Produced 2015 
Provided To Others? Yes  
Impact minoTour is widely used by over 30 universities and institutes worldwide with over 100 unique users. The tools provide real time control of a minION and feed that data back to the user. The tool also enables sequencing to be stopped automatically. 
URL https://github.com/minotour/minotour
 
Description Read Until EBI 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution We have developed a website and interface for the analysis of minION data (minoTour) - we have also developed the first implementation of read until - selective sequencing on the minION sequencer.
Collaborator Contribution The EBI are world leaders in algorithm and storage development.
Impact Grant Submission to the BBSRC
Start Year 2016
 
Description Real Time Analysis 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution We have developed minoTour - a real time suite of software for analysis of minION reads.
Collaborator Contribution Edinburgh have developed poRe - an R based suite for the analysis of minION data.
Impact Grant application to the BBSRC
Start Year 2015
 
Description The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. 
Organisation National Institutes of Health (NIH)
Department National Human Genome Research Institute (NHGRI)
Country United States 
Sector Public 
PI Contribution I have been contributing expertise, time and sequencing data to the activities of the telomere-to-telomere consortium. The goal of this consortium is to sequence the first human genome from telomere-to-telomere. Our expertise through the Long Read Club has been exploited to enable this goal.
Collaborator Contribution Other partners have generated sequencing data, analysed and assembled reads and presented this work.
Impact No outputs to date.
Start Year 2019
 
Description The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome. 
Organisation University of California, Santa Cruz
Country United States 
Sector Academic/University 
PI Contribution I have been contributing expertise, time and sequencing data to the activities of the telomere-to-telomere consortium. The goal of this consortium is to sequence the first human genome from telomere-to-telomere. Our expertise through the Long Read Club has been exploited to enable this goal.
Collaborator Contribution Other partners have generated sequencing data, analysed and assembled reads and presented this work.
Impact No outputs to date.
Start Year 2019
 
Description Zika 
Organisation University of Birmingham
Department Institute of Microbiology and Infection
Country United Kingdom 
Sector Academic/University 
PI Contribution We are working to develop a protocol for real-time sequencing and analysis of the Zika virus. We are providing real-time sequence analysis and manipulation of the sequencer via minoTour and our Read Until scripts to ensure uniform depth of coverage for the Zika virus.
Collaborator Contribution Birmingham are leading a coordinated bid on Zika sequencing.
Impact A grant submission to the MRC
Start Year 2015
 
Title minoTour 
Description minoTour is a real time software analysis suite for the minION and associated nanopore sequencers. It provides read alignment and analysis, remote control of the sequencer itself and remote alerts of changes in sequencer status. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact minoTour has been used by more than 30 institutes and over 100 unique users worldwide. We have attracted $10,000 in funding from Amazon to continue its development.
URL https://github.com/minotour/minotour
 
Title minotour v 1 
Description Minotour is a real time set of tools for analysis of nanopore data. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This is being used across a number of our projects. 
URL http://minotour.nottingham.ac.uk
 
Description Grand Challenges in Genomics - Invited Panel Speaker - Joint meeting of the NHGRI/Wellcome Trust, London, Feb 2019 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Grand Challenges in Genomics was a meeting to discuss the next ten years of Genomics and the ways in which both NHGRI and the Wellcome Trust should target investment and funding in the future.
Year(s) Of Engagement Activity 2019
 
Description Long Read Club 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Long Read Club is an informal grouping of users interested in exploring long read sequencing technologies in all their guises. We are raising awareness of methods, best practice and experience. This is being done through a website, Twitter account and YouTube channel. Over 900 people have signed up to the email list, nearly 700 follow the Twitter account and over 130 have subscribed to the YouTube channel.
Year(s) Of Engagement Activity 2019
URL http://youtube.com/c/longreadclub
 
Description PoreCamp 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact PoreCamp is a training initiative established to teach the basics of Nanopore Sequencing to both academic and industrial users of sequencing. It is held approximately every six months and to date has run in Birmingham, Exeter and Australia. Future pore camps are planned in Texas, USA and the East Midlands, UK.
Year(s) Of Engagement Activity 2016,2017
URL https://porecamp.github.io
 
Description PoreCamp Birmingham 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact PoreCamp is a world-recognised Nanopore Training Camp. This week-long activity provides comprehensive training and instruction in all aspects of Nanopore sequencing, from library preparation through to sequencing and analysis. I am a founder and lead instructor on this course. In Birmingham we produced a public information film describing our activities and interests in this area.
Year(s) Of Engagement Activity 2017
 
Description PoreCamp Texas 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact PoreCamp is a world-recognised Nanopore Training Camp. This week-long activity provides comprehensive training and instruction in all aspects of Nanopore sequencing, from library preparation through to sequencing and analysis. I am a founder and lead instructor on this course.
Year(s) Of Engagement Activity 2017
 
Description Singapore Genome Centre - Porecamp Singapore Training Course - Lead Instructor and Keynote - Sept (2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Porecamp is an instructional course for using nanopore sequencing in the lab and the field. It is open to all and serves to increase the uptake of nanopore sequencing globally.
Year(s) Of Engagement Activity 2018
 
Description University of British Columbia - Porecamp Training Course - Lead Instructor and Keynote - May (2018) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Porecamp is a training course to encourage uptake of Nanopore sequencing in the field and laboratory.
Year(s) Of Engagement Activity 2018