Developing methods for long-read marine viral metagenomics

Lead Research Organisation: University of Exeter
Department Name: Biosciences

Abstract

Everybody knows how important terrestrial plants are to the global climate, fixing atmospheric carbon and providing the oxygen that sustains life. What is less well known is that marine microbes are responsible for around half of total global primary production. Seawater is teeming with microbes, with around 5 million in every teaspoon of surface water. These can be categorised as the phototrophs that fix atmospheric carbon, the heterotrophs that convert this carbon back to atmospheric CO2 and the vast number of viruses that interact with these two groups. At any given time, around 30% of marine microbes are infected by their associated viruses. Viral lysis kills 20% of bacterial standing stocks every day, releasing a soup of dissolved organic carbon that is the largest pool of available carbon on Earth. Heterotrophs rapidly convert this back into CO2, preventing it from sinking to the deep where it is locked away. Viruses can also act as agents of gene transfer, and can reconfigure host metabolism during infection, changing the host's metabolic inputs and outputs. Such changes then resonate throughout the community, ultimately shaping its function and composition.

Traditionally, our understanding of viruses was limited to those for which we had successfully cultured the host. As >99% of microbes are uncultivated, this provided a very narrow representation of viral diversity. Recent advances in sequencing viruses directly from the environment, by preparing viral metagenomes (viromes), have revolutionised our understanding of the importance, abundance and impact of viruses on global biogeochemical cycles. However, such approaches rely on successfully assembling viral genomes from short-read data, similar to reconstructing an enormous genomic jigsaw with billions of pieces. The high rates of evolution and extraordinary diversity of viruses make this a challenge. Currently, <1% of viral sequences from a metagenome can be assigned to a known viral group.

This project will establish new methods to sequence viral metagenomes using the latest long-read sequencing technology. PacBio and Nanopore sequencers generate reads that are several thousand base pairs long and, in some cases, long enough to span an entire viral genome. This increased length makes assembly and analysis of viral genomes trivial. To date, application of long-read sequencing to viral metagenomics has been limited by DNA input requirements that were orders of magnitude higher than the amount that can reasonably be extracted from an environmental viral sample. However, recent updates to library preparation for Nanopore and advances in DNA amplification for PacBio make it an opportune time to re-evaluate their use in viral metagenomics.

By extracting full-length viral genomes from environmental samples, long-read sequencing will provide a step-change in the resolution with which we can perform viral ecology.

Planned Impact

The development of methods to sequence viral communities with significantly greater fidelity offers broad benefits to many applications. Products of this research will have implications for a wide range of stakeholders interested in using this tool for fisheries assessments, aquaculture pathogen detection, conservation biology and environmental risk management (e.g. toxic algal blooms, human pathogens, ballast water regulations), with the wider aim of supporting biodiversity and nature's services through NERC's strategic pillar of "Managing environmental change" and the EU's Marine Strategy Framework Directive (MSFD) Good Environmental Status (GES) key Biodiversity Maintenance descriptor 1.

As part of the London Calling Nanopore event, we will
1. Showcase the Nanopore as a tool for screening of aquatic viruses with a live demo of newly developed protocols
2. Engage with water monitoring bodies (e.g. Environment Agency) and the aquaculture industry to develop in-situ monitors for viral and bacterial threats

Under the banner of the SeA-DNA project (NE/N006100/1), which this proposal supports, we will also ensure that developed methods are presented at a proposed stakeholder workshop aiming to:

1) evaluate the needs of the stakeholder community; and
2) determine an individualised roadmap of engagement with each stakeholder group.

Exploiting strong institutional relationships and strategic alliances within the SeA-DNA project, we plan to engage: DEFRA (MSFD implementation); CEFAS (fisheries assessments); IMO Joint Group of Experts on the Scientific Aspects of Marine Environmental Protection - GESAMP (ballast water); Census of Marine Life (Biodiversity); European Centre for Environment and Human Health (human pathogens); British Ecological Society (conservation biology); and the Marine Management Organisation - MMO (marine sustainability and policy).

The project specifically addresses GES Descriptor 4 (Elements of food webs ensure long-term abundance and reproduction). Application of viral sequencing tools, including amplification of low-concentration DNA and in-situ sequencing with Nanopore, could advance MSFD implementation.
 
Description Until 5 years ago, understanding of the impact of viruses on global carbon biogeochemistry was limited to viruses we could culture. To culture a virus, one needs to culture the host, and 99% of marine bacteria cannot currently be cultured. Thus, our knowledge of how viruses shape processes was limited to a few select taxa. The development of techniques to capture viruses by metagenomics removed the requirement for culture and provided a leap forwards in our understanding. More recently, however, it has been revealed that the short-read assemblies used for reconstructing viral genomes from metagenomes do not accurately capture viral diversity and, importantly, miss some of the most abundant, and therefore ecologically significant, members of the viral community. In this grant, we successfully developed long-read sequencing for viral communities to overcome these issues and more accurately capture viral diversity. In doing so, we identified highly abundant, novel viral types and ubiquitous, novel mechanisms of host-virus coevolution that better explain the arms race between predator and prey. Furthermore, we have developed the capacity to sequence a single virus genome on a single read, with thousands of such reads being produced for minimal cost. Until now, the only way to look at single virus genomes was to use cell-sorting techniques, followed by inefficient amplification of DNA and sequencing. Use of our new method will enable an unprecedented view of viral population genomics.

Update 2020:
We have now optimised this method and released it as VirION2, decreasing input requirements and increasing sequencing length. The manuscript describing this has been accepted at PeerJ. We have also applied this new method to soils for the first investigation into capturing previously missed viral diversity.
Exploitation Route We are working in collaboration with Matt Sullivan at Ohio State University and the GBMF to develop this technique for investigating the viral dynamics of cryopeg holes. Additionally, we are developing improvements to the single-virus sequencing approach that will be useful to both medical and environmental virologists for better understanding viral population dynamics.

We are now developing methods for capturing full length viral genomes on single reads, with a view to an MRC application for phage therapy.

Update 2020: We have since applied this method to soils to great effect. We are developing long-read viral sequencing as a service within the University of Exeter for external use.
Sectors Agriculture, Food and Drink; Environment; Healthcare

URL https://www.biorxiv.org/content/10.1101/2020.10.28.359364v1.full.pdf
 
Description BIOS-SCOPE
Amount $160,000 (USD)
Organisation Simons Foundation 
Sector Charity/Non Profit
Country United States
Start 10/2018 
End 09/2020
 
Description BIOS-SCOPE II
Amount $420,000 (USD)
Organisation Simons Foundation 
Sector Charity/Non Profit
Country United States
Start 11/2020 
End 10/2023
 
Description NERC Standard Grant
Amount £797,948 (GBP)
Funding ID NE/R010935/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 06/2018 
End 06/2022
 
Description ROBUST-SMOLT: Impact of early life history in freshwater recirculation aquaculture systems on salmon robustness and susceptibility to disease at sea.
Amount £259,672 (GBP)
Funding ID BB/S004122/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2019 
End 12/2021
 
Description Viruses in soils: key modulators of microbiomes and nutrient cycling?
Amount $3,700,000 (USD)
Organisation U.S. Department of Energy 
Sector Public
Country United States
Start 10/2019 
End 09/2022
 
Title Long read viral metagenomics 
Description The purpose of this grant was to investigate the use of long read sequencing for better capture of viral metagenomes. As part of this grant, we have developed the MinION technology for accurately capturing viral genomes from environmental samples using long reads. This project has not only developed the bioinformatic analyses of long-read data, but has also delivered on its promise to use long reads for improving viral metagenomics. Using our new method, we show that short-read metagenomics currently used misses a significant and important component of the viral fraction. Our method successfully captures this diversity and improves the representation of viral diversity in metagenomes. We have continued to develop this method and have now optimised sequencing from 1 ng of input material and increased amplicon length to >7kbp. 
Type Of Material Technology assay or reagent 
Year Produced 2018 
Provided To Others? Yes  
Impact Genoscope are now planning to use our method for the next round of Tara Oceans viral sequencing. 
URL http://dx.doi.org/10.7717/peerj.6800
 
Title VirION2 - improved long-read viral metagenomics 
Description We have further developed our methods for long read viral metagenomics, decreasing input requirements down to 1 ng and developing a bioinformatic pipeline for robust analysis of the data. 
Type Of Material Technology assay or reagent 
Year Produced 2021 
Provided To Others? Yes  
Impact This method is going to be used to sequence the viromes from the next Tara Oceans cruise. We have had several international requests for developing this as a service in the University of Exeter Sequencing Centre. In addition, Exeter is aiming to become a centre of excellence for wastewater monitoring due to our enhanced capacity for capturing the viral fraction. 
URL https://www.biorxiv.org/content/10.1101/2020.10.28.359364v1.full.pdf
 
Title Viral CarrierSeq 
Description We have developed methods for capturing full length viral genomes on single reads from environmental viral populations. 
Type Of Material Technology assay or reagent 
Year Produced 2019 
Provided To Others? No  
Impact This new methodology enables high-throughput single viral genomics at massive scale, as well as capturing DNA modifications within environmental viral populations for the first time. 
 
Title First coupled long- and short-read viral metagenomic timeseries dataset 
Description This project has produced the first long-read viral metagenomic dataset to date, providing marine microbial ecologists with a powerful dataset to explore the population structure of viral communities. At present it comprises 13 samples and >1.4 M reads >15kbp in length, including >19,000 full viral genomes on single reads. 2022 update: This dataset now includes 36 short read cellular metagenomes 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? No  
Impact We have shown that long-read sequencing captures far more important viral populations than short-read sequencing alone. 
 
Title Optimised assembly pipeline for long-read viral metagenomics 
Description We have developed a robust pipeline for long-read viral metagenomics that can recover full length genomes for viruses at >0.3% of the community, with 99.5% nucleotide accuracy. 
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? No  
Impact This will become the gold standard for long-read viral metagenomics data processing. 
 
Description GeneFlow 
Organisation Gordon and Betty Moore Foundation
Country United States 
Sector Charity/Non Profit 
PI Contribution We are using the long-read viromics methods developed during the NERC funding period to analyse the viral diversity in cryopeg holes to evaluate their role in gene flow in microbial communities as part of the GBMF program.
Collaborator Contribution Matt Sullivan, with whom we are working on this project, has provided mock viral communities for analysis, including in-kind contributions of lab tech time for their production. I am going out to Ohio in May to train his lab tech on Nanopore sequencing in order to continue the development of long-read technology for viral metagenomes.
Impact The mock viral communities were a cornerstone of our upcoming methods paper on long-read viral metagenomics to be submitted to Nature Microbiology. The Cryopeg collaboration is multi-disciplinary, including geologists, microbiologists, virologists and mathematical modellers.
Start Year 2017
 
Description Pies, Pints and PhDs at the National Marine Aquarium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact As part of the adults-only NMA Lates series, Warwick-Dugdale presented to a general audience, explaining what her PhD is about and why it is important for them to know about it, including its wider implications.
Year(s) Of Engagement Activity 2020
URL https://www.visitplymouth.co.uk/whats-on/nma-lates-pints-pies-and-phds-p2808033