Identification and quantification of complex plant pathogens within heterogeneous samples harnessing single-molecule sequencing

Lead Research Organisation: National Inst of Agricultural Botany
Department Name: Centre for Research

Abstract

This project aims to generate proof of principle data that will allow the development of rapid in-field assays for the identification of specific plant pathogens through the combination of multiple novel DNA sequencing and bioinformatics approaches.
Accurate and rapid diagnosis of plant pathogens remains a key weakness in our defence against aerial and soil-borne diseases. There is often a trade-off between speed and specificity, and field-based detection systems are often limited to genus- or species-level identification. This is a problem for many important pathogen systems with host-specific pathovars or formae speciales (ff. spp.) within species complexes, such as Fusarium, Verticillium and Pseudomonas syringae. These organisms are abundant in the environment and include both pathogenic and non-pathogenic lineages that are often phylogenetically indistinguishable using standard 'DNA barcoding' primer sets. Identifying the specific plant host of an isolate often requires multi-locus sequence typing, using either multiple SNP-specific assays (e.g. TaqMan or KASP) or DNA sequencing to detect pathovar-associated SNPs. No field-ready approach currently captures the complexity of information required for identification.
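To illustrate why a single barcode locus is often not enough, the sketch below types a sample against diagnostic SNP profiles and picks the closest match. It is a minimal Python illustration under stated assumptions: the labels, SNP positions and alleles in `REFERENCE_PROFILES` are hypothetical placeholders rather than real Fusarium data, and `match_profile` is an illustrative helper, not part of any existing typing tool.

```python
# Hypothetical reference profiles: the allele expected at each diagnostic SNP
# position for each group. Placeholders only, not real Fusarium data.
REFERENCE_PROFILES = {
    "f. sp. A": {101: "A", 250: "G", 377: "T"},
    "f. sp. B": {101: "A", 250: "C", 377: "T"},
    "non-pathogenic": {101: "G", 250: "C", 377: "C"},
}


def match_profile(observed, profiles=REFERENCE_PROFILES):
    """Return (label, score): the profile agreeing at the largest fraction of typed SNPs.

    `observed` maps SNP position -> called allele; positions without a call are
    skipped. Returns (None, 0.0) if no profile agrees at any typed position.
    Ties go to the profile listed first.
    """
    best_label, best_score = None, 0.0
    for label, profile in profiles.items():
        typed = [pos for pos in profile if pos in observed]
        if not typed:
            continue
        score = sum(observed[pos] == profile[pos] for pos in typed) / len(typed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


print(match_profile({101: "A", 250: "C", 377: "T"}))  # ('f. sp. B', 1.0)
```

In this toy example 'f. sp. A' and 'f. sp. B' differ only at position 250, so an assay that does not cover that SNP cannot separate them, mirroring the limitation of standard barcoding loci.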
Our project aims to combine recent developments in DNA library construction with real-time identification of DNA molecules to provide a specific, quantitative method for identifying plant pathogens to the pathovar level, although the method has much broader applicability to other disease settings. For the first time, this approach will allow pathovar-level information to be obtained in real time, generating not only a probabilistic assignment of identity for the disease-causing agent but also an estimate of its total abundance within a mixed sample (e.g. plant leaf or soil). Moreover, because the method we apply is only partially selective, the composition of the whole sample can also be captured quantitatively, allowing both the absolute and relative abundance of other microbial species or biological agents within the sample to be estimated.
Our approach hinges on the combination of two techniques developed for the Oxford Nanopore single-molecule sequencing platform. The first is 'read-until' or 'adaptive' sequencing, which scans the first 150 bases of each long read in real time and queries a database of target sequences for one or more organisms of interest. Only molecules with a positive identification are sequenced beyond these initial 150 bases, generating more information about the target organism; all other molecules are ejected from the pore. This means that targeted and untargeted sequencing take place within a single sample, allowing the overall abundance of organisms to be estimated alongside the specific abundance of the target organisms. The second technique ligates a unique molecular identifier (UMI) to a proportion of the DNA molecules in a sample. A few cycles of PCR then generate copies of these UMI-tagged molecules, allowing accurate consensus identification upon sequencing while retaining crucial information about the relative proportions of molecules in the sample.
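The accept/eject logic of adaptive sequencing can be sketched as follows. This is a hypothetical simplification, not the project's pipeline or Oxford Nanopore's read-until interface: it assumes the 150-base prefix has already been basecalled and uses exact k-mer matching (with hypothetical `K` and `MIN_HIT_FRACTION` values) as a stand-in for the alignment step a real adaptive-sampling tool would perform. Because every read yields a prefix before the decision is made, the tally from `run` still counts all molecules, which is what allows overall abundance to be estimated alongside the enriched targets.

```python
from collections import Counter

# Hypothetical parameters chosen for illustration only.
PREFIX_LEN = 150        # bases inspected before deciding, as in the text
K = 15                  # k-mer size (assumption)
MIN_HIT_FRACTION = 0.2  # fraction of prefix k-mers that must hit the targets (assumption)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_target_index(target_seqs, k=K):
    """Pool the k-mers of every target sequence into one lookup set."""
    index = set()
    for seq in target_seqs:
        index |= kmers(seq, k)
    return index


def decide(read, target_index, k=K):
    """Return 'sequence' to keep reading the molecule or 'unblock' to eject it."""
    prefix_kmers = kmers(read[:PREFIX_LEN], k)
    if not prefix_kmers:  # prefix shorter than k: nothing to match on
        return "unblock"
    hit_fraction = len(prefix_kmers & target_index) / len(prefix_kmers)
    return "sequence" if hit_fraction >= MIN_HIT_FRACTION else "unblock"


def run(reads, target_index):
    """Tally decisions over all reads; every read is counted, accepted or not."""
    return Counter(decide(read, target_index) for read in reads)
```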
While this is at present purely a pilot study, our ultimate ambition is for it to provide a rapid, low-cost method for identifying pathogens in complex, real-world situations, where samples are often of suboptimal quality and time to diagnosis is often critical.

Technical Summary

There is often a trade-off between speed and specificity in molecular diagnostics, especially when the disease-causing organism is unknown. Nanopore sequencing offers an opportunity to overcome this through more sophisticated assignment of longer reads to reference databases, exploiting the fact that genomes often contain lineage-specific regions that are either unique to, or shared between only a few, pathogenic isolates. By maximising the information content obtained from each genome, adaptive sequencing methods could offer a 10-fold increase in the sensitivity with which nanopore sequencing identifies a suspected disease-causing agent. Adding unique molecular identifier (UMI) tags will allow faster consensus generation and could further increase the accuracy of pathogen identification.
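The scale of the potential gain from adaptive sequencing can be reasoned about with a simple model of where sequencing capacity goes. The function below is an assumption-laden illustration, not a derivation of the figure quoted above: it treats capacity as being spent per base sequenced, reads on-target molecules to full length, charges each ejected off-target read only `rejected_len` bases, and ignores pore reloading and decision latency. The name `enrichment_fold` and all example numbers are hypothetical.

```python
def enrichment_fold(target_fraction, mean_read_len, rejected_len):
    """Fold increase in the on-target share of sequenced bases under adaptive sampling.

    Simplified model (assumption): capacity is spent per base sequenced, on-target
    reads are read to full length, each off-target read costs only `rejected_len`
    bases before ejection, and pore reloading and decision latency are ignored.
    """
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    f, full, rejected = target_fraction, mean_read_len, rejected_len
    on_target_bases = f * full
    adaptive_share = on_target_bases / (on_target_bases + (1 - f) * rejected)
    baseline_share = f  # without adaptive sampling every read is sequenced in full
    return adaptive_share / baseline_share


print(round(enrichment_fold(0.01, 8000, 500), 1))  # 13.9
```

With a hypothetical 1% target fraction, 8 kb mean read length and 500 bases spent per rejected read, the model gives roughly 13.9-fold; as the target fraction shrinks, the fold approaches `mean_read_len / rejected_len` (16 here), which is why long reads and fast rejection both matter.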

In our proposed research we will test all three of these developments (long-read database assignment, adaptive sequencing and UMI tagging) singly and in combination, using the pathogen Fusarium oxysporum as a model. Using all available high-quality genome sequences, we will construct an indexed set of genomes that tracks the variability of pathogen-specific and lineage-specific regions, and use it to simulate the likely power of both adaptive sequencing and UMI tagging.
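A minimal sketch of such an index maps each k-mer to the lineages that carry it, then keeps k-mers held by only one or a few lineages. The helper names (`build_kmer_index`, `informative_kmers`) and the use of exact forward-strand k-mers are assumptions for illustration (reverse complements and sequencing errors are ignored), not the project's indexing scheme.

```python
from collections import defaultdict


def build_kmer_index(genomes, k):
    """Map each k-mer to the set of lineages whose sequences contain it.

    `genomes` maps a lineage label to a list of sequences (e.g. assembly contigs).
    Only the forward strand is indexed, a simplification.
    """
    index = defaultdict(set)
    for lineage, seqs in genomes.items():
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add(lineage)
    return index


def informative_kmers(index, max_lineages=1):
    """Group k-mers carried by at most `max_lineages` lineages, keyed by that lineage set."""
    groups = defaultdict(set)
    for kmer, lineages in index.items():
        if len(lineages) <= max_lineages:
            groups[frozenset(lineages)].add(kmer)
    return dict(groups)
```

With `max_lineages=1` the result holds lineage-unique k-mers; raising it also returns k-mers shared between a small group of lineages, keyed by that group.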

We will trial adaptive sequencing on a synthetic microbial community spiked with a Fusarium isolate of interest, which will allow us to optimise our adaptive sequencing approach, and we will validate the results against a gold-standard qPCR assay. We will then test the method on soil samples from differing environments in order to understand how DNA extraction methods affect its performance. Finally, we will explore how UMI tagging (to increase consensus accuracy) affects the adaptive sequencing process when the two are combined.
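The consensus step that UMI tagging enables can be sketched by grouping reads on their UMI and taking a per-position majority vote. This assumes, hypothetically, that UMIs have already been extracted and that reads from the same molecule are pre-aligned to equal length; real UMI pipelines also cluster UMIs that contain sequencing errors and align reads before calling a consensus. The `classify` argument of `molecule_counts` is a hypothetical stand-in for whatever assigns a consensus sequence to a taxon. Counting consensus sequences rather than raw reads means PCR copies of one molecule contribute once, which is how relative proportions are preserved.

```python
from collections import Counter, defaultdict


def group_by_umi(tagged_reads):
    """Collect read sequences by UMI; `tagged_reads` yields (umi, sequence) pairs."""
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)
    return groups


def majority_consensus(seqs):
    """Per-position majority vote over reads already aligned to equal length."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("reads must be pre-aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def consensus_per_molecule(tagged_reads):
    """One consensus sequence per UMI, i.e. per original tagged molecule."""
    return {umi: majority_consensus(seqs) for umi, seqs in group_by_umi(tagged_reads).items()}


def molecule_counts(consensus, classify):
    """Count molecules (not reads) per label; `classify` is a hypothetical callable."""
    return Counter(classify(seq) for seq in consensus.values())
```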

Each of these is a methodological advance on its own, but in combination they may lead to a step change in the application of nanopore diagnostics in the field.
