Identification and quantification of complex plant pathogens within heterogenous samples harnessing single molecule sequencing

Lead Research Organisation: National Institute of Agricultural Botany
Department Name: Centre for Research


This project aims to generate proof of principle data that will allow the development of rapid in-field assays for the identification of specific plant pathogens through the combination of multiple novel DNA sequencing and bioinformatics approaches.
Accurate and rapid diagnosis of plant pathogens remains a key weakness in our defence against aerial and soil-borne diseases. There is often a trade-off between speed and specificity, with field-based detection systems often limited to genus- or species-level identification. This is a problem for many important pathogen systems that have host-specific pathovars or formae speciales (ff. spp.) within species complexes, such as Fusarium, Verticillium and Pseudomonas syringae. These organisms are abundant in the environment and have both pathogenic and non-pathogenic lineages that are often phylogenetically indistinguishable using standard 'DNA barcoding' primer sets. Identifying their specific plant host often requires multi-locus sequence typing, which in turn requires either multiple SNP-specific assays (e.g. TaqMan or KASP) or DNA sequencing approaches to identify pathovar-associated SNPs. There are currently no field-ready approaches that can capture the complexity of the information required for identification.
Our project aims to combine recent developments in DNA library construction with real-time DNA molecule identification to provide a specific, quantitative method for identifying plant pathogens to the pathovar level, although the method has much broader applicability to other disease settings. This approach will, for the first time, allow pathovar-level information to be obtained in real time, generating not only a probabilistic assignment of identity for the disease-causing agent but also an estimate of its total abundance within a mixed sample (e.g. plant leaf or soil). Moreover, as the method that we apply is only partially selective, the composition of the whole sample can be captured (again in a quantitative manner), allowing estimation of both the absolute and relative abundance of other microbial species and biological agents within the sample.
Our approach hinges on the combination of two techniques developed for the Oxford Nanopore single-molecule sequencing platform. The first is 'read-until' or 'adaptive' sequencing, which scans the first 150 bases of a long read and queries, in real time, a database of target sequences for one or more organisms of interest. Only molecules with a positive match are sequenced beyond the initial 150 bases, generating more information about the target organism. This means that both targeted and untargeted sequencing take place within a single sample, allowing the overall abundance of organisms to be estimated along with the specific abundance of the target organisms. The second technique ligates a unique molecular identifier (UMI) to a proportion of the DNA molecules in a sample. A few cycles of PCR then generate copies of these UMI-tagged molecules, allowing accurate consensus identification upon sequencing while retaining the crucial information about the relative proportions of molecules in the sample.
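The read-until decision described above can be illustrated with a minimal Python sketch. It is not the Oxford Nanopore or Readfish interface: the function names, the k-mer length, the `min_hits` threshold and the use of exact k-mer matching in place of a real aligner are all assumptions made purely for illustration.

```python
from typing import Iterable, Set

K = 15            # k-mer length (illustrative assumption)
PREFIX_LEN = 150  # bases inspected before deciding, as described in the text


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of all k-mers in a sequence (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_target_index(targets: Iterable[str], k: int = K) -> Set[str]:
    """Pool the k-mers of every target sequence into one lookup set."""
    index: Set[str] = set()
    for target in targets:
        index |= kmers(target, k)
    return index


def decide(read_prefix: str, index: Set[str], min_hits: int = 3, k: int = K) -> str:
    """Return 'continue' for a likely target read, otherwise 'eject'.

    Only the first PREFIX_LEN bases are considered; min_hits is a
    hypothetical threshold on the number of shared k-mers.
    """
    prefix = read_prefix[:PREFIX_LEN]
    hits = len(kmers(prefix, k) & index)
    return "continue" if hits >= min_hits else "eject"
```

In this toy model, a read whose first 150 bases share enough k-mers with the target set is sequenced to completion, while every other read is ejected after contributing only its short prefix. Those prefixes are what would allow the untargeted part of the sample to be counted as well.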
Although this is at this stage purely a pilot study, our ultimate ambition is for it to provide a low-cost method that can identify pathogens rapidly in complex, real-world situations, where samples are often of suboptimal quality and where time to diagnosis is often critical.

Technical Summary

There is often a trade-off between speed and specificity in molecular diagnostics, especially when the disease-causing organism is unknown. Nanopore sequencing offers opportunities to overcome this issue through more sophisticated assignment of longer reads to databases, leveraging the fact that there are often lineage-specific regions of the genome that are either unique or shared between only a few pathogenic isolates. By maximising the information content of the genomes, the use of adaptive sequencing methods could offer a 10-fold increase in the sensitivity of nanopore sequencing for identifying a suspected disease-causing agent. The further addition of unique molecular identifier (UMI) tags will allow faster consensus generation and potentially increase the accuracy of pathogen identification.

In our proposed research we will test all three of these developments, singly and in combination, using the pathogen Fusarium oxysporum as a model. Using all available high-quality genome sequences, we will construct an indexed set of genomes that tracks the variability of pathogen-specific and lineage-specific regions of the genome, and simulate the likely power of both adaptive sequencing and UMI tagging.

We will trial adaptive sequencing on a synthetic microbial community spiked with the Fusarium of interest. This will allow us to optimise our adaptive sequencing approach, and we will use a gold-standard qPCR assay to validate our results. We will then test the method on soil samples from differing environments in order to understand how DNA extraction methods affect its performance. Finally, we will explore how the addition of UMI tagging (to increase consensus accuracy) affects the adaptive sequencing process.
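As an illustration of how UMI tagging can raise consensus accuracy, the following Python sketch groups reads by their UMI and takes a per-position majority vote. It is a simplification under stated assumptions: the function names are hypothetical, the reads are assumed to be already trimmed and aligned to a common start, and a real nanopore workflow would need a proper alignment step to handle insertions and deletions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_umi(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the sequences of all PCR copies that carry the same UMI."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def consensus(seqs: List[str]) -> str:
    """Per-position majority vote, truncated to the shortest copy.

    Assumes the copies are already aligned to the same start position.
    """
    length = min(len(s) for s in seqs)
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0]
        for i in range(length)
    )


def umi_consensus(reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return one consensus sequence per UMI, i.e. per original molecule."""
    return {umi: consensus(seqs) for umi, seqs in group_by_umi(reads).items()}
```

Because each UMI yields exactly one consensus sequence, counting consensus sequences rather than raw reads keeps the relative proportions of the original molecules even after PCR has copied some of them more often than others.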

Each of these is a methodological advance in its own right, but in combination they may lead to a step change in the application of nanopore diagnostics in the field.


Description Using artificial communities made up of high-quality gDNA from mixed Fusarium species and Fusarium oxysporum f. sp. isolates, we were able to identify individual community members down to the level of f. sp. and determine their relative abundance in the sample. We also showed enrichment for pathogenic isolates using Readfish adaptive sampling targeting pathogen-specific contigs from high-quality genomes.
Soil samples from fields with a confirmed infection by a known Fusarium oxysporum f. sp. were analysed with or without the artificial community spiked into the DNA sample. Again, when using pathogen-specific contigs from high-quality genomes, we were able to enrich for and identify pathogenic isolates. We have determined the minimum level of F. oxysporum DNA in the sample that can be identified. In soil samples, fungal DNA is present at very low levels even in highly infected soils, and ~99% of the DNA present is of bacterial origin. We were also able to generate data showing the background soil microbiome, which is key information for investigating the health of soils. This highlighted differences in community structure compared with Illumina 16S analysis of the same soils.
The objective of identifying Fusarium oxysporum f. sp. in soils with unknown contamination was considerably more challenging. The lack of high-quality genomes with identified pathogenicity contigs was a problem, and we found that including the Fusarium oxysporum core chromosomes among the targets led to reduced enrichment and no advantage over control samples run without selection. The reason for this is under investigation, but it is likely due to the presence of other fungal species in the soil that share sequence homology with the core genome. This demonstrates the importance of generating high-quality genomes, with well-identified pathogenic regions, from as many fungal pathogens as possible to enable enrichment and identification.
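One simple way to express enrichment relative to a control run without selection is as a ratio of on-target fractions. The Python sketch below is only an assumption about how such a comparison could be computed. The function names are hypothetical and the project's own enrichment metric is not stated in this record.

```python
def on_target_fraction(on_target_bases: int, total_bases: int) -> float:
    """Fraction of sequenced bases that map to the target regions."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return on_target_bases / total_bases


def fold_enrichment(adaptive_on: int, adaptive_total: int,
                    control_on: int, control_total: int) -> float:
    """Ratio of the on-target fraction with selection to that of the control.

    A value near 1.0 means selection gave no advantage over the control run.
    """
    control = on_target_fraction(control_on, control_total)
    if control == 0:
        raise ValueError("control run has no on-target bases")
    return on_target_fraction(adaptive_on, adaptive_total) / control
```

Under this definition, a target set that also matches non-pathogenic fungi in the soil would inflate the on-target fraction of the control as well as the adaptive run, pulling the ratio towards 1.0.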
Exploitation Route Improvements in the speed and accuracy of soil health assessment and analysis are key to improving the productivity of sustainable crop production and food resilience by reducing disease pressure. The findings of this project have provided insights into the use of Oxford Nanopore adaptive long-read sequencing as a method for soil microbiome analysis. In addition, the project has highlighted the need for additional investment and research to enable the accurate identification of fungal pathogens, i.e. the development of a public database of high-quality, annotated, long-read genomes from a comprehensive range of fungal pathogen isolates.
Sectors Agriculture, Food and Drink; Environment

Description Invited to talk to lab groups - University of Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Invited to talk at the University of Cambridge Plant Sciences Department group meeting on work related to nanopore sequencing. 25-minute talk: "Applying sequencing technologies to gain insight into the entire community of organisms associated with Fusarium disease complexes".
Year(s) Of Engagement Activity 2022
Description Talk - AAB Thinking Differently about Soil-Borne Disease Management 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Association of Applied Biologists day conference "Thinking Differently about Soil-Borne Disease Management". 15-minute talk: "Using Sequencing Technologies to Identify Fusarium oxysporum to ff. spp. level within the associated microbiome".
Year(s) Of Engagement Activity 2021