Integrating new statistical frameworks into eDNA survey and analysis at the landscape scale

Lead Research Organisation: University of Kent
Department Name: Sch of Maths Statistics & Actuarial Sci


In recent years, three major innovations have occurred in ecology: (1) the emergence of new statistical methods for analysing community data; (2) the rapid detection of species and whole communities from environmental DNA (eDNA) and bulk-sample DNA; and (3) the wide availability of remotely sensed environmental covariates. The efficiency gains are such that hundreds or even thousands of species can now be detected and, to an extent, quantified in hundreds or even thousands of samples. Collectively, these three innovations have the potential to relieve the problems of data limitation and analysis that environmental management has been struggling with, opening the way to near-real-time tracking of state and change in biodiversity and its functions and services over whole landscapes.

The aim of our project is to develop an integrated statistical framework for DNA-based surveys of biodiversity. The framework will allow the estimation of community compositions and the identification of the landscape characteristics that drive them. We will develop a Bayesian hierarchical model accounting for the probabilistic nature of DNA-based data due to observation error and taxonomic uncertainty and for model uncertainty due to the unknown strength and direction of landscape effects on the system. We will build sophisticated and efficient algorithms within a Bayesian framework for identifying the important landscape covariates that predict community structure and provide guidelines on optimal allocation of resources in DNA-based surveys for achieving the required power to infer species distributions and to link them to landscape covariates.

The huge potential contribution of DNA-based data to landscape decision-making is demonstrated by how Natural England, Local Planning Authorities (LPAs), and the NatureSpace Partnership use eDNA to create a biodiversity-offset market ('District Licensing') for the protected Great Crested Newt (GCN). Water samples from 500 ponds across the South Midlands (spanning ~3320 sq km) were tested for GCN and used to create a distribution map, which was then zoned into four 'impact risk' levels. Builders pay a known, sliding-scale fee, and a portion of the fee is used to build and manage new habitat. District Licensing is only feasible with eDNA's greater efficiency. GCN District Licensing expands to at least 16 LPAs in 2020, aiming to go nationwide, which would make it the largest biodiversity-focused, land-use decision scheme in the UK, if not the world.

The natural, and highly desirable, extension to the GCN scheme would be to map 'all biodiversity' and to make land-use decisions (e.g. impact risk maps, offset markets, habitat creation) on this broader basis. In fact, samples originally collected for GCN can be repurposed for this larger goal by using 'metabarcoding', meaning that the eDNA is PCR-amplified for a larger range of taxa. Given the District-Licensing expansion plans, pond eDNA metabarcoding alone could provide an efficient way to map biodiversity across much of the UK.

This is far from the only such programme. Ecologists in industry and academia around the world are plunging ahead with large-scale DNA-sampling campaigns, and there is, as yet, no comprehensive set of statistical methods for modelling the individual steps of the new observation processes, quantifying the resulting uncertainty, and assessing how it affects decision-making at the landscape level. Our proposed modelling framework will provide such tools by explicitly capturing measurement bias within biodiversity models as a set of observation processes, and not merely as error. Improving sampling designs and workflows as a result of our proposed models will profoundly increase the efficiency and credibility of inference and therefore reduce the risk of biodiversity loss during the political process of allocating land to different uses.

Planned Impact

Research outputs for the beneficiaries

The outputs from the research will comprise (1) new statistical models for assessing uncertainties in the different stages of eDNA workflow, from sampling through to taxonomic assignment and inference of presence and absence of species; (2) user-friendly software in R-Shiny for research users to estimate these uncertainties and the landscape-level factors that influence them; (3) tools for optimising the design and delivery of landscape-level DNA-based surveys; (4) two Knowledge Exchange workshops and training events for research users to deliver (1)-(3).

Who could potentially benefit from these research outputs over different timescales?

Immediate beneficiaries (i.e. will benefit during the course of the research and immediately after): These are the organisations that the research team is currently working with, which are providing the source data for the project and have an interest in using the project outputs (see support letters). They comprise government (e.g. Natural England) and non-government (e.g. Freshwater Habitats Trust, Amphibian and Reptile Conservation) agencies, environmental consultants (e.g. ARCESL), and private-sector service providers (e.g. NatureMetrics, NatureSpace). Members of the research team serve on the boards/advisory panels for all of these organisations.
Medium to long-term beneficiaries (i.e. will benefit within 5 years of the research concluding): These include the wider community of research users operating at the landscape scale, such as private-sector service providers, planning authorities, ecological consultants and non-governmental organisations.

How might the potential beneficiaries benefit?

Immediate beneficiaries: Through board and advisory panel membership, there will be an immediate route for dissemination, implementation and review of the research via these organisations. Immediate benefits will include: improved understanding of the uncertainties involved at different stages of the workflow and where resources need targeting to address these; improved understanding of the landscape-level variables that influence uncertainty; access to free scripts and R-Shiny software to improve the design and analysis of DNA-based surveys at the landscape level; and opportunities to attend training workshops where the project outputs will be demonstrated and disseminated. A more detailed description of the mechanisms by which these will be achieved is in the Pathways to Impact attachment.

Medium to long-term beneficiaries: The mechanisms by which these beneficiaries will be reached are twofold. Firstly, there will be direct dissemination of the findings from the research via ongoing research-user workshops being run by members of the research team. For example, in September 2019, EM, RG and ASB ran a workshop at the University of Kent for their existing network of external research users. The workshop demonstrated new software, developed at the University of Kent, that implements novel statistical methods developed by JG and EM and, for the first time, quantifies uncertainty in single-species eDNA sampling and analysis at the landscape level. These workshops will continue at no cost to the grant through separate impact funding and ASB's joint postdoc position between Kent and the Amphibian and Reptile Conservation Trust (ARC). Secondly, there will be 'trickle down' benefits through dissemination via the networks of the organisations working directly with the research team and/or attending the workshops. Collectively, the research team's direct work with research users from different sectors will ensure the rapid adoption of the research outcomes. Ultimately, this will result in much more cost-effective and reliable DNA sampling protocols with clear economic and societal benefits.

Description We have already published three papers, have submitted a fourth and are currently drafting a fifth. Our work so far has challenged current practices of interpreting data from DNA-based surveys by demonstrating the non-negligible probabilities of false-positive and false-negative error both when the environmental samples (e.g. soil or water) are collected and when they are analysed in the lab. We have also contested the current approach, which does not require repeat samples or repeat lab analyses, and demonstrated the considerable improvement in inference when effort, and hence cost, is allocated between the sample-collection stage and the lab stage differently from what is currently standard practice in DNA-based surveys. Our new models and corresponding R package are already being used by practitioners around the world. We have now developed a multi-species model for DNA-based surveys and have submitted the corresponding paper to JASA, which is one of the top statistics journals in the world. We have given a workshop at the International Statistical Ecology conference in Cape Town in June and we have been awarded a Knowledge Transfer Partnership in collaboration with NatureMetrics, who are interested in incorporating our new models within their bioinformatics pipeline. Finally, we have applied for NERC's pushing the frontiers of environmental science research funding to explore the applications of our new models to studying soil microbiomes, lotic system communities and benthic communities.
Exploitation Route We have developed freely available R code for all our new models and have given a number of workshops, both in the UK and abroad, to help with dissemination. We have been approached by several researchers and other organisations who are interested in using our new models, and we are now collaborating with researchers in the UK and in Italy.
Sectors Agriculture, Food and Drink,Environment
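
The two-stage error structure described above (false positives and false negatives at both sample collection and lab analysis) can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not the project's R package or its fitted model: the function names and all parameter values (psi, theta11, theta10, p11, p10) are hypothetical, chosen to show how repeat samples and repeat PCR runs change the posterior probability that a species is present at a site.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sample_likelihood(y, n_pcr, theta, p11, p10):
    """P(y positive PCR replicates out of n_pcr) for one field sample.

    theta is the probability that the sample contains target DNA. The sum
    marginalises over whether it does (PCR positive w.p. p11) or does not
    (PCR positive w.p. p10, a lab-stage false positive).
    """
    return theta * binom_pmf(y, n_pcr, p11) + (1 - theta) * binom_pmf(y, n_pcr, p10)


def posterior_presence(pcr_positives, n_pcr, psi, theta11, theta10, p11, p10):
    """Posterior probability that the species is present at a site.

    pcr_positives: number of positive PCR replicates for each field sample.
    psi: prior probability of presence (assumed value).
    theta11 / theta10: P(sample contains DNA | species present / absent),
        so 1 - theta11 is the stage-1 false-negative rate and theta10 the
        stage-1 false-positive rate.
    """
    like_present = 1.0
    like_absent = 1.0
    for y in pcr_positives:
        like_present *= sample_likelihood(y, n_pcr, theta11, p11, p10)
        like_absent *= sample_likelihood(y, n_pcr, theta10, p11, p10)
    numerator = psi * like_present
    return numerator / (numerator + (1 - psi) * like_absent)


# Hypothetical error rates, for illustration only.
params = dict(psi=0.5, theta11=0.8, theta10=0.05, p11=0.9, p10=0.05)

single = posterior_presence([1], n_pcr=1, **params)
replicated = posterior_presence([3, 3, 3], n_pcr=3, **params)
all_negative = posterior_presence([0, 0, 0], n_pcr=3, **params)

print(f"1 sample x 1 PCR, positive:        {single:.4f}")
print(f"3 samples x 3 PCRs, all positive:  {replicated:.4f}")
print(f"3 samples x 3 PCRs, all negative:  {all_negative:.4f}")
```

Under these assumed rates, a single unreplicated positive leaves noticeably more doubt about presence than a fully replicated design, which is the kind of comparison that motivates reallocating effort between field and lab stages.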

Description We have been awarded a Knowledge Transfer Partnership with NatureMetrics, who are interested in incorporating our new modelling framework within their bioinformatics pipeline, essentially changing their product so that it also includes output from our model. As part of this collaboration, we have developed new models and tools for study design and we have worked with organisations such as WWF to help them design their eDNA studies.
First Year Of Impact 2022
Sector Environment
Impact Types Societal,Economic

Description University of Kent and NatureMetrics KTP 21_22R5
Amount £99,607 (GBP)
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 08/2022 
End 09/2023

Title eDNAPlus R package
Description We have developed a new R package and associated workshop material for fitting the models developed as part of this project 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact We have had 80+ participants in our workshops and a number of those researchers are now using our R package for analysing their own data. 

Title eDNAPlus
Description DNA-based biodiversity surveys involve collecting physical samples from survey sites and assaying the contents in the laboratory to detect species via their diagnostic DNA sequences. DNA-based surveys are increasingly being adopted for biodiversity monitoring. The most commonly employed method is metabarcoding, which combines PCR with high-throughput DNA sequencing to amplify and then read 'DNA barcode' sequences. This process generates count data indicating the number of times each DNA barcode was read. However, DNA-based data are noisy and error-prone, with several sources of variation. In this paper, we present a unifying modelling framework for DNA-based data allowing for all key sources of variation and error in the data-generating process. The model can estimate within-species biomass changes across sites and link those changes to environmental covariates, while accounting for correlations among species and among sites. Inference is performed using MCMC, where we employ Gibbs or Metropolis-Hastings updates with Laplace approximations. We also implement a re-parameterisation scheme, appropriate for crossed-effects models, leading to improved mixing, and an adaptive approach for updating latent variables, reducing computation time. We discuss study design and present theoretical and simulation results to guide decisions on replication at different stages and on the use of quality control methods. We demonstrate the new framework on a dataset of Malaise-trap samples. We quantify the effects of elevation and distance-to-road on each species, infer species correlations, and produce maps identifying areas of high biodiversity, which can be used to rank areas by conservation value. We estimate the level of noise between sites and within sample replicates, and the probabilities of error at the PCR stage, which are close to zero for most species considered, validating the employed laboratory processing.
Type Of Material Data analysis technique 
Year Produced 2022 
Provided To Others? Yes  
Impact Several researchers have been using this new modelling framework to analyse their data and to design their data collection approach. 
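
To make the layered sources of variation in the description above more concrete, the following sketch simulates read counts for one species through three nested levels: sites, field samples within sites, and PCR replicates within samples. It is a deliberately simplified illustration and not the eDNAPlus model or its API: the function names, the Poisson read-count assumption, the single covariate and every default value are assumptions made here, and false-positive errors and species correlations are omitted for brevity.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate arrivals in [0, lam]."""
    if lam <= 0:
        return 0
    count = 0
    total = rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_reads(elevations, beta, n_samples=2, n_pcr=3,
                   sigma_site=0.5, sigma_sample=0.3,
                   p_false_neg=0.1, base_reads=50.0, seed=1):
    """Simulate (site, sample, pcr, reads) tuples for a single species.

    All parameters are hypothetical:
      beta          effect of the covariate (here elevation) on log-biomass
      sigma_site    between-site noise on log-biomass
      sigma_sample  within-site noise between field samples
      p_false_neg   probability that a PCR replicate fails to amplify
      base_reads    expected reads at log-DNA amount zero
    """
    rng = random.Random(seed)
    data = []
    for site, elevation in enumerate(elevations):
        # Site level: covariate effect plus between-site noise.
        log_biomass = beta * elevation + rng.gauss(0.0, sigma_site)
        for sample in range(n_samples):
            # Sample level: DNA amount varies around the site's biomass.
            log_dna = log_biomass + rng.gauss(0.0, sigma_sample)
            for pcr in range(n_pcr):
                # PCR level: a failed amplification yields zero reads.
                amplified = rng.random() >= p_false_neg
                mean_reads = base_reads * math.exp(log_dna) if amplified else 0.0
                data.append((site, sample, pcr, poisson(rng, mean_reads)))
    return data


# Three sites at standardised elevations -1, 0 and 1 (illustrative values).
for row in simulate_reads([-1.0, 0.0, 1.0], beta=0.5):
    print(row)
```

Simulating data in this way is one route to exploring replication decisions, since the number of samples per site and PCR replicates per sample can be varied while the noise levels are held fixed.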

Description Workshops
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact We have delivered multiple workshops on modelling eDNA data, both in the UK and abroad. We have recorded the sessions, which are still being watched by researchers who want to learn about our new models and use our R packages for their data.
Year(s) Of Engagement Activity 2021,2022