Data integration for large scale ecological models

Lead Research Organisation: NERC CEH (Up to 30.11.2019)

Department Name: Biodiversity (Wallingford)

Abstract

Ecological models are becoming larger and more complicated, and are being used for an increasingly wide range of applications, from describing trends and mapping distributions to understanding mechanistic relationships and predicting the impact of future scenarios. In response, there has been a huge growth in statistical methods for large-scale ecological models. However, most such methods do not account for the fact that ecological data is inherently heterogeneous, and large datasets typically contain many forms of bias.

Recently, a set of hierarchical Bayesian models (HBMs) has emerged as a promising way of dealing with biased data, particularly for occurrence records and other unstructured data. Many millions of unstructured occurrence records exist, so the potential of these new methods is enormous.

Not all data contain biases, though. A minority of biodiversity data is highly structured in terms of the sample locations, fixed protocols and regular sampling. Ideally, we'd like to retain the information about this in our models, but combine it with the much larger sample sizes of unstructured datasets.

Integrated models provide a way to do this. They are a subclass of HBM in which data heterogeneity is modelled explicitly, by treating datasets with different observation processes as independent realisations of the same underlying state. For example, casual observations on GBIF and the Breeding Bird Survey both contain information about whether the population of a particular species was extant at a particular point in space and time.
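The idea of two datasets sharing one underlying state can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code (which was written for R-INLA): the site data, the detection probabilities `p_a` and `p_b`, and all function names are hypothetical, and the detection probabilities are treated as known for simplicity. Each site is occupied with probability `psi`; a structured survey (dataset A) and an unstructured source (dataset B) each record detections with their own per-visit detection probability, and are independent once the occupancy state is given.

```python
import math

def binom_pmf(y, n, p):
    """Probability of y detections in n visits with per-visit detection probability p."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def site_likelihood(psi, p_a, p_b, y_a, n_a, y_b, n_b):
    """Likelihood of both datasets at one site, summed over the latent occupancy state."""
    # Occupied: the two datasets are independent given the shared state.
    occupied = psi * binom_pmf(y_a, n_a, p_a) * binom_pmf(y_b, n_b, p_b)
    # Unoccupied: only possible if neither dataset recorded a detection.
    unoccupied = (1 - psi) if (y_a == 0 and y_b == 0) else 0.0
    return occupied + unoccupied

def joint_log_likelihood(psi, p_a, p_b, sites):
    """Sum of per-site log-likelihoods; sites are assumed independent."""
    return sum(math.log(site_likelihood(psi, p_a, p_b, *s)) for s in sites)

# Hypothetical data: (y_a, n_a, y_b, n_b) for each site.
sites = [(2, 3, 0, 1), (0, 3, 0, 5), (0, 3, 1, 2), (0, 3, 0, 0)]
p_a, p_b = 0.6, 0.2  # assumed known detection probabilities (illustrative values)

# Crude maximum-likelihood estimate of psi by grid search over (0, 1).
grid = [i / 100 for i in range(1, 100)]
psi_hat = max(grid, key=lambda psi: joint_log_likelihood(psi, p_a, p_b, sites))
print(psi_hat)
```

A real integrated model would estimate the detection probabilities as well, add covariates and spatial structure, and be fitted with Bayesian machinery rather than a grid search. The sketch only shows where the shared state enters the likelihood.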

At present, these integrated models are the preserve of highly competent statisticians. They are hard to specify and difficult to fit and diagnose. One goal of this partnership is to build an extensible framework for fitting integrated models that will make them accessible to a broad community of ecological modellers. This framework, in the form of open source tools, will make it easier for ecologists to handle biased data when addressing large-scale questions about biodiversity.

Although attractive from a conceptual standpoint, it is unclear whether the sophistication of integrated models delivers real benefits over simple ones. In particular, there is an urgent need for some general principles about how to proceed when both structured and unstructured data sources are available. Critical questions include:
Q1. When and how should we combine datasets with different properties?
Q2. Under what circumstances is simple aggregation (i.e. ignoring the different observation processes) better than integration?
Q3. If we suspect the data contain biases, can we detect them and handle them adequately?
Q4. What are the most appropriate metrics for information content and model fit?

These general questions lie at the intersection of the research interests of PI Isaac, Co-I Henrys and Project Partner O'Hara. Each has made some progress towards addressing specific aspects of these questions. Working in partnership would add significant value to each, by taking existing research beyond the specific context and toward general answers to these big questions. It would permit a co-ordinated effort and build a work program of international significance.

This pump-priming award would provide a platform for this partnership. The overall aim is to build a framework for inference in large-scale models of species' distribution, and to test it using computer simulations.

Planned Impact

Three types of non-academic activity will benefit from the research described in this proposal.

1. Design of biodiversity monitoring programs
Large-scale structured biodiversity monitoring is expensive: ensuring cost-effective survey design is a major priority for the agencies that commission such research, particularly in the current climate of government austerity. In the UK, the principal agencies include JNCC, Defra, Natural England and Scottish Natural Heritage. These agencies are attracted to the potential of citizen science and opportunistic recording to generate large volumes of data at relatively low cost. However, the value of these data types is questionable, due to the lack of structure in how data are collected. Integrated modelling provides a way to combine these unstructured data with more traditional structured surveys. To some extent this has already happened: the new Defra-funded Pollinator Monitoring Scheme has been designed with elements of structured and unstructured data observation processes, using a mixture of professional surveyors and citizen scientists. The 'rules of thumb' arising from this project will make it possible to design cost-effective biodiversity surveillance schemes in which stratified random sampling using formal protocols by professional scientists can be augmented by large-scale observations by citizen scientists (and vice versa). In this way, the outcomes of this research project will support decisions about future scheme designs, both in the UK and internationally.

2. Reporting on biodiversity targets
Biodiversity indicators are a key tool for reporting against national targets and international treaty obligations (including the "Aichi targets"). Currently, the UK has eleven biodiversity indicators that report on the status of species, of which nine use data from structured surveys and two use unstructured occurrence records (both new indicators were developed by the project team). For many species there are multiple datasets available, but there is no obvious way to combine them. By providing clear guidance on data integration, our research will ensure that biodiversity indicators make the best use of available data and prevent arbitrary choices being made about which data to use. This will be a welcome development for agencies with responsibility for delivering biodiversity indicators, both in the UK (JNCC and Defra) and internationally.

3. International Networks
The biodiversity crisis is a global phenomenon, and biodiversity itself does not respect international borders. For this reason, there is a need to coordinate responses to the biodiversity crisis across nations, in which IPBES and GEO-BON have an important role to play. Part of this role involves synthesising large quantities of information about biodiversity from different countries. Data integration, and the development of models that facilitate such integration, is necessary to form a coherent narrative at the global scale. The development of Essential Biodiversity Variables (EBVs), led by GEO-BON, is one way in which synthesis can be achieved. PI Isaac recently contributed to the design of a roadmap for building EBVs for species' distribution and abundance at the global scale, in which integrated models (to account for data heterogeneity) were highlighted as a key knowledge gap.

Funded Value:

£32,110

Funded Period:

Dec 17 - Nov 19

Funder:

NERC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

NE/R005133/1

Principal Investigator:

Nicholas Isaac

Research Subject:

Ecol, biodivers. & systematics (100%)

Research Topic:

Community Ecology (20%)

Conservation Ecology (50%)

Population Ecology (30%)

Organisations

People	ORCID iD
Nicholas Isaac (Principal Investigator)
Peter Henrys (Co-Investigator)

Publications

Adjei K (2023) Integrating data from different taxonomic resolutions to better estimate community alpha diversity

Ahmad Suhaimi S (2021) Integrated species distribution models: A comparison of approaches under different data quality scenarios in Diversity and Distributions

Dambly L (2023) Integrated species distribution models fitted in INLA are sensitive to mesh parameterisation in Ecography

Isaac NJB (2020) Data Integration for Large-Scale Models of Species Distributions in Trends in Ecology & Evolution

Jönsson G (2021) A century of social wasp occupancy trends from natural history collections: spatiotemporal resolutions have little effect on model performance in Insect Conservation and Diversity

Sheard JK (2021) Long-term trends in the occupancy of ants revealed through use of multi-sourced datasets in Biology Letters

Simmonds E (2020) Is more data always better? A simulation study of benefits and limitations of integrated distribution models in Ecography

Key Findings
Impact Summary
Policy Influence
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	Integrated modelling of species distributions and abundance is emerging as a powerful tool in statistical ecology, and is expected to underpin the next generation of models predicting the current, future and potential distributions of species. Point processes provide a flexible framework for developing integrated models, combining data representing the locations of individual organisms, local population abundance and species-site occupancy. In this project, we developed methods that provide opportunities to make best use of existing and new data sources. We assessed the value of data integration over conventional approaches, and evaluated, using simulations, the situations when data integration is likely to be beneficial.
Exploitation Route	Integrated models are currently the preserve of statisticians. Our work makes these developments accessible to a broad set of non-specialists. Research conducted in this project underpins a series of integrated modelling frameworks and new schemes for monitoring the state of the environment.
Sectors	Environment
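
As a hedged illustration of why point processes can link the three data types named in the description above, the standard Poisson point process relationships are sketched below. The symbols are assumptions introduced here, not notation from the project: \lambda(s) is the intensity of individuals at location s, b(s) is a sampling-effort (thinning) term for opportunistic point records, and A is a surveyed site.

```latex
\begin{align}
  \text{point locations:} \quad & \{s_i\} \sim \mathrm{PoissonProcess}\big(\lambda(s)\, b(s)\big) \\
  \text{local abundance:} \quad & N(A) \sim \mathrm{Poisson}\big(\Lambda(A)\big),
      \qquad \Lambda(A) = \int_A \lambda(s)\, ds \\
  \text{site occupancy:} \quad & \Pr\big(N(A) > 0\big) = 1 - \exp\big(-\Lambda(A)\big) \\
  \text{joint likelihood:} \quad & \log L = \log L_{\text{points}} + \log L_{\text{abundance}} + \log L_{\text{occupancy}}
\end{align}
```

Because all three observation models are functions of the same intensity \lambda(s), their log-likelihoods can be added when the datasets are treated as conditionally independent. Imperfect detection in the abundance and occupancy data is omitted here for brevity.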


Description	Stakeholders in government agencies and NGOs, both in the UK and internationally, are using the insights from this work to develop new wildlife monitoring schemes and data storage architectures. Two examples are: 1) JNCC commissioned a User Guide to inform on the potential for using integrated models for designing wildlife surveillance, and 2) the nascent EU Pollinator Monitoring scheme is built on an integrated modelling platform developed in this project.
First Year Of Impact	2021
Sector	Environment
Impact Types	Policy & public services


Description	Integrated modelling framework adopted by EU pollinator monitoring scheme
Geographic Reach	Europe
Policy Influence Type	Participation in a guidance/advisory committee
URL	https://ec.europa.eu/jrc/en/science-update/proposal-eu-pollinator-monitoring-scheme-eu-poms


Description	ARIES DTP studentship
Amount	£75,000 (GBP)
Organisation	Natural Environment Research Council
Sector	Public
Country	United Kingdom
Start	10/2020
End	03/2024


Description	GLobal Insect Threat-Response Synthesis (GLiTRS): a comprehensive and predictive assessment of the pattern and consequences of insect declines
Amount	£902,701 (GBP)
Funding ID	NE/V007548/1
Organisation	Natural Environment Research Council
Sector	Public
Country	United Kingdom
Start	11/2020
End	11/2024


Description	Knowledge Exchange Fellowship: Bringing the data revolution to nature recovery
Amount	£170,758 (GBP)
Funding ID	NE/V018973/1
Organisation	Natural Environment Research Council
Sector	Public
Country	United Kingdom
Start	10/2021
End	10/2024


Description	Collaboration with Univ Glasgow
Organisation	University of Glasgow
Country	United Kingdom
Sector	Academic/University
PI Contribution	Arising from our publications on this project, I was approached by researchers at University of Glasgow about developing new statistical models for integrating datasets. For the first piece of work, we have provided advice on data structure and feedback on the models that have been developed. A manuscript is in preparation.
Collaborator Contribution	The work has been led by the Glasgow team, who are based in the Dept Statistics. They've built the models and tested them using data that we provided. The Glasgow team is leading on the writing of papers.
Impact	Manuscript in prep
Start Year	2022


Description	EU Pollinator Monitoring scheme
Organisation	European Commission
Country	European Union (EU)
Sector	Public
PI Contribution	Informed by the advances in this project, we developed an integrated modelling framework to model change in the abundance and distribution of pollinators across Europe.
Collaborator Contribution	Access to expertise in survey design and pollinator ecology from across Europe
Impact	Design of a pan-European monitoring scheme built on integration of multiple evidence streams: https://ec.europa.eu/jrc/en/science-update/proposal-eu-pollinator-monitoring-scheme-eu-poms
Start Year	2020


Description	Partnership with Prof O'Hara at NTNU
Organisation	Norwegian University of Science and Technology (NTNU)
Country	Norway
Sector	Academic/University
PI Contribution	I hosted a workshop over three days in July 2019 comprising six members of my research team, our partners at the Norwegian Technical University (NTNU), and ten external academics. At the workshop we scoped out a draft manuscript reviewing the literature on this emergent topic. I co-ordinated the writing process over the months that followed. We had a follow-up "virtual workshop" when I visited NTNU with one of my research team, in November. During this visit we worked on the manuscript and planned the next phase of work under this partnership. Since then each organisation has hosted visits of PhD students from the other. Overall, our team provides knowledge of the data and the NTNU team provides the expert statistical knowledge.
Collaborator Contribution	Bob O'Hara at the Norwegian Technical University is a named project partner on the grant application. Prof O'Hara supported me throughout the planning and execution of the workshop, as well as during the writing of the manuscript, during which we co-ordinated the activities of the external academics as a writing team. O'Hara has allocated a portion of his Postdoctoral researcher's time to work with a member of my team on this project. The manuscript is now completed and in peer review, whilst the next phase of work continues. Subsequently, we have been working with one of O'Hara's PhD students from the Maths department.
Impact	Papers in Trends in Ecology & Evolution, Ecography and Diversity and Distributions. Further manuscripts in preparation.
Start Year	2019


Description	Pollinator Monitoring Scheme
Organisation	Department For Environment, Food And Rural Affairs (DEFRA)
Country	United Kingdom
Sector	Public
PI Contribution	The pollinator monitoring scheme is a new program for monitoring the status of pollinating insects across the UK. My research team is responsible for developing the statistical modelling framework for reporting trends for each species. We have based this framework on insights gained from the "Data integration" project: there is a systematic survey and an extensive unstructured recording scheme, both of which generate valuable information.
Collaborator Contribution	Taxonomic expertise; Survey design; coordination; access to networks of volunteer citizen scientists; communications
Impact	Cost-benefit analysis showing that pollinator monitoring more than pays for itself: https://onlinelibrary.wiley.com/doi/10.1111/1365-2664.13755 Manuscripts in preparation
Start Year	2018


Title	Integrated analysis of black-throated blue warbler data from PA, USA
Description	This product is R code to run an integrated distribution model in R-INLA, using the principles developed under this project and which are described in our paper in Trends in Ecology & Evolution. This worked example contains all the steps required to download the data, fit the model and display the outputs.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The code developed here is a case study in the paper we published in early 2020 (see "publications")
URL	https://zenodo.org/record/3363936#.XUw_F-NKhFE


Description	Presented at Living Norway symposium
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The Living Norway seminar ran over two days in June 2019 as the launch event of the Living Norway Ecological Data Network. The network spans academics, government agencies and volunteer groups in Norway. The meeting was also attended by data holders from other Scandinavian countries. In my talk, I explained how the principles of data integration work in an ecological context, showing how well-designed models and databases make it possible to bring multiple data types to bear on questions about how biodiversity is distributed.
Year(s) Of Engagement Activity	2019
URL	https://livingnorway.no/2019/04/26/living-norway-seminar-2019/
