Open Epidemiology for pandemic modelling: a transparent, traceable, reusable, open source pipeline for reproducible science
Lead Research Organisation:
University of Glasgow
Department Name: College of Medical, Veterinary & Life Sciences
Abstract
Historically, models used to support advice to government have not been publicly available, at least not readily, prior to publication. Technological advances and the growth of open source and reproducible science mean this is no longer tenable. Although current models feeding into UK policy are publicly available, they still lack the transparent, readily traceable chain of evidence connecting data and assumptions to model outputs that would allow them to be independently assessed.
Our Data Pipeline supports the implementation of the COVID-19 epidemiological models that we, the Scottish COVID-19 Response Consortium (SCRC), have developed using volunteer resources within the RAMP initiative to create new, complementary models. The Data Pipeline fulfils a critical role in our assessment of the models' fitness for purpose in providing policy advice, by managing and documenting a chain of trust that connects the primary data, analyses, and published and unpublished literature on COVID-19 to model outputs, thereby documenting the provenance of the conclusions being reached. The software interfaces we develop will be powerful, generic tools, useful to any policy-oriented modelling community.
Publications

Mitchell SN et al. (2022) FAIR data pipeline: provenance-driven data management for traceable scientific workflows. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
Description | Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. |
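To make this concrete, the sketch below illustrates in Python the general shape of a pipeline-wrapped analysis: a session is opened against a run configuration, each input and output is recorded as it is read or written, and the session is finalised so that a registry can hold a complete record of the code run. The class and function names here (FairSession, read_input, write_output, finalise) are hypothetical placeholders, not the published pyDataPipeline API; consult the FAIRDataPipeline documentation for the actual calls.

# Illustrative sketch only: FairSession and its methods are hypothetical
# placeholders standing in for a real pipeline API, not pyDataPipeline itself.
import csv


class FairSession:
    """Minimal stand-in for a pipeline session that records provenance."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.inputs = []
        self.outputs = []

    def read_input(self, data_product, path):
        # Record which named data product was consumed, and from which file.
        self.inputs.append((data_product, path))
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def write_output(self, data_product, path, rows):
        # Record the data product produced by this code run, then write it.
        self.outputs.append((data_product, path))
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)

    def finalise(self):
        # A real pipeline would push this metadata to the registry; here we
        # simply print the recorded provenance of the run.
        print("inputs:", self.inputs)
        print("outputs:", self.outputs)


if __name__ == "__main__":
    # Create a toy input file so the example runs end to end.
    with open("cases.csv", "w", newline="") as f:
        f.write("date,cases\n2020-03-01,12\n2020-03-02,20\n")

    session = FairSession("config.yaml")  # hypothetical run configuration
    cases = session.read_input("records/cases", "cases.csv")
    summary = [{"total_cases": sum(int(row["cases"]) for row in cases)}]
    session.write_output("outputs/case_summary", "case_summary.csv", summary)
    session.finalise()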
Exploitation Route | The plan to operationalise the pipeline is to integrate a suite of realistic policy-oriented models into the data pipeline. The use cases we have identified so far cover a wide range of activities likely to be carried out by our target users (including mathematical modellers, science-policy brokers, policy-makers and the wider public); each will be implemented for the integrated mathematical models as part of a process of analysing user-software interactions and developing documented procedures. We hope to involve science-policy brokers in this process; their involvement will be invaluable, since they are a key target user group and can also plausibly serve as proxies for policy-makers and the general public. In particular, some of these use cases are the different inspections of data and results that they (and other individuals, such as members of the public) might wish to make in order to understand the origins of the conclusions that researchers present. The data registry's web interface currently addresses these use cases only in a limited way, but further work is underway to improve this. Tools for provenance visualisation are also limited, and we believe further work is needed in this area to reduce the complexity of the diagrams produced and to make them easier to use for exploring data and results. If gaps become evident in the portfolio of use cases, these will be documented and carried forward for further attention. In the longer term we intend to pilot uptake in groups delivering model-based evidence to policy; initial implementation and evaluation would likely best be carried out as part of an emergency simulation exercise, where the utility, costs and robustness of the data pipeline could be assessed within the context of the wider demands made of the scientists by policy-makers. |
Sectors | Environment, Healthcare |
URL | https://github.com/FAIRDataPipeline |
Description | EPIC |
Organisation | EPIC Centre of Expertise on Animal Disease Outbreaks |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | We are working with EPIC to translate our work so that it is directly usable by them in their policy work over the coming years. |
Collaborator Contribution | They are dedicating time to developing use cases for the work and to further development of the software itself. |
Impact | None yet. |
Start Year | 2021 |
Title | C++ Implementation of the API for the FAIR Data Pipeline |
Description | A C++ API for interacting with the FAIR Data Pipeline |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5877992 |
Title | DataPipeline.jl - FAIR Data Pipeline in Julia |
Description | Package for interfacing with the FAIR Data Pipeline in Julia |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5557281 |
Title | FAIRDataPipeline/javaDataPipeline |
Description | Java implementation of the FAIR Data Pipeline API |
Type Of Technology | Software |
Year Produced | 2021 |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5547493 |
Title | The FAIR Data Pipeline command line tool |
Description | Command-line interface for the FAIR Data Pipeline system; it provides the commands necessary for integrating analysis and data processing with the FAIR registry. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5708045 |
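As a rough illustration of how such a command-line tool slots into a workflow, the sketch below drives it from Python via subprocess: the inputs named in a run configuration are pulled from the registry, the analysis is run with its metadata recorded, and the results are pushed back. The subcommand names (pull, run, push) and the config filename are assumptions made for illustration; check them against the installed CLI's own help before use.

# Hedged sketch of a pull/run/push workflow driven through the command-line
# tool; subcommand names and the config filename are assumptions, so verify
# them against the installed CLI ("fair --help") before relying on this.
import subprocess


def fair(*args):
    """Invoke one CLI command and raise if it exits with an error."""
    subprocess.run(["fair", *args], check=True)


if __name__ == "__main__":
    config = "config.yaml"   # assumed name of the user-written run configuration
    fair("pull", config)     # fetch the registered inputs named in the config
    fair("run", config)      # execute the analysis, recording run metadata
    fair("push")             # send the new metadata and outputs to the remote registry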
Title | The FAIR Data Registry |
Description | The FAIR Data Registry is a Django website and REST API used by the data pipeline to store metadata about code runs and their inputs and outputs. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5562750 |
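Because the registry exposes its metadata over a REST API, a code run and its inputs and outputs can in principle be inspected with any HTTP client, which is what makes the provenance trail browsable. The sketch below queries a locally running registry with Python's requests library; the base URL, the endpoint name and the response fields are assumptions for illustration, so the registry's own API documentation should be checked for the actual routes.

# Hedged sketch: browsing code-run metadata from a locally running registry.
# The base URL, the "code_run" endpoint and the field names are assumptions
# made for illustration, not the registry's documented schema.
import requests

REGISTRY = "http://127.0.0.1:8000/api"  # assumed address of a local registry


def list_code_runs(limit=5):
    """Fetch up to `limit` recent code-run records from the registry."""
    response = requests.get(f"{REGISTRY}/code_run/", timeout=10)
    response.raise_for_status()
    return response.json().get("results", [])[:limit]


if __name__ == "__main__":
    for run in list_code_runs():
        # Each record links a run to its inputs and outputs, which is what
        # lets a result be traced back through the code to primary data.
        print(run.get("description"), run.get("inputs"), run.get("outputs"))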
Title | pyDataPipeline - FAIR Data Pipeline in Python |
Description | Package for interfacing with the FAIR Data Pipeline in Python |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/6010921 |
Title | rDataPipeline - FAIR Data Pipeline in R |
Description | Package for interfacing with the FAIR Data Pipeline in R |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5921117 |