Open Epidemiology for pandemic modelling: a transparent, traceable, reusable, open source pipeline for reproducible science

Lead Research Organisation: University of Glasgow
Department Name: College of Medical, Veterinary, Life Sci

Abstract

Historically, models used to support advice to government have not been publicly available, at least not readily, prior to publication. Technological advances and the growth of open source and reproducible science mean this is no longer tenable. Although current models feeding into UK policy are publicly available, they still lack a transparent, traceable chain of evidence connecting data and assumptions with model outputs, which is what would allow them to be independently assessed.

Our Data Pipeline supports the implementation of COVID-19 epidemiological models that we, the Scottish COVID-19 Response Consortium (SCRC), have developed using volunteer resources within the RAMP initiative, to create new, complementary models. The Data Pipeline fulfils a critical role in our assessment of fitness for purpose for the models in providing policy advice, by managing and documenting a chain of trust that connects the primary data, analyses, and published and unpublished literature on COVID-19 to model outputs, documenting provenance of the conclusions being reached. The software interfaces we develop will be powerful, generic tools that will be useful to any policy-oriented modelling community.

Publications

Shadbolt N (2022) The challenges of data in future pandemics. in Epidemics

Dykes J (2022) Visualization for epidemiological modelling: challenges, solutions, reflections and recommendations. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

Mitchell SN (2022) FAIR data pipeline: provenance-driven data management for traceable scientific workflows. in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences

 
Description Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used.

Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made without accompanying evidence.
Exploitation Route The plan to operationalise the pipeline is to integrate a suite of realistic policy-oriented models into it. The use cases we have currently identified, which cover a wide range of activities likely to be carried out by identified users (including mathematical modellers, science-policy brokers, policy-makers and the wider public), will each be implemented for the integrated mathematical models, as part of a process of analysing user-software interactions and developing documented procedu
Sectors Environment, Healthcare

URL https://github.com/FAIRDataPipeline
 
Description Integrating a biodiversity digital twin with a FAIR data pipeline for reproducible science
Amount £39,970 (GBP)
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 03/2023 
End 07/2023
 
Title FAIR Data Pipeline 
Description Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of 'following the science' are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. 
Type Of Material Technology assay or reagent 
Year Produced 2022 
Provided To Others? Yes  
Impact The tool is brand new. 
URL https://www.fairdatapipeline.org
 
Description EPIC 
Organisation EPIC Centre of Expertise on Animal Disease Outbreaks
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We are working with EPIC to translate our work so that it is directly usable in their policy work over the coming years.
Collaborator Contribution They are dedicating time to developing use cases for the work and to further development of the software itself.
Impact None yet.
Start Year 2021
 
Title C++ Implementation of the API for the FAIR Data Pipeline. 
Description A C++ API for interacting with the FAIR Data Pipeline 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5877992
 
Title DataPipeline.jl - FAIR Data Pipeline in Julia 
Description Package for interfacing with the FAIR Data Pipeline in Julia 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5557281
 
Title FAIRDataPipeline/javaDataPipeline: 
Description Java implementation of the FAIR Data Pipeline API 
Type Of Technology Software 
Year Produced 2021 
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5547493
 
Title The FAIR Data Pipeline command line tool 
Description Command-line interface for the FAIR Data Pipeline system. It provides the commands needed to integrate analysis and data processing into the FAIR registry. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5708045
 
Title The FAIR Data Registry 
Description The FAIR data registry is a Django website and REST API used by the data pipeline to store metadata about code runs and their inputs and outputs. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5562750
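Illustration The idea of storing metadata about code runs and their inputs and outputs, then tracing an output back to primary data, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the registry's actual schema or REST API: the `CodeRun` and `Registry` classes, their methods, and the file names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CodeRun:
    """One recorded execution: which code read which inputs to write which outputs."""
    code: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a metadata registry (not the real API)."""
    runs: List[CodeRun] = field(default_factory=list)

    def record(self, code: str, inputs: List[str], outputs: List[str]) -> CodeRun:
        """Store the metadata for one code run."""
        run = CodeRun(code, tuple(inputs), tuple(outputs))
        self.runs.append(run)
        return run

    def producer_of(self, item: str) -> Optional[CodeRun]:
        """Return the run that produced `item`, or None if no run did (primary data)."""
        for run in self.runs:
            if item in run.outputs:
                return run
        return None

    def trace(self, item: str) -> Tuple[List[str], List[str]]:
        """Walk back from an output to the primary data and code it depends on."""
        primary, code, seen, stack = set(), set(), set(), [item]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            run = self.producer_of(current)
            if run is None:
                primary.add(current)      # nothing produced it: treat as primary data
            else:
                code.add(run.code)
                stack.extend(run.inputs)  # keep walking upstream
        return sorted(primary), sorted(code)


# Hypothetical two-step workflow: clean raw data, then fit a model.
registry = Registry()
registry.record("clean_cases.py", ["raw/cases.csv"], ["clean/cases.csv"])
registry.record(
    "fit_model.py",
    ["clean/cases.csv", "raw/population.csv"],
    ["results/forecast.csv"],
)

primary_data, code_used = registry.trace("results/forecast.csv")
print(primary_data)  # ['raw/cases.csv', 'raw/population.csv']
print(code_used)     # ['clean_cases.py', 'fit_model.py']
```

In this sketch an item that no recorded run produced is treated as primary data, so tracing a result lists both the source files and the scripts that sit between them and the output.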
 
Title pyDataPipeline - FAIR Data Pipeline in Python 
Description Package for interfacing with the FAIR Data Pipeline in Python 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/6010921
 
Title rDataPipeline - FAIR Data Pipeline in R 
Description Package for interfacing with the FAIR Data Pipeline in R 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5921117
 
Description Using the FAIR data pipeline to manage and maintain provenance in traceable scientific workflows - presentation to James Hutton Institute Research Symposium (24 Nov. 2022) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A presentation describing use of the FAIR data pipeline to manage and maintain provenance in traceable scientific workflows for an example application in the Scottish Government's Centre of Expertise on animal disease, EPIC
Year(s) Of Engagement Activity 2022