Open Epidemiology for pandemic modelling: a transparent, traceable, reusable, open source pipeline for reproducible science

Lead Research Organisation: University of Glasgow
Department Name: College of Medical, Veterinary & Life Sci

Abstract

Historically, models used to support advice to government have not been publicly available, at least not readily, prior to publication. Technological advances and the growth of open source and reproducible science mean this is no longer tenable. Although the models currently feeding into UK policy are publicly available, they still lack a transparent, traceable chain of evidence connecting data and assumptions to model outputs, which is what would allow them to be assessed independently.

Our Data Pipeline supports the implementation of COVID-19 epidemiological models that we, the Scottish COVID-19 Response Consortium (SCRC), have developed using volunteer resources within the RAMP initiative, to create new, complementary models. The Data Pipeline fulfils a critical role in our assessment of fitness for purpose for the models in providing policy advice, by managing and documenting a chain of trust that connects the primary data, analyses, and published and unpublished literature on COVID-19 to model outputs, documenting provenance of the conclusions being reached. The software interfaces we develop will be powerful, generic tools that will be useful to any policy-oriented modelling community.

Publications

Mitchell SN (2022) FAIR data pipeline: provenance-driven data management for traceable scientific workflows. In Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences

 
Description Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used.

Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process.

Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research.
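The idea of tracing an output back to primary data can be illustrated with a minimal sketch. This is not the FAIR Data Pipeline's API or data model: the record names, the two dictionaries and the `trace_provenance` function are all hypothetical, chosen only to show a backwards walk through recorded code runs.

```python
from collections import deque

# Hypothetical provenance records (illustrative names only).
# PRODUCED_BY maps each derived item to the code run that produced it;
# RUN_INPUTS maps each code run to the items it consumed.
PRODUCED_BY = {
    "figure:epidemic_curve": "run:plot_results",
    "data:model_output": "run:seir_model",
    "data:cleaned_cases": "run:clean_cases",
}
RUN_INPUTS = {
    "run:plot_results": ["data:model_output"],
    "run:seir_model": ["data:cleaned_cases", "data:parameters"],
    "run:clean_cases": ["data:raw_case_counts"],
}


def trace_provenance(output):
    """Walk back from an output to the data it ultimately depends on.

    Returns (primary_data, code_runs). Items with no recorded producer
    are treated as primary data.
    """
    primary, runs = [], []
    seen = {output}
    queue = deque([output])
    while queue:
        item = queue.popleft()
        run = PRODUCED_BY.get(item)
        if run is None:
            primary.append(item)  # nothing produced it: primary data
            continue
        if run not in runs:
            runs.append(run)
        for parent in RUN_INPUTS.get(run, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(primary), runs


primary_data, code_runs = trace_provenance("figure:epidemic_curve")
print("Primary data:", primary_data)
print("Code runs:", code_runs)
```

In this toy example the figure is traced through three code runs to two primary inputs, which is the kind of chain a reader inspecting a published result would want to see.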
Exploitation Route The plan to operationalise the pipeline is to integrate a suite of realistic policy-oriented models into it. The use cases we have identified so far cover a wide range of activities likely to be carried out by identified users (including mathematical modellers, science-policy brokers, policy-makers and the wider public). Each will be implemented for the integrated mathematical models, as part of a process of analysing user-software interactions and developing documented procedures. We hope to involve science-policy brokers in this process; their involvement will be invaluable, because they are a key target user group and can also plausibly serve as proxies for policy-makers and the general public. In particular, some of the use cases are inspections of data and results that they (and others, such as members of the public) might wish to make to understand the origins of the conclusions that researchers present.

Currently the data registry's web interface to address these use cases is limited, but further work is underway to improve this. Tools for provenance visualisation are also limited, and we believe further work is needed in this area to reduce the complexity of the diagrams produced, and increase their ease of use for exploration of data and results. If gaps become evident in the portfolio of use cases, these will be documented and carried forward for further attention. In the longer term we intend to pilot uptake in groups delivering model-based evidence to policy; it is likely that initial implementation and evaluation would best be carried out as part of an emergency simulation exercise, where the utility, costs and robustness of the data pipeline could be assessed within the context of the wider demands made of the scientists by policy-makers.
Sectors Environment, Healthcare

URL https://github.com/FAIRDataPipeline
 
Description EPIC 
Organisation EPIC Centre of Expertise on Animal Disease Outbreaks
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We are working with EPIC to translate our work so that they can use it directly in their policy work over the coming years.
Collaborator Contribution They are dedicating time to developing use cases for the work and to further development of the software itself.
Impact None yet.
Start Year 2021
 
Title C++ Implementation of the API for the FAIR Data Pipeline. 
Description A C++ API to interact with the FAIR Data Pipeline 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5877992
 
Title DataPipeline.jl - FAIR Data Pipeline in Julia 
Description Package for interfacing with the FAIR Data Pipeline in Julia 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5557281
 
Title FAIRDataPipeline/javaDataPipeline: 
Description Java implementation of the FAIR Data Pipeline API 
Type Of Technology Software 
Year Produced 2021 
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5547493
 
Title The FAIR Data Pipeline command line tool 
Description Command line interface for the FAIR Data Pipeline system. This software provides the commands necessary for integrating analysis and data processing into the FAIR registry. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5708045
 
Title The FAIR Data Registry 
Description The FAIR data registry is a Django website and REST API that the data pipeline uses to store metadata about code runs and their inputs and outputs. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5562750
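
As a rough illustration of the kind of metadata a registry of this sort might hold for a single code run, the sketch below builds a record in which every input and output is identified by a content hash, so that the exact data used is unambiguous. The field names and the `make_code_run_record` helper are assumptions made for this example; they are not the registry's actual schema or REST API.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest that identifies these exact bytes."""
    return hashlib.sha256(data).hexdigest()


def make_code_run_record(description, inputs, outputs):
    """Build a hypothetical code-run record.

    `inputs` and `outputs` map an illustrative item name to its raw bytes.
    """
    def describe(items):
        return [
            {"name": name, "sha256": content_hash(blob)}
            for name, blob in sorted(items.items())
        ]

    return {
        "description": description,
        "inputs": describe(inputs),
        "outputs": describe(outputs),
    }


record = make_code_run_record(
    description="Example model run (illustrative)",
    inputs={"raw_case_counts.csv": b"date,cases\n2020-03-01,5\n"},
    outputs={"model_output.csv": b"date,infected\n2020-03-01,12\n"},
)
print(json.dumps(record, indent=2))
```

Because the identifier is derived from the bytes themselves, two runs that record the same hash for an input demonstrably read the same data, while any change to a file yields a different identifier.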
 
Title pyDataPipeline - FAIR Data Pipeline in Python 
Description Package for interfacing with the FAIR Data Pipeline in Python 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/6010921
 
Title rDataPipeline - FAIR Data Pipeline in R 
Description Package for interfacing with the FAIR Data Pipeline in R 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. 
URL https://zenodo.org/record/5921117