Open Epidemiology for pandemic modelling: a transparent, traceable, reusable, open source pipeline for reproducible science
Lead Research Organisation:
University of Glasgow
Department Name: College of Medical, Veterinary, Life Sci
Abstract
Historically, models used to support advice to government have not been publicly available, at least not readily, prior to publication. Technological advances and the growth of open source and reproducible science mean this is no longer tenable. Although current models feeding into UK policy are publicly available, they still lack the transparent and readily traceable chain of evidence connecting data and assumptions with model outputs that allows them to be readily independently assessed.
Our Data Pipeline supports the implementation of COVID-19 epidemiological models that we, the Scottish COVID-19 Response Consortium (SCRC), have developed using volunteer resources within the RAMP initiative, to create new, complementary models. The Data Pipeline fulfils a critical role in our assessment of fitness for purpose for the models in providing policy advice, by managing and documenting a chain of trust that connects the primary data, analyses, and published and unpublished literature on COVID-19 to model outputs, documenting provenance of the conclusions being reached. The software interfaces we develop will be powerful, generic tools that will be useful to any policy-oriented modelling community.
Publications
Dunne M
(2022)
Complex model calibration through emulation, a worked example for a stochastic epidemic model.
in Epidemics
Dykes J
(2022)
Visualization for epidemiological modelling: challenges, solutions, reflections and recommendations.
in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
McMonagle C
(2022)
Trends in the diversity of mortality causes and age-standardised mortality rates among subpopulations within Scotland, 2001-2019.
in SSM - population health
Mitchell SN
(2022)
FAIR data pipeline: provenance-driven data management for traceable scientific workflows.
in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
Shadbolt N
(2022)
The challenges of data in future pandemics.
in Epidemics
Description | Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made wi |
Exploitation Route | The plan to operationalise the pipeline is to integrate a suite of realistic policy-oriented models into it. The use cases we have currently identified, which cover a wide range of activities likely to be carried out by identified users (including mathematical modellers, science-policy brokers, policy-makers and the wider public), will each be implemented for the integrated mathematical models, as part of a process of analysing user-software interactions and developing documented procedu |
Sectors | Environment, Healthcare |
URL | https://github.com/FAIRDataPipeline |
Description | Integrating a biodiversity digital twin with a FAIR data pipeline for reproducible science |
Amount | £39,970 (GBP) |
Organisation | Natural Environment Research Council |
Sector | Public |
Country | United Kingdom |
Start | 03/2023 |
End | 07/2023 |
Title | FAIR Data Pipeline |
Description | Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of 'following the science' are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. |
Type Of Material | Technology assay or reagent |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | The tool is brand new. |
URL | https://www.fairdatapipeline.org |
Description | EPIC |
Organisation | EPIC Centre of Expertise on Animal Disease Outbreaks |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | We are working with EPIC to translate our work so that they can use it directly in their policy work over the coming years. |
Collaborator Contribution | They are dedicating time to developing use cases for the work and to further development of the software itself. |
Impact | None yet. |
Start Year | 2021 |
Title | C++ Implementation of the API for the FAIR Data Pipeline. |
Description | A C++ API for interacting with the FAIR Data Pipeline |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5877992 |
Title | DataPipeline.jl - FAIR Data Pipeline in Julia |
Description | Package for interfacing with the FAIR Data Pipeline in Julia |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5557281 |
Title | FAIRDataPipeline/javaDataPipeline: |
Description | Java implementation of the FAIR Data Pipeline API |
Type Of Technology | Software |
Year Produced | 2021 |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5547493 |
Title | RAMP-VIS/RAMPVIS-PhilTransA-Supplement: RAMPVIS_PhilTransA_Supplement_v2.0 |
Description | The second release of the RAMPVIS PhilTransA Supplement, with Observable Notebooks of the Idioms revised in response to comments from the first round of review. This repository collects immutable, archived versions of the Observable Notebooks that support the RAMPVIS PhilTransA submission: "Visualization for Epidemiological Modelling: Challenges, Solutions, Reflections & Recommendations" |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
URL | https://zenodo.org/record/5717366 |
Title | The FAIR Data Pipeline command line tool |
Description | Command-line interface for the FAIR Data Pipeline system. This software provides the commands necessary for integrating analysis and data processing into the FAIR registry. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5708045 |
Title | The FAIR Data Registry |
Description | The FAIR data registry is a Django website and REST API that the data pipeline uses to store metadata about code runs and their inputs and outputs. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5562750 |
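The registry entry above describes storing metadata about code runs together with their inputs and outputs. The following is a minimal Python sketch of that idea, showing how such records could let an output be traced back to primary data. It is not the FAIR Data Registry's actual schema or REST API: the names `CodeRun`, `ToyRegistry`, `register`, `producer_of` and `trace`, and the file names used, are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeRun:
    """Hypothetical record of one code run: what it read and what it wrote."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class ToyRegistry:
    """In-memory stand-in for a metadata registry (illustrative only)."""
    runs: list[CodeRun] = field(default_factory=list)

    def register(self, run: CodeRun) -> None:
        """Store the metadata for one code run."""
        self.runs.append(run)

    def producer_of(self, data_product: str) -> CodeRun | None:
        """Return the run that wrote a data product, or None if no run did."""
        for run in self.runs:
            if data_product in run.outputs:
                return run
        return None

    def trace(self, data_product: str) -> set[str]:
        """Walk back from a data product to the primary data it depends on.

        Assumes the recorded runs form an acyclic chain; anything that no
        registered run produced is treated as primary data.
        """
        run = self.producer_of(data_product)
        if run is None:
            return {data_product}
        primary: set[str] = set()
        for item in run.inputs:
            primary |= self.trace(item)
        return primary


# Hypothetical two-step workflow: clean the raw data, then fit a model.
registry = ToyRegistry()
registry.register(CodeRun("clean", ("raw/cases.csv",), ("clean/cases.csv",)))
registry.register(
    CodeRun("fit", ("clean/cases.csv", "raw/population.csv"), ("model/forecast.csv",))
)

primary_sources = registry.trace("model/forecast.csv")
print(sorted(primary_sources))
```

In this toy workflow, tracing the forecast recovers the two raw files it ultimately depends on. A real registry would also need to record versions, authorship and storage locations, which this sketch leaves out.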
Title | pyDataPipeline - FAIR Data Pipeline in Python |
Description | Package for interfacing with the FAIR Data Pipeline in Python |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/6010921 |
Title | rDataPipeline - FAIR Data Pipeline in R |
Description | Package for interfacing with the FAIR Data Pipeline in R |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | This software, and the many other associated components of the FAIR Data Pipeline, allow traceability of research results in a FAIR manner. |
URL | https://zenodo.org/record/5921117 |
Description | Using the FAIR data pipeline to manage and maintain provenance in traceable scientific workflows - presentation to James Hutton Institute Research Symposium (24 Nov. 2022) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | A presentation describing the use of the FAIR data pipeline to manage and maintain provenance in traceable scientific workflows, with an example application at EPIC, the Scottish Government's Centre of Expertise on animal disease. |
Year(s) Of Engagement Activity | 2022 |