A Multimodal COVID-19 Database for Research

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

The proposed project addresses one of the key UKRI priority areas of preparing data sets to defined quality standards by delivering a Multimodal Database for COVID-19 Research, a comprehensive and easy-to-use database for creating and validating epidemiological models.

Modelling, machine learning, digital and data approaches to understanding the COVID-19 pandemic will shape policy decisions over the coming year. Such research requires rich and standardised data at a fine geographical level and of multiple modalities: epidemiological, mobility, socioeconomic and more. Large quantities of COVID-19 data are collected both in the UK and around the world. However, sourcing and linking data of different modalities are major burdens for researchers, owing to the lack of standardisation. There is a clear need to establish a central repository to facilitate world-class research immediately and in coming years.

Building on our extensive voluntary work on the OxCOVID19 Database (https://covid19.eng.ox.ac.uk/), we seek funding to expand our global coverage, deepen our focus on the UK, design new interfaces for diverse users, develop stronger infrastructure for increasing demand, and grow our user numbers. This database will enable data linkage for research, delivering consolidated, well-formatted data in a way that avoids duplication of effort by multiple research groups and accelerates research by removing barriers to entry.

Publications

 
Description The data from the project was used to provide one of the first classifications of COVID-19 pandemic wave types and of how those waves were affected by non-pharmaceutical interventions (such as lockdowns). This work is to be published in J Harvey et al. (2023). "Epidemiological waves - types, drivers and modulators in the COVID-19 pandemic." (https://www.medrxiv.org/content/10.1101/2022.01.07.21268513v1).

A large amount of COVID-19 data has been collected and is available for future research at https://github.com/covid19db.

The project showed how multimodal databases for epidemiological research can be assembled using an advanced engineering pipeline. OxCOVID19 was able to provide highly granular geographic epidemiological data to discuss questions of importance to UK society.
Exploitation Route The project has produced a data acquisition pipeline for future pandemics. It has provided a framework for building real-time, multimodal databases for epidemiological research aimed at future pandemics. The framework, which is available at https://github.com/covid19db/fetchers-python for future use, is written in Python and includes the following stages: (1) Fetching - acquiring raw data from multiple sources, (2) Unification - associating the data with geographical regions, (3) Validation - verifying the data for accuracy and consistency, (4) Storing - storing the data in a relational database (such as PostgreSQL) for faster query times, standardised syntax, and multiple data views, and (5) Sharing - publishing CSV files (currently a common method for sharing data) and an API (to enhance task automation and integration).
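The five stages listed above can be illustrated with a minimal Python sketch. It is not the code in the fetchers-python repository: the function names, the sample rows, the region lookup and the table layout are all hypothetical, and SQLite (from the Python standard library) stands in for PostgreSQL so that the example runs without a database server.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed; a real fetcher would download this from a data source.
RAW_CSV = """date,region,confirmed
2020-03-01,Oxfordshire,5
2020-03-02,Oxfordshire,9
2020-03-02,Atlantis,3
2020-03-03,Oxfordshire,-1
"""

# Illustrative lookup from source region names to standard region codes.
REGION_CODES = {"Oxfordshire": "GB-OXF"}


def fetch(raw_text):
    """Stage 1, Fetching: parse raw source text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def unify(rows):
    """Stage 2, Unification: attach a region code; drop unknown regions."""
    unified = []
    for row in rows:
        code = REGION_CODES.get(row["region"])
        if code is not None:
            unified.append({
                "date": row["date"],
                "region_code": code,
                "confirmed": int(row["confirmed"]),
            })
    return unified


def validate(rows):
    """Stage 3, Validation: keep only rows with non-negative counts."""
    return [row for row in rows if row["confirmed"] >= 0]


def store(rows, conn):
    """Stage 4, Storing: write rows to a relational table (SQLite here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS epidemiology "
        "(date TEXT, region_code TEXT, confirmed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO epidemiology VALUES (:date, :region_code, :confirmed)",
        rows,
    )
    conn.commit()


def share(conn):
    """Stage 5, Sharing: export the stored table as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "region_code", "confirmed"])
    writer.writerows(conn.execute(
        "SELECT date, region_code, confirmed FROM epidemiology ORDER BY date"
    ))
    return out.getvalue()


conn = sqlite3.connect(":memory:")
store(validate(unify(fetch(RAW_CSV))), conn)
exported = share(conn)
print(exported)
```

In this sketch the row for the unknown region is dropped at the unification stage and the negative count is dropped at the validation stage, so only the two consistent rows reach the database and the exported CSV.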
Sectors Healthcare

URL https://covid19.oii.ox.ac.uk/database/
 
Description The project aimed to prepare and link multimodal data sets for COVID-19 research in a standardised manner, with a particular focus on the UK. To standardise the data, metadata was checked to ensure consistent definition of variables, and records were linked to the correct geographical region. Over 70 data acquisition modules were developed, a standardised storage format that includes geographic division was implemented, and distributed storage was set up using GitHub. The collected data is currently stored in the data repository on GitHub [https://github.com/covid19db]. Oxford students on the Social Data Science MSc, several of whom are sponsored by industry, complete an assignment that combines and analyses OxCOVID19 data. Some of this work has been distributed outside academia; see, for example, the blog post by Cameron Raymond (now with OpenAI) [https://cameronraymond.me/blog/covid-music/]. The database was also identified as an example of good engineering and promoted through other technology outlets, for example Splitgraph: see Numark, P. (2022) "Planning a vacation with Splitgraph and Observable". Available online: https://www.splitgraph.com/blog/observable-query-oxford-covid.
Sector Healthcare
Impact Types Policy & public services

 
Description A new public-facing web page to facilitate data sharing and engagement 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact In response to engagement with various database users from outside academia, the database team, other colleagues and database users co-produced a paper that used our data to analyse what constitutes a "wave" of COVID-19. This paper is available on medRxiv (https://www.medrxiv.org/content/10.1101/2022.01.07.21268513v1) and has been submitted to Epidemics, where it is currently under consideration. A new public-facing web page has been built to facilitate data sharing and engagement, and will go live by the end of March. Additionally, topics focusing on the efficacy of non-pharmaceutical interventions have been identified and several reports drafted; these reports will be a new feature of the newly built website.
Year(s) Of Engagement Activity 2021,2022