Using data to improve public health: COVID-19 secondment

Lead Research Organisation: University of Aberdeen
Department Name: Physics

Abstract

The coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to a worldwide increase in hospitalisations and deaths since it emerged in December 2019. The effects of COVID-19 vary widely between patients, ranging from asymptomatic to fatal cases. The duration of symptoms is also very heterogeneous, from a few days for some patients to several weeks for others who develop so-called 'long COVID'. Older age is a well-known risk factor for both severe and long COVID. This has been associated with an immune response weakened by ageing processes. Pre-existing diseases such as hypertension, diabetes, cardiovascular disease, or cancer also increase the risk of severe infection in patients with COVID-19. However, severe COVID-19 has also been observed in many seemingly healthy middle-aged individuals. Understanding of the risk factors for severe COVID-19 remains limited and the reasons why susceptibility to the virus varies so widely in the population are poorly understood. More research is needed to uncover the biological mechanisms of severity so that highly susceptible individuals and pathways to novel treatments can be identified.
Recent studies have shown that the molecules in biofluids such as blood, urine or faeces are altered in people with cardiovascular disease, diabetes, or chronic inflammation. These conditions represent risk factors for severe COVID-19 and we hypothesise that biofluid molecules can be used as metabolic biomarkers to predict whether a patient infected by SARS-CoV-2 is likely to be seriously affected.
The central idea of the proposed research is to use metabolic biomarkers to predict the severity of COVID-19 and the likelihood of long COVID for individuals who have not necessarily been diagnosed with a pre-existing health condition. To this end, we will use pre-pandemic data from several cohort studies which, in addition to basic information on age, sex, ethnicity, etc., contain hundreds of metabolic biomarkers for thousands of individuals. To understand the link between these characteristics and the impact of COVID-19, we will use symptoms data for those individuals in the cohort studies who had COVID-19. The data will be analysed with statistical methods to identify associations between the characteristics of individuals before the pandemic and the severity of the disease. This analysis will be complemented with computer programs developed to predict whether an infection will have serious effects on an individual, based on their characteristics before the pandemic. Machine learning techniques will be used to train computer programs to automatically recognise metabolic features that represent a risk for severe COVID-19.
The project can be beneficial both in terms of basic science and applications. Indeed, the proposed research will enhance our understanding of how metabolic biomarkers may explain the susceptibility to severe COVID-19. From an applied viewpoint, using the information encoded by numerous metabolic biomarkers to train machine learning models can improve our ability to identify individuals for whom COVID-19 may have serious consequences.

Technical Summary

This project will develop computational methods that apply machine learning to data on metabolic biomarkers from cohort studies in order to predict the severity and duration of COVID-19. Highly accurate predictions are crucial to identify the individuals who are most at risk of serious effects of COVID-19. The data to be used consist of sociodemographic information (age, sex, ethnicity, etc.), information on health conditions before COVID-19, and metabolic markers from biofluids including blood, urine and faeces. Incorporating metabolomic data into the analysis is expected to significantly enhance our ability to predict the severity of COVID-19 compared to methods that focus on, e.g., sociodemographic data only.
The project will study both the severity of COVID-19 and the duration of symptoms. The specific aims of the project are the following:
Aim 1. To identify metabolic biomarkers associated with severe COVID-19 and long COVID.
Aim 2. To train computer programs to predict the susceptibility of individuals to severe COVID-19 and long COVID.
In practice, the two aims will be addressed separately for the severity of COVID-19 and for the duration of symptoms. The overall goal of the project, however, is to integrate the results for both characteristics and provide a general view of how metabolomics can help understand the manifestations of COVID-19.
The severity of COVID-19 will be quantified in terms of whether or not patients show symptoms. For Aim 1, associations between the characteristics of individuals and the presence/absence of symptoms will be explored using statistical methods including graphical visualisation, hypothesis testing and logistic regression. Feature selection and dimensionality reduction strategies will be used to identify the features most relevant to the presence of symptoms. For Aim 2, machine learning models will be trained to automatically classify individuals into symptomatic and asymptomatic classes. A variety of machine learning techniques will be implemented; partial least squares discriminant analysis, support vector machines and artificial neural networks are expected to be particularly suitable for dealing with the high dimensionality and correlated character of metabolomic data.
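As a rough illustration of the logistic regression mentioned for Aim 1 and the symptomatic/asymptomatic classification of Aim 2, the sketch below fits a one-variable logistic model by gradient descent. It is a minimal sketch, not the project's actual analysis: the biomarker values, labels, function names and hyperparameters are all invented for illustration, and a real analysis would involve many correlated biomarkers, covariates and proper validation on held-out data.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.5, n_steps=2000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        # Differences between predicted probabilities and observed labels.
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)]
        grad_w = sum(e * xi for e, xi in zip(errors, x)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(x, w, b, threshold=0.5):
    """Classify each value as symptomatic (1) or asymptomatic (0)."""
    return [1 if sigmoid(w * xi + b) >= threshold else 0 for xi in x]


# Hypothetical standardised levels of a single pre-pandemic biomarker
# (made-up numbers, not data from any cohort).
biomarker = [-2.0, -1.5, -1.0, -0.5, 0.2, -0.3, 0.5, 1.0, 1.5, 2.0]
# Hypothetical outcome: 1 = symptomatic, 0 = asymptomatic.
symptomatic = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(biomarker, symptomatic)
predicted = predict(biomarker, w, b)
accuracy = sum(p == t for p, t in zip(predicted, symptomatic)) / len(symptomatic)

# A positive coefficient w means higher biomarker levels are associated
# with higher odds of being symptomatic in this toy data set.
print(f"coefficient={w:.2f}, intercept={b:.2f}, training accuracy={accuracy:.2f}")
```

The sign and size of the fitted coefficient correspond to the association explored in Aim 1, while the thresholded predictions correspond to the classification task of Aim 2. The accuracy printed here is computed on the training data only and would overstate performance in practice.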
Several descriptions will be considered for the duration of symptoms, each requiring a different degree of statistical power to be feasible. If the data give enough statistical power, the most natural approach will be to treat the duration as a continuous random variable. In this case, Aim 1 will be fulfilled by using regression methods to assess the statistical significance of the different predictor variables measured for each individual. A range of machine learning methods will be explored to train a predictor for the duration of symptoms. Suitable candidates may include partial least squares regression, principal component regression and artificial neural networks. An alternative description that requires less statistical power consists of discretising the duration into several categories, for example short (≤10 days) and long (>10 days) durations to describe short and long COVID, respectively. In this case, Aims 1 and 2 can be achieved using methods similar to those described above for the analysis of the presence or absence of symptoms.
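The discretisation of symptom duration can be sketched as follows. This is only an illustration: it assumes the short category covers durations of 10 days or fewer, with anything longer counted as long, following the example threshold in the text. The function name, constant and sample durations are hypothetical.

```python
SHORT_MAX_DAYS = 10  # example threshold from the text; an assumption, not a fixed definition


def duration_category(days, short_max=SHORT_MAX_DAYS):
    """Map a symptom duration in days to a 'short' or 'long' category."""
    if days < 0:
        raise ValueError("duration cannot be negative")
    return "short" if days <= short_max else "long"


# Hypothetical symptom durations in days (made-up values).
durations = [3, 10, 11, 42]
categories = [duration_category(d) for d in durations]
print(categories)  # ['short', 'short', 'long', 'long']
```

Once durations are mapped to two categories in this way, the problem has the same binary structure as the symptomatic/asymptomatic analysis, which is why similar methods can be reused.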

Publications

 
Description Evidence to the COVID-19 Recovery Committee of the Scottish Parliament
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
URL https://www.scottishparliament.tv/meeting/covid-19-recovery-committee-march-10-2022
 
Title A machine learning classification pipeline for metabolomic data 
Description A pipeline has been developed to identify sets of metabolites with high discriminatory power between two or more groups of individuals. This can be viewed as a method for feature selection optimized for metabolomic data. More explicitly, the method involves a feature selection procedure that results in a hierarchical sequence of models. The end of the pipeline gives a minimal model based on a small selection of metabolites with the highest power to distinguish between groups. We are applying our pipeline to distinguish groups of individuals from several cohorts that were infected by SARS-CoV-2 and had different symptom characteristics. The analysis pipeline will be published when the analyses of symptoms have been finalised. 
Type Of Material Data analysis technique 
Year Produced 2022 
Provided To Others? No  
Impact N/A 
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation King's College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation London School of Hygiene and Tropical Medicine (LSHTM)
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation University of Glasgow
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021
 
Description Longitudinal Health & Wellbeing National Core Study for COVID-19 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Analysis of data from several study cohorts to understand several aspects of COVID-19. In particular, I lead a project to understand the link between pre-pandemic metabolomics of COVID-19 patients and experienced symptoms. This project directly contributes to the Convalescence long COVID Study
Collaborator Contribution Provide sociodemographic and biological data for participants in the cohort studies. Also provide expertise in medical, biological and data science.
Impact No outputs are to be reported yet.
Start Year 2021