Data mining epidemiological relationships: integration of causal analysis with published evidence

Lead Research Organisation: University of Bristol

Abstract

Causal inference in epidemiology focuses on identifying the risk factors that cause disease. Established approaches focus on specific risk factors that may impact on specific diseases. However, the wealth of biomedical data that now exists enables us to assess the causal relationships between a broad network of risk factors and diseases. By considering a much wider network of such relationships we will establish the relative importance of different risk factors and the potential side-effects of interventions that target those risk factors. We will also integrate biological data (eg molecular pathways, drug targets) with causal relationships to enable us to understand the molecular mechanisms that lead to disease, and identify potential pharmaceutical and public health interventions. These data and relationships will be combined in a purpose-built "graph" database and methods will be developed to mine for novel causal risk factors and potential interventions.
The data that we collate for our research within this programme will have wide-reaching value to the research community. We will provide an open and accessible software platform for other researchers to search and use the various datasets we have integrated for their own research.

Technical Summary

Background: The increasing availability of complex, high-dimensional epidemiological data necessitates innovative and scalable approaches to harness these data to address research questions of biomedical importance.
Aims: Motivated by the widespread adoption of Mendelian randomization and the opportunities to integrate multiple data sources for the triangulation of evidence in epidemiological research, this programme will develop and apply novel data mining approaches in integrative epidemiology. We will also develop and implement a software platform to enable research questions of major epidemiological importance to be addressed rapidly and at scale.
The programme will focus on (a) integration of cutting-edge statistical methods under development in the MRC Integrative Epidemiology Unit (MRC-IEU) with extensive data in a graph database; (b) development of subgraph searching algorithms; and (c) identification of causal mechanistic pathways to disease. EpiGraphDB will be a resource of extensive value to the programme, the MRC-IEU and the wider research community.
Research plans: The programme will implement a data mining approach by developing a new graph database (EpiGraphDB) that will integrate cutting-edge causal analysis evidence with comprehensive data on relationships between traits, risk factors, biomarkers, intervention targets and diseases. These data will originate from Mendelian randomization, genetic and observational correlation from epidemiological studies, relationships mined from the literature, and a wide array of bioinformatics sources describing molecular relationships. EpiGraphDB will enable aetiological hypotheses to be generated and explored.
Data sharing and health applications: The database, software and results generated by this programme will be made openly available to the wider scientific community for application to a range of potential health questions (eg identifying causal risk factors for disease, identifying side-effects of interventions, etc).
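As a minimal sketch of the idea behind the research plans above, the following Python snippet stores typed relationships as a small in-memory directed graph and enumerates short paths between a risk factor and a disease. The node names, relationship labels and the `find_paths` helper are hypothetical placeholders for illustration only; they are not EpiGraphDB's schema, data or API, and a real system would use a dedicated graph database rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical toy edges: (source, relationship type, target).
# Illustrative placeholders only, not data from EpiGraphDB.
EDGES = [
    ("trait_A", "MR", "biomarker_B"),
    ("biomarker_B", "MR", "disease_C"),
    ("trait_A", "LITERATURE", "disease_C"),
    ("gene_G", "PATHWAY", "biomarker_B"),
]


def build_graph(edges):
    """Index edges by source node as lists of (relationship, target)."""
    graph = defaultdict(list)
    for source, rel, target in edges:
        graph[source].append((rel, target))
    return graph


def find_paths(graph, start, end, max_hops=3):
    """Enumerate simple directed paths from start to end using at most max_hops edges.

    Each path is returned as a list of (source, relationship, target) triples.
    """
    paths = []
    stack = [(start, [start], [])]  # (current node, visited nodes, relationships so far)
    while stack:
        node, nodes, rels = stack.pop()
        if node == end and rels:
            paths.append(list(zip(nodes, rels, nodes[1:])))
            continue
        if len(rels) == max_hops:
            continue
        for rel, nxt in graph.get(node, []):
            if nxt not in nodes:  # avoid revisiting nodes (no cycles)
                stack.append((nxt, nodes + [nxt], rels + [rel]))
    return paths


graph = build_graph(EDGES)
paths = find_paths(graph, "trait_A", "disease_C")
for path in paths:
    print(" ; ".join(f"{s} -[{r}]-> {t}" for s, r, t in path))
```

In this toy example the direct literature-derived link and the two-step path through the biomarker are both returned, which mirrors the kind of evidence triangulation the programme describes.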

Publications


Battram T (2019) Appraising the causal relevance of DNA methylation for risk of lung cancer. in International journal of epidemiology

Benedetto U (2020) Machine learning improves mortality risk prediction after cardiac surgery: Systematic review and meta-analysis. in The Journal of thoracic and cardiovascular surgery

Benedetto U (2020) Can machine learning improve mortality prediction following cardiac surgery? in European journal of cardio-thoracic surgery : official journal of the European Association for Cardio-thoracic Surgery

 
Description MICA: NURTuRE - changing the landscape of renal medicine to foster a unified approach to stratified medicine
Amount £2,561,603 (GBP)
Funding ID MR/R013942/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 07/2018 
End 07/2022
 
Description Turing Fellowship
Amount £9,990 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country United Kingdom
Start 10/2018 
End 09/2020
 
Title EpiGraphDB 
Description EpiGraphDB is a database of epidemiological relationships, including causal estimates from Mendelian randomization, genetic correlations, literature-derived relationships, and links to biological pathway data, drug targets and others. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This is due for open release in Q2 2019. The database includes pre-computed causal estimates for a wide range of risk factors on many disease phenotypes and outcomes. The risk factors include potential drug targets, and the platform is currently being used by our collaborators from the pharmaceutical industry to evaluate potential drug targets. 
URL http://www.epigraphdb.org/
 
Title GoDMC mQTL database 
Description Database of methylation quantitative trait loci (mQTL) due to be openly released on publication of the GoDMC consortium paper. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This is the largest mQTL analysis to date, providing genetic instruments for use in Mendelian randomization analyses of DNA methylation. 
URL http://mqtldb.godmc.org.uk/
 
Title IEU GWAS database 
Description This is a database of genome-wide association study data summary statistics implemented using ElasticSearch in Oracle Cloud. It was built using data originally collected and curated for the MR-Base web application (http://www.mrbase.org) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact The new architecture of this database makes it significantly faster, supporting a much wider range and larger scale of analyses. 
URL https://gwas.mrcieu.ac.uk
 
Description Genetics of DNA Methylation Consortium 
Organisation CeMM Research Center for Molecular Medicine
Country Austria 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. No outputs yet (analyses ongoing)
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation King's College London
Department Brain Bank
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. No outputs yet (analyses ongoing)
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Leiden University Medical Center
Country Netherlands 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. No outputs yet (analyses ongoing)
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Newcastle University
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. No outputs yet (analyses ongoing)
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. No outputs yet (analyses ongoing)
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. No outputs yet (analyses ongoing)
Start Year 2013
 
Description MR-Base collaboration 
Organisation Biogen
Country United Kingdom 
Sector Private 
PI Contribution We are collaborating with GlaxoSmithKline and Biogen on the further development and enhancement of the MR-Base platform, with a particular focus on the evaluation of potential drug targets.
Collaborator Contribution The industry partners are providing scientific input on the project and advising on how to maximise translational value of the MR-Base platform.
Impact Outputs/outcomes: expansion of the database underlying MR-Base.
Start Year 2017
 
Description MR-Base collaboration 
Organisation GlaxoSmithKline (GSK)
Country Global 
Sector Private 
PI Contribution We are collaborating with GlaxoSmithKline and Biogen on the further development and enhancement of the MR-Base platform, with a particular focus on the evaluation of potential drug targets.
Collaborator Contribution The industry partners are providing scientific input on the project and advising on how to maximise translational value of the MR-Base platform.
Impact Outputs/outcomes: expansion of the database underlying MR-Base.
Start Year 2017
 
Description Oracle MR-Base collaboration 
Organisation Oracle Corporation
Department Oracle Corporation UK Ltd
Country United Kingdom 
Sector Private 
PI Contribution We implemented an ElasticSearch database in Oracle Cloud using credits provided by Oracle. We then transferred data from the IEU GWAS database into this system and connected it to the IEU GWAS database (https://gwas.mrcieu.ac.uk) for use by the wider research community.
Collaborator Contribution Oracle provided free credits and support with configuration and optimisation of a virtual cluster to support our ElasticSearch database.
Impact IEU GWAS database: https://gwas.mrcieu.ac.uk
Start Year 2018
 
Title MR-Base 
Description MR-Base is a web application and R package providing a range of different methods for two-sample Mendelian randomization, designed to be used with the IEU GWAS database. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact MR-Base is being widely used by researchers to perform two-sample Mendelian randomization. 
URL http://www.mrbase.org/
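
For readers unfamiliar with two-sample Mendelian randomization, the sketch below shows two standard summary-data estimators: the per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. It is an illustration under simplifying assumptions (independent variants, first-order standard errors that ignore uncertainty in the exposure effects), written in Python with made-up effect sizes. It is not the MR-Base implementation, and the function names are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio for one variant: outcome effect divided by exposure effect."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)  # first-order approximation
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across variants."""
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


# Made-up SNP-exposure effects, SNP-outcome effects and outcome standard errors.
bx = [0.10, 0.20]
by = [0.05, 0.10]
se_y = [0.01, 0.02]

ivw_estimate, ivw_se = ivw(bx, by, se_y)
print(f"IVW estimate: {ivw_estimate:.3f} (SE {ivw_se:.3f})")
```

Both toy variants have the same Wald ratio, so the IVW estimate coincides with it; with real data the variant-level ratios differ and the weights determine how much each contributes.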
 
Title MR-Base PheWAS tool 
Description The MR-Base PheWAS tool allows users to rapidly search the associations of a SNP across all phenotypes represented in the IEU GWAS database (part of the MR-Base platform). 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact This is used by researchers as a rapid way of reviewing the associations for a single genetic variant using one of the largest public GWAS databases available. 
URL http://phewas.mrbase.org/
 
Title Vectology - exploring biomedical variable relationships using sentence embedding and vectors 
Description Many biomedical data sets contain variables that are identified by simple, and often short, descriptions. Traditionally these would either be manually annotated and/or assigned to ontologies using expert knowledge, facilitating interactions with other data sets and gaining an understanding of where these variables lie in the biomedical knowledge space. With Vectology we utilise sentence embedding methods and convert these variables into vectors, calculated from precomputed models derived from biomedical literature to infer relationships between variables. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact The approach has been utilised in the IEU GWAS database to support identification of related datasets. 
URL http://vectology.mrcieu.ac.uk/
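
The vector-based comparison described above can be illustrated with cosine similarity, a common way to compare embedding vectors. The sketch below is a hedged example only: the three-dimensional vectors and variable names are invented for illustration, the `rank_related` helper is hypothetical, and the source does not state which similarity measure or embedding model Vectology uses. In practice the vectors would come from a sentence embedding model and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy embeddings for three variable descriptions (assumption).
EMBEDDINGS = {
    "body mass index": [0.9, 0.1, 0.0],
    "waist circumference": [0.8, 0.2, 0.1],
    "years of schooling": [0.0, 0.1, 0.9],
}


def rank_related(query, embeddings):
    """Rank all other variables by cosine similarity to the query variable."""
    query_vector = embeddings[query]
    scores = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranked = rank_related("body mass index", EMBEDDINGS)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```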
 
Title epigraphdb-r: An R package to use EpiGraphDB 
Description This is an R package designed to access data from EpiGraphDB (using the EpiGraphDB API) to support further analysis. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Wider accessibility to EpiGraphDB 
URL http://www.epigraphdb.org/
 
Description Presentation at ASHG in San Diego - D Baird 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Denis Baird was invited to give a presentation at the annual genetics conference for the American Society of Human Genetics to communicate main findings from research into identifying the genes underlying neurological/psychiatric conditions. The presentation was entitled: Identifying the tissue-specific influence of gene expression on neurological and psychiatric traits: a Mendelian Randomization study on gene expression within the human brain.
Year(s) Of Engagement Activity 2018
 
Description Presentation: "Creating, indexing and hosting 250 billion genetic associations with Elastic" at Elastic Meetup 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact One of our researchers gave a presentation on our innovative use of ElasticSearch for the IEU GWAS database (https://gwas.mrcieu.ac.uk) to a regional Elastic Meetup.
Year(s) Of Engagement Activity 2020
URL https://www.meetup.com/South-West-Elastic-Fantastics/events/265525501/