Data mining epidemiological relationships: integration of causal analysis with published evidence

Lead Research Organisation: University of Bristol

Abstract

Causal inference in epidemiology focuses on identifying the risk factors that cause disease. Established approaches focus on specific risk factors that may impact on specific diseases. However, the wealth of biomedical data that now exists enables us to assess the causal relationships between a broad network of risk factors and diseases. By considering a much wider network of such relationships we will establish the relative importance of different risk factors and the potential side-effects of interventions that target those risk factors. We will also integrate biological data (e.g. molecular pathways, drug targets) with causal relationships to enable us to understand the molecular mechanisms that lead to disease, and identify potential pharmaceutical and public health interventions. These data and relationships will be combined in a purpose-built “graph” database and methods will be developed to mine for novel causal risk factors and potential interventions.
The data that we collate for our research within this programme will have wide-reaching value to the research community. We will provide an open and accessible software platform for other researchers to search and use the various datasets we have integrated for their own research.

Technical Summary

Background: The increasing availability of complex, high-dimensional epidemiological data necessitates innovative and scalable approaches to harness this power to address research questions of biomedical importance.
Aims: Motivated by the widespread adoption of Mendelian randomization and the opportunities to integrate multiple data sources for the triangulation of evidence in epidemiological research, this programme will develop and apply novel data mining approaches in integrative epidemiology. We will also develop and implement a software platform to enable research questions of major epidemiological importance to be addressed rapidly and at scale.
The programme will focus on (a) integration of cutting edge statistical methods under development in the MRC Integrative Epidemiology Unit (MRC-IEU) with extensive data in a graph database; (b) development of subgraph searching algorithms; and (c) identification of causal mechanistic pathways to disease. EpiGraphDB will be a resource of extensive value to the programme, the MRC-IEU and the wider research community.
Research plans: The programme will implement a data mining approach by developing a new graph database (EpiGraphDB) that will integrate cutting edge causal analysis evidence with comprehensive data on relationships between traits, risk factors, biomarkers, intervention targets and diseases. These data will originate from Mendelian randomization, genetic and observational correlation from epidemiological studies, relationships mined from the literature, and a wide array of bioinformatics sources describing molecular relationships. EpiGraphDB will enable aetiological hypotheses to be generated and explored.
Data sharing and health applications: The database, software and results generated by this programme will be made openly available to the wider scientific community for application to a range of potential health questions (e.g. identifying causal risk factors for disease or side-effects of interventions).
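
The kind of subgraph search described in the research plans can be illustrated with a toy example. The sketch below is not EpiGraphDB's implementation: the edge list, node names and the `two_step_paths` helper are all hypothetical, and a real graph database would use its own query language. It only shows the idea of finding intermediate traits that link a risk factor to a disease through two directed relationships.

```python
from collections import defaultdict


def two_step_paths(edges, source, target):
    """Return intermediate nodes M such that source -> M -> target.

    `edges` is an iterable of (source, relationship, target) tuples.
    The relationship label is ignored here; a fuller search could filter on it.
    """
    successors = defaultdict(set)
    for start, _relationship, end in edges:
        successors[start].add(end)
    return sorted(
        mid
        for mid in successors.get(source, set())
        if target in successors.get(mid, set())
    )


# Hypothetical toy graph: names and relationships are placeholders, not results.
toy_edges = [
    ("risk_factor_A", "MR", "biomarker_B"),
    ("biomarker_B", "MR", "disease_C"),
    ("risk_factor_A", "MR", "biomarker_D"),
    ("biomarker_D", "LITERATURE", "trait_E"),
]

if __name__ == "__main__":
    print(two_step_paths(toy_edges, "risk_factor_A", "disease_C"))
```

Only `biomarker_B` lies on a two-step path to `disease_C` in this toy graph, so it is the only candidate intermediate returned.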

Publications

Battram T (2022) The EWAS Catalog: a database of epigenome-wide association studies in Wellcome Open Research

 
Title Reducing drug development costs (animation) 
Description This short animation explains how we use Mendelian randomization and colocalization to help prioritise drug targets. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact N/A 
URL https://youtu.be/t77LZZlF4iw
 
Description Academy of Medical Sciences Springboard Award
Amount £99,997 (GBP)
Funding ID SBF006\1117 
Organisation Academy of Medical Sciences (AMS) 
Sector Charity/Non Profit
Country United Kingdom
Start 07/2021 
End 07/2023
 
Description BHF 4-year PhD programme in Integrated Cardiovascular Science
Amount £1,439,856 (GBP)
Organisation British Heart Foundation (BHF) 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2021 
End 09/2028
 
Description Biogen collaboration on MR-Base
Amount £263,667 (GBP)
Organisation Biogen Idec 
Sector Private
Country United States
Start 03/2021 
End 03/2023
 
Description CRUK Integrative Cancer Epidemiology Programme
Amount £7,715,113 (GBP)
Funding ID C18281/A29019 
Organisation Cancer Research UK 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2020 
End 09/2025
 
Description CSC - Bristol PhD studentship
Amount £150,400 (GBP)
Funding ID 202008320304 
Organisation Chinese Scholarship Council 
Sector Charity/Non Profit
Country China
Start 09/2020 
End 09/2024
 
Description Developing cross-population Mendelian randomization for generalizing evidence on drug targets: the MRC Cross-Population Mendelian Randomization Network
Amount £118,937 (GBP)
Funding ID MC_PC_21018 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 01/2022 
End 12/2023
 
Description Joint MRC units workshop integrating genetics with g target prioritisation
Amount £68,000 (GBP)
Funding ID MC_PC_20042 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 01/2021 
End 03/2021
 
Description MICA: NURTuRE - changing the landscape of renal medicine to foster a unified approach to stratified medicine
Amount £2,561,603 (GBP)
Funding ID MR/R013942/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 06/2018 
End 07/2022
 
Description Molecular Genetic and Lifecourse Epidemiology
Amount £5,153,712 (GBP)
Funding ID 218495/Z/19/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2020 
End 09/2028
 
Description Turing Fellowship
Amount £9,990 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country United Kingdom
Start 09/2018 
End 09/2020
 
Title EpiGraphDB 
Description EpiGraphDB is a database of epidemiological relationships, including causal estimates from Mendelian randomization, genetic correlations, literature-derived relationships, and links to biological pathway data, drug targets and others. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This is due for open release in Q2 2019. The database includes pre-computed causal estimates for a wide range of risk factors on many disease phenotypes and outcomes. The risk factors include potential drug targets, and the platform is currently being used by our collaborators from the pharmaceutical industry to evaluate potential drug targets. 
URL http://www.epigraphdb.org/
 
Title GoDMC mQTL database 
Description Database of methylation quantitative trait loci (mQTL) due to be openly released on publication of the GoDMC consortium paper. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This is the largest mQTL analysis to date, providing genetic instruments for use in Mendelian randomization analyses of DNA methylation. 
URL http://mqtldb.godmc.org.uk/
 
Title IEU OpenGWAS database 
Description This is a database of genome-wide association study data summary statistics implemented using ElasticSearch in Oracle Cloud. It was built using data originally collected and curated for the MR-Base web application (http://www.mrbase.org) 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact The new architecture of this database makes it significantly faster, supporting a much wider range and larger scale of analyses. 
URL https://gwas.mrcieu.ac.uk
 
Description Biogen collaboration 
Organisation Biogen Idec
Country United States 
Sector Private 
PI Contribution We are continuing a previous collaboration with Biogen focused on drug target prioritization using Mendelian randomization and genetic colocalization with molecular QTL datasets.
Collaborator Contribution Biogen are providing funding, pharmaceutical expertise and datasets relevant to their target areas.
Impact N/A
Start Year 2021
 
Description Bristol - Peking University First Hospital collaboration 
Organisation Peking University First Hospital
Country China 
Sector Hospitals 
PI Contribution Jointly working on kidney-related projects and published papers. Providing support on data, methods and interpretation of results for the projects. Leading on external and internal funding applications. Supported Dr Yuemiao Zhang of Peking University First Hospital in giving an oral presentation of the collaborative work at the International Mendelian randomization conference 2019, and in being an invited speaker at the International Mendelian randomization conference 2021.
Collaborator Contribution Jointly working on kidney-related projects and published papers. Providing a clinical perspective on project design and processes. Supported a research visit by Dr Jie Zheng (Bristol) to Peking University First Hospital in 2019. Contributing to external and internal funding applications.
Impact The collaboration created two grant application opportunities: 1. we received an internal grant from BBSRC/Bristol to support our collaborative work on the causal role of diabetes in kidney disease; 2. we applied for a UK-China joint grant between Bristol and Peking University First Hospital, which was not funded. The collaboration has also produced joint publications, with one paper published and two submitted: * Y Zhang, J Zheng, TR Gaunt, H Zhang. Mendelian randomization analysis reveals a causal effect of urinary sodium/urinary creatinine ratio on kidney function in Europeans. Frontiers in Bioengineering and Biotechnology 8, 662 * J Zheng, Y Zhang, H Rasheed, V Walker, Y Sugawara, J Li, Y Leng, et al. Trans-ethnic Mendelian randomization study reveals causal relationships between cardio-metabolic factors and chronic kidney disease. medRxiv (under revision in Circulation) * J Zheng, Y Zhang, Y Liu, D Baird, X Liu, L Wang, H Zhang, et al. Multi-omics study revealing tissue-dependent putative mechanisms of SARS-CoV-2 drug targets on viral infections and complex diseases. medRxiv (under revision in Human Molecular Genetics)
Start Year 2021
 
Description CHS proteome MR working group 
Organisation University of Washington
Country United States 
Sector Academic/University 
PI Contribution Researchers in my team are contributing expertise in proteome Mendelian randomization and genetic colocalization
Collaborator Contribution Our partners are contributing expertise in proteomics and cardiovascular disease, and relevant datasets
Impact N/A
Start Year 2020
 
Description CVD-COVID-UK 
Organisation Health Data Research UK
Country United Kingdom 
Sector Private 
PI Contribution Analyses on the potential role of drug targets in COVID-19
Collaborator Contribution This is a HDR-UK consortium with wide contributions from partners in terms of data, expertise, analyses and technologies.
Impact N/A
Start Year 2020
 
Description Genetics of DNA Methylation Consortium 
Organisation CeMM Research Center for Molecular Medicine
Country Austria 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation King's College London
Department Brain Bank
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Leiden University Medical Center
Country Netherlands 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Newcastle University
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description IEU/HUNT collaboration 
Organisation Norwegian University of Science and Technology (NTNU)
Country Norway 
Sector Academic/University 
PI Contribution Mendelian randomization, genetic and molecular epidemiology applied to UK Biobank
Collaborator Contribution Mendelian randomization, genetic and molecular epidemiology applied to the HUNT study
Impact N/A
Start Year 2019
 
Description MR-Base collaboration 
Organisation Biogen
Country United Kingdom 
Sector Private 
PI Contribution We are collaborating with GlaxoSmithKline and Biogen on the further development and enhancement of the MR-Base platform, with a particular focus on the evaluation of potential drug targets.
Collaborator Contribution The industry partners are providing scientific input on the project and advising on how to maximise the translational value of the MR-Base platform.
Impact Outputs/outcomes: * expansion of the database underlying MR-Base. Papers: * Baird DA, Liu JZ, Zheng J, Sieberts SK, Perumal T, Elsworth B, Richardson TG... AMP-AD eQTL working group. (2021). Identifying drug targets for neurological and psychiatric disease via genetics and the brain transcriptome. PLoS Genetics, 17 (1), pp. e1009224 * Zheng J, Haberland V, Baird D, Walker V, Haycock PC, Hurle MR, Gutteridge A... Gaunt TR. (2020). Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nature Genetics, 52 (10), pp. 1122-1131
Start Year 2017
 
Description MR-Base collaboration 
Organisation GlaxoSmithKline (GSK)
Country Global 
Sector Private 
PI Contribution We are collaborating with GlaxoSmithKline and Biogen on the further development and enhancement of the MR-Base platform, with a particular focus on the evaluation of potential drug targets.
Collaborator Contribution The industry partners are providing scientific input on the project and advising on how to maximise the translational value of the MR-Base platform.
Impact Outputs/outcomes: * expansion of the database underlying MR-Base. Papers: * Baird DA, Liu JZ, Zheng J, Sieberts SK, Perumal T, Elsworth B, Richardson TG... AMP-AD eQTL working group. (2021). Identifying drug targets for neurological and psychiatric disease via genetics and the brain transcriptome. PLoS Genetics, 17 (1), pp. e1009224 * Zheng J, Haberland V, Baird D, Walker V, Haycock PC, Hurle MR, Gutteridge A... Gaunt TR. (2020). Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nature Genetics, 52 (10), pp. 1122-1131
Start Year 2017
 
Description Oracle MR-Base collaboration 
Organisation Oracle Corporation
Department Oracle Corporation UK Ltd
Country United Kingdom 
Sector Private 
PI Contribution We implemented an ElasticSearch database in Oracle Cloud using credits provided by Oracle. We then transferred data from the IEU GWAS database into this system and connected it to the IEU OpenGWAS database (https://gwas.mrcieu.ac.uk) for use by the wider research community.
Collaborator Contribution Oracle provided free credits and support with configuration and optimisation of a virtual cluster to support our ElasticSearch database.
Impact IEU GWAS database: https://gwas.mrcieu.ac.uk
Start Year 2018
 
Title EpiGraphDB 
Description EpiGraphDB is an analytical platform and database to support data mining in epidemiology. The platform incorporates a graph of causal estimates generated by systematically applying Mendelian randomization to a wide array of phenotypes, and augments this with a wealth of additional data from other bioinformatic sources. EpiGraphDB aims to support appropriate application and interpretation of causal inference in systematic automated analyses of many phenotypes. There is also an epigraphdb R package to provide ease of access to EpiGraphDB services. We use "epigraphdb" to refer to the R package and "EpiGraphDB" to refer to the overall platform. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact The database includes data from our systematic proteome-wide analysis of potential drug targets (published in Nature Genetics, 2020), which has been widely accessed by researchers from around the world. 
URL https://epigraphdb.org/
 
Title MELODI Presto 
Description The field of literature based discovery is growing in step with the volume of literature being produced. From modern natural language processing algorithms to high quality entity tagging, the methods and their impact are developing rapidly. One annotation object that arises from these approaches, the subject-predicate-object triple, is proving to be very useful in representing knowledge. We have implemented efficient search methods and an application programming interface (API), to create fast and convenient functions to utilize triples extracted from the biomedical literature by SemMedDB. By refining these data we have identified a set of triples that focus on the mechanistic aspects of the literature, and provide simple methods to explore both enriched triples from single queries, and overlapping triples across two query lists. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact N/A 
URL https://melodi-presto.mrcieu.ac.uk/
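
One plausible reading of "overlapping triples across two query lists" is a join in which the object of a triple from the first list is the subject of a triple from the second. The sketch below illustrates that join on made-up data. It is not the MELODI Presto implementation or its API: the `Triple` type, the `overlapping_triples` helper and the example terms are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement extracted from text."""

    subject: str
    predicate: str
    obj: str


def overlapping_triples(
    triples_x: Iterable[Triple], triples_y: Iterable[Triple]
) -> list[tuple[Triple, Triple]]:
    """Pair each triple in x with every triple in y whose subject equals its object."""
    by_subject: dict[str, list[Triple]] = {}
    for triple in triples_y:
        by_subject.setdefault(triple.subject, []).append(triple)
    return [
        (left, right)
        for left in triples_x
        for right in by_subject.get(left.obj, [])
    ]


if __name__ == "__main__":
    # Placeholder terms only; these are not statements taken from the literature.
    query_x = [Triple("term_x", "AFFECTS", "shared_term")]
    query_y = [
        Triple("shared_term", "ASSOCIATED_WITH", "term_y"),
        Triple("other_term", "ASSOCIATED_WITH", "term_y"),
    ]
    for left, right in overlapping_triples(query_x, query_y):
        print(left, "->", right)
```

Indexing the second list by subject keeps the join close to linear in the number of triples, which matters when each query returns many triples.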
 
Title MR-Base 
Description MR-Base is a web application and R package providing a range of different methods for two-sample Mendelian randomization, and designed to be used with the IEU GWAS database 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact MR-Base is being widely used by researchers to perform two-sample MR 
URL http://www.mrbase.org/
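
As a rough illustration of what a two-sample Mendelian randomization method computes, the sketch below implements the widely used fixed-effect inverse-variance weighted (IVW) estimator from per-variant summary statistics. It is a minimal sketch in plain Python, not MR-Base code: the function name `ivw_estimate` and the input numbers are hypothetical, and the weights use the common first-order approximation that ignores uncertainty in the variant-exposure effects.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect and its standard error.

    Each variant j contributes a Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    weights = [bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


if __name__ == "__main__":
    # Made-up summary statistics for three variants, for illustration only.
    estimate, standard_error = ivw_estimate(
        beta_exposure=[0.10, 0.20, 0.15],
        beta_outcome=[0.05, 0.11, 0.07],
        se_outcome=[0.01, 0.02, 0.015],
    )
    print(f"IVW estimate {estimate:.3f} (SE {standard_error:.3f})")
```

In practice the exposure and outcome effects must first be harmonised to the same effect allele, and sensitivity analyses are usually run alongside IVW.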
 
Title MR-Base PheWAS tool 
Description The MR-Base PheWAS tool allows users to rapidly search the associations of a SNP across all phenotypes represented in the IEU GWAS database (part of the MR-Base platform). 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact This is used by researchers as a rapid way of reviewing the associations for a single genetic variant using one of the largest public GWAS databases available. 
URL http://phewas.mrbase.org/
 
Title MendelVar 
Description MendelVar provides a quick overview of the possible impact of Mendelian disease-related genes on a user's complex phenotype of interest. It returns the details of all known broadly defined Mendelian diseases and their causal genes found in the custom genomic intervals, as well as overlapping pathogenic rare mutations responsible for Mendelian disease. Enrichment of Disease Ontology and Human Phenotype Ontology terms among the Mendelian genes gives the researcher an overview of any shared features with their trait of interest, e.g. in terms of anatomy. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Openly accessible to the research community 
URL https://mendelvar.mrcieu.ac.uk/
 
Title Vectology - exploring biomedical variable relationships using sentence embedding and vectors 
Description Many biomedical data sets contain variables that are identified by simple, and often short, descriptions. Traditionally these would either be manually annotated and/or assigned to ontologies using expert knowledge, facilitating interactions with other data sets and gaining an understanding of where these variables lie in the biomedical knowledge space. With Vectology we utilise sentence embedding methods and convert these variables into vectors, calculated from precomputed models derived from biomedical literature to infer relationships between variables. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact The approach has been utilised in the IEU GWAS database to support identification of related datasets. 
URL http://vectology.mrcieu.ac.uk/
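
Once variable descriptions have been converted to vectors, relatedness between variables is typically scored with a measure such as cosine similarity. The sketch below shows that step only. It assumes the vectors already exist: the three-dimensional vectors and variable names are invented for illustration, whereas real sentence embeddings would come from a pretrained model and have hundreds of dimensions. The helper names are hypothetical and this is not Vectology's own code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def most_similar(query, embeddings):
    """Rank every other variable by similarity to `query`, highest first."""
    scores = [
        (name, cosine_similarity(embeddings[query], vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented toy vectors standing in for sentence embeddings.
    toy_embeddings = {
        "body mass index": [0.9, 0.1, 0.0],
        "waist circumference": [0.8, 0.2, 0.1],
        "years of schooling": [0.0, 0.1, 0.9],
    }
    for name, score in most_similar("body mass index", toy_embeddings):
        print(f"{name}: {score:.3f}")
```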
 
Title epigraphdb-r: An R package to use EpiGraphDB 
Description This is an R package designed to access data from EpiGraphDB (using the EpiGraphDB API) to support further analysis. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Wider accessibility to EpiGraphDB 
URL http://www.epigraphdb.org/
 
Title gwas2vcf 
Description Tool to map GWAS summary statistics to VCF/BCF with on-the-fly harmonisation to a supplied reference FASTA 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This has now been adopted by the IEU OpenGWAS project for submission of GWAS summary statistics to the database. 
URL https://github.com/MRCIEU/gwas2vcf
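
Harmonising summary statistics to a reference sequence usually means making the non-effect allele match the reference base, swapping alleles and flipping the effect sign where needed. The sketch below illustrates that idea for single-base variants. It is an assumption about the general technique, not a description of gwas2vcf's code: the `Association` type and `harmonise` function are hypothetical, and a small in-memory dictionary stands in for the reference FASTA.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Association:
    chrom: str
    pos: int  # 1-based position
    effect_allele: str
    other_allele: str
    beta: float
    eaf: float  # effect allele frequency


def harmonise(assoc: Association, reference: dict) -> Optional[Association]:
    """Return the association with other_allele equal to the reference base.

    If the effect allele is the reference base, the alleles are swapped, the
    effect is negated and the frequency becomes 1 - eaf. If neither allele
    matches the reference, the record is dropped (None).
    """
    ref_base = reference[assoc.chrom][assoc.pos - 1].upper()
    if assoc.other_allele.upper() == ref_base:
        return assoc
    if assoc.effect_allele.upper() == ref_base:
        return replace(
            assoc,
            effect_allele=assoc.other_allele,
            other_allele=assoc.effect_allele,
            beta=-assoc.beta,
            eaf=1 - assoc.eaf,
        )
    return None


if __name__ == "__main__":
    toy_reference = {"1": "ACGT"}  # stand-in for a reference FASTA
    record = Association("1", 2, effect_allele="C", other_allele="T", beta=0.1, eaf=0.25)
    print(harmonise(record, toy_reference))
```

Multi-base alleles, strand ambiguity and indels need extra handling that this sketch deliberately leaves out.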
 
Title ieugwaspy 
Description The IEU GWAS database comprises over 10,000 curated, QC'd and harmonised complete GWAS summary datasets and can be queried using an API. Separate documentation covers the API itself. This Python package is a wrapper to make generic calls to the API, plus convenience functions for specific queries. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact N/A 
URL https://github.com/MRCIEU/ieugwaspy
 
Title pygwasvcf 
Description pygwasvcf provides a wrapper around pysam and rsidx to parse and query VCF files containing GWAS summary statistics and trait metadata. See also gwasvcf, an R package for parsing GWAS-VCF files. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This tool has been adopted by the IEU OpenGWAS database to promote the use of a standard GWAS VCF format with datasets downloaded from the database 
URL https://github.com/MRCIEU/pygwasvcf
 
Title varGWAS 
Description Software to perform genome-wide association study of SNP effects on trait variance 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact This has been used to generate variance QTL on biomarkers in UK Biobank 
 
Description A seminar for The Seventh Affiliated Hospital, Sun Yat-sen University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The seminar aimed to introduce genetics concepts, data and methods to clinicians at the hospital. We promoted the methods and databases we have built in Bristol during the seminar.

We also tried to set up a collaboration after the seminar.
Year(s) Of Engagement Activity 2020
 
Description Elastic Community Conference: Improving the accessibility of 100 billion genetic associations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presentation on the implementation of the IEU OpenGWAS database on ElasticSearch in Oracle Cloud (https://gwas.mrcieu.ac.uk).
Year(s) Of Engagement Activity 2021
URL https://youtu.be/Okvad9D4kT0
 
Description Genetic study of proteins is a breakthrough in drug development for complex diseases 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Press release on an innovative genetic study of blood protein level in collaboration with pharmaceutical partners, showcasing a key Nature Genetics paper which demonstrated how genetic data can be used to support drug target prioritisation by identifying the causal effects of proteins on diseases.
Year(s) Of Engagement Activity 2020
URL https://www.bristol.ac.uk/news/2020/september/genetic-study-of-proteins.html
 
Description Presentation at ASHG in San Diego - D Baird 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Denis Baird was invited to give a presentation at the annual genetics conference for the American Society of Human Genetics to communicate main findings from research into identifying the genes underlying neurological/psychiatric conditions. The presentation was entitled: Identifying the tissue-specific influence of gene expression on neurological and psychiatric traits: a Mendelian Randomization study on gene expression within the human brain.
Year(s) Of Engagement Activity 2018
 
Description Presentation: "Creating, indexing and hosting 250 billion genetic associations with Elastic" at Elastic Meetup 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact One of our researchers gave a presentation on our innovative use of ElasticSearch for the IEU GWAS database (https://gwas.mrcieu.ac.uk) to a Regional Elastic Meetup.
Year(s) Of Engagement Activity 2020
URL https://www.meetup.com/South-West-Elastic-Fantastics/events/265525501/