Data mining epidemiological relationships

Lead Research Organisation: University of Bristol
Department Name: UNLISTED

Abstract

We aim to develop and use cutting-edge data mining tools to identify risk factors that cause common diseases and potential drug targets that could prevent or treat these diseases. Methods developed within the MRC Integrative Epidemiology Unit use genetic data to help identify lifestyle risk factors that could be modified to reduce the risk or impact of disease, and can also identify potential drug targets. This programme is developing tools and databases to automate this type of analysis and apply it to large-scale population datasets to help us discover new ways to prevent and treat disease. We are also combining the evidence from these analyses with other types of biomedical information in a “knowledge graph” to enable us to investigate the mechanisms underlying disease, identify new targets for treatment or prevention, predict side effects of drugs and identify opportunities to repurpose existing drugs for other diseases. The methods, software and knowledge graph we are developing are made openly available to the research community to maximise their potential to improve population health.

Technical Summary

Background: Mendelian randomization (MR) is typically used to address specific causal hypotheses. Our MR-Base platform and OpenGWAS database now enable more systematic MR analyses of causal relationships between many traits and diseases, whilst our EpiGraphDB knowledge graph integrates these results with other biomedical evidence. Despite successes in identifying intervention targets and repurposing opportunities, such systematic MR analyses still face unsolved challenges in their interpretation and integration with other knowledge.
Aims: We aim to further advance approaches for systematically generating and integrating evidence to identify and prioritize intervention targets for disease prevention and treatment, and make these approaches and data resources widely accessible.
Objectives: (1) Developing and applying knowledge graphs (KGs) to generate hypotheses: we will use EpiGraphDB (and other KGs) for systematic analysis of specific disease outcomes, explore the use of graph embedding/link prediction methods to improve KGs and identify novel hypotheses, and develop natural language KG query interfaces to broaden their applicability. (2) Automating triangulation and evidence synthesis: we will develop new approaches to extracting evidence from the literature, websites and clinical trials databases. We will then systematically integrate this with evidence from MR and observational studies (including target trial emulation) and explore approaches to automating triangulation and synthesis of evidence for intervention targets. (3) Identifying and prioritizing intervention targets: we will use transcriptomic signatures to identify off-target side effects, strengthen the evidence for drug targets by integrating molecular QTL (molQTL) across traits and tissues with literature, coding mutations (including autozygous loss of function mutations) and animal knockouts, and implement approaches for identifying interactions. We will further develop trans-ancestry MR for prediction of cross-population generalisability of both pharmaceutical and non-pharmaceutical interventions. (4) New software and data resources: we will develop new open data and software resources based on IEU methodological innovations. We will enhance OpenGWAS by integrating non-European GWAS datasets to support multi-ancestry MR, implementing variance GWAS to identify potential interactions, and improving automated phenotype curation and clustering. We will also implement a new curated molQTL catalogue to support drug-target MR.
Importance: This programme will develop and apply systematic approaches to prioritise and validate causal hypotheses, linking methodology developed in the unit with applied epidemiological research. Implementing these approaches in open software/data resources and applying them to emerging datasets will yield new discoveries to improve population health.

Related Projects

Project Reference  Relationship  Related To     Start       End         Award Value
MC_UU_00032/1                                   31/03/2023  30/03/2028  £3,355,000
MC_UU_00032/2      Transfer      MC_UU_00032/1  31/03/2023  30/03/2028  £1,554,000
MC_UU_00032/3      Transfer      MC_UU_00032/2  31/03/2023  30/03/2028  £1,559,000
MC_UU_00032/4      Transfer      MC_UU_00032/3  31/03/2023  30/03/2028  £171,000
MC_UU_00032/5      Transfer      MC_UU_00032/4  31/03/2023  30/03/2028  £1,826,000
MC_UU_00032/6      Transfer      MC_UU_00032/5  31/03/2023  30/03/2028  £1,552,000
MC_UU_00032/7      Transfer      MC_UU_00032/6  31/03/2023  30/03/2028  £772,000
 
Description CHECKPOINT
Amount £3,499,252 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 03/2024 
End 02/2029
 
Description Causal inference methods to integrate genetics and multi-omics data for target discovery and validation
Amount $493,127 (USD)
Organisation Biogen Idec 
Sector Private
Country United States
Start 11/2023 
End 11/2027
 
Title DrivR-Base 
Description DrivR-Base is a pipeline for extracting feature information from different databases for single nucleotide variants (SNVs). These features are designed to be inputs for machine learning models, aiding in the prediction of functional impacts of genetic variants in human genome sequencing. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact This is forming the basis of ongoing work for variant effect prediction (in preparation for publication) 
URL https://github.com/amyfrancis97/DrivR-Base
 
Description Biogen collaboration 
Organisation Biogen Idec
Country United States 
Sector Private 
PI Contribution We are continuing a previous collaboration with Biogen focused on drug target prioritization using Mendelian randomization and genetic colocalization with molecular QTL datasets.
Collaborator Contribution Biogen are providing funding, pharmaceutical expertise and datasets relevant to their target areas.
Impact N/A
Start Year 2021
 
Description CUP-Global 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We are collaborating with the Global Cancer Update Programme (CUP-Global) team on methods to automate the systematic review processes used in the CUP project.
Collaborator Contribution The CUP-Global team are providing information on the challenges of information extraction from the literature, and human-curated training datasets.
Impact None yet
Start Year 2023
 
Description CVD-COVID-UK 
Organisation Health Data Research UK
Country United Kingdom 
Sector Private 
PI Contribution Analyses on the potential role of drug targets in COVID-19
Collaborator Contribution This is a HDR-UK consortium with wide contributions from partners in terms of data, expertise, analyses and technologies.
Impact N/A
Start Year 2020
 
Description Genetics of DNA Methylation Consortium 
Organisation CeMM Research Center for Molecular Medicine
Country Austria 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation King's College London
Department Brain Bank
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Leiden University Medical Center
Country Netherlands 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Newcastle University
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description IEU/UPenn collaboration 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution Mendelian randomization projects: conception, design, analysis and interpretation
Collaborator Contribution Mendelian randomization projects: conception, design, data and compute resources and interpretation
Impact Multi-disciplinary, integrating clinical, epidemiological and informatics expertise. Outputs: doi: 10.1007/s00125-022-05653-1
Start Year 2019
 
Title ASQ 
Description The EpiGraphDB-ASQ (ASQ; /ɑːsk/ i.e. "ask") interface is a natural language interface to query the integrated epidemiological evidence of the EpiGraphDB data and ecosystem. The starting point of the query is either a short paragraph of text from which ASQ will derive and extract claim triples, or users can supply those claim triples directly. ASQ will retrieve data from EpiGraphDB, both biomedical entities and evidence from various sources, to facilitate the triangulation of the evidence regarding a specific claim. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact Preprint posted; manuscript in submission 
URL https://asq.epigraphdb.org/
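
Illustrative note on claim triples: the ASQ description above refers to "claim triples", which are commonly understood as subject-predicate-object statements. The following is a minimal sketch of how a user-supplied triple might be represented and validated before being used in a query. The class name, the function name and the pipe-separated input format are all assumptions made for this example; they are not ASQ's actual API or input syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimTriple:
    """Hypothetical container for one subject-predicate-object claim."""
    subject: str
    predicate: str
    object: str


def parse_triple(text: str, sep: str = "|") -> ClaimTriple:
    """Split 'subject | predicate | object' into a ClaimTriple.

    The separator-based format is an assumption for illustration only.
    """
    parts = [part.strip() for part in text.split(sep)]
    # Require exactly three non-empty components.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'subject{sep}predicate{sep}object', got: {text!r}")
    return ClaimTriple(*parts)


triple = parse_triple("obesity | causes | type 2 diabetes")
print(triple)
```

A structured triple of this kind makes the claim's two entities and the asserted relationship explicit, which is what allows evidence about that specific relationship to be looked up and compared across sources.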