Data mining epidemiological relationships

Lead Research Organisation: University of Bristol
Department Name: UNLISTED

Abstract

We aim to develop and use cutting-edge data mining tools to identify risk factors that cause common diseases and potential drug targets that could prevent or treat these diseases. Methods developed within the MRC Integrative Epidemiology Unit use genetic data to help identify lifestyle risk factors that could be modified to reduce the risk or impact of disease, and can also identify potential drug targets. This programme is developing tools and databases to automate this type of analysis and apply it to large-scale population datasets to help us discover new ways to prevent and treat disease. We are also combining the evidence from these analyses with other types of biomedical information in a “knowledge graph” to enable us to investigate the mechanisms underlying disease, identify new targets for treatment or prevention, predict side effects of drugs and identify opportunities to repurpose existing drugs for other diseases. The methods, software and knowledge graph we are developing are made openly available to the research community to maximise their potential to improve population health.

Technical Summary

Background: Mendelian randomization (MR) is typically used to address specific causal hypotheses. Our MR-Base platform and OpenGWAS database now enable more systematic MR analyses of causal relationships between many traits and diseases, whilst our EpiGraphDB knowledge graph integrates these results with other biomedical evidence. Despite successes in identifying intervention targets and repurposing opportunities, such systematic MR analyses still face unsolved challenges in their interpretation and integration with other knowledge.
Aims: We aim to further advance approaches for systematically generating and integrating evidence to identify and prioritize intervention targets for disease prevention and treatment, and make these approaches and data resources widely accessible.
Objectives:
(1) Developing and applying knowledge graphs (KGs) to generate hypotheses: we will use EpiGraphDB (and other KGs) for systematic analysis of specific disease outcomes, explore the use of graph embedding/link prediction methods to improve KGs and identify novel hypotheses, and develop natural language KG query interfaces to broaden their applicability.
(2) Automating triangulation and evidence synthesis: we will develop new approaches to extracting evidence from the literature, websites and clinical trials databases. We will then systematically integrate this with evidence from MR and observational studies (including target trial emulation) and explore approaches to automating triangulation and synthesis of evidence for intervention targets.
(3) Identifying and prioritizing intervention targets: we will use transcriptomic signatures to identify off-target side effects, strengthen the evidence for drug targets by integrating molecular QTL (molQTL) across traits and tissues with literature, coding mutations (including autozygous loss of function mutations) and animal knockouts, and implement approaches for identifying interactions. We will further develop trans-ancestry MR for prediction of cross-population generalisability of both pharmaceutical and non-pharmaceutical interventions.
(4) New software and data resources: we will develop new open data and software resources based on IEU methodological innovations. We will enhance OpenGWAS by integrating non-European GWAS datasets to support multi-ancestry MR, implementing variance GWAS to identify potential interactions, and improving automated phenotype curation and clustering. We will also implement a new curated molQTL catalogue to support drug-target MR.
Importance: This programme will develop and apply systematic approaches to prioritise and validate causal hypotheses, linking methodology developed in the unit with applied epidemiological research. Implementing these approaches in open software/data resources and applying them to emerging datasets will yield new discoveries to improve population health.
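
The systematic MR analyses described in the Technical Summary rest on combining per-variant estimates into a single causal estimate. As a rough illustration only, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics. It assumes independent variants, valid instruments and a first-order approximation to the standard error. The class name, function names and input numbers are hypothetical and are not taken from MR-Base or OpenGWAS.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass
class VariantEffect:
    """Summary statistics for one genetic variant (illustrative values only)."""
    beta_exposure: float  # variant-exposure association
    beta_outcome: float   # variant-outcome association
    se_outcome: float     # standard error of beta_outcome


def wald_ratio(v: VariantEffect) -> float:
    """Per-variant causal estimate: outcome effect divided by exposure effect."""
    return v.beta_outcome / v.beta_exposure


def ivw_estimate(variants: List[VariantEffect]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error.

    Each Wald ratio is weighted by beta_exposure^2 / se_outcome^2, the inverse
    of its first-order variance (se_outcome / |beta_exposure|)^2.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    weights = [(v.beta_exposure / v.se_outcome) ** 2 for v in variants]
    total = sum(weights)
    estimate = sum(w * wald_ratio(v) for w, v in zip(weights, variants)) / total
    return estimate, sqrt(1.0 / total)


# Made-up summary statistics, used purely to exercise the functions.
example = [
    VariantEffect(beta_exposure=0.10, beta_outcome=0.05, se_outcome=0.01),
    VariantEffect(beta_exposure=0.20, beta_outcome=0.10, se_outcome=0.02),
]
estimate, se = ivw_estimate(example)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

In practice, MR analyses also involve instrument selection, allele harmonisation and sensitivity analyses for pleiotropy, none of which this sketch attempts.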

Publications

Barry C (2023) How to estimate heritability: a guide for genetic epidemiologists in International Journal of Epidemiology

Hazelwood E (2024) Plasma Ghrelin and Risks of Sex-Specific, Site-Specific, and Early-Onset Colorectal Cancer: A Mendelian Randomization Analysis. in Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology

Liu Y (2024) Triangulating evidence in health sciences with Annotated Semantic Queries. in Bioinformatics (Oxford, England)

 
Description CHECKPOINT
Amount £3,499,252 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 03/2024 
End 02/2029
 
Description Causal inference methods to integrate genetics and multi-omics data for target discovery and validation
Amount $493,127 (USD)
Organisation Biogen Idec 
Sector Private
Country United States
Start 11/2023 
End 11/2027
 
Description Skin Genetics Consortium grant
Amount 4,046,238 kr. (DKK)
Organisation LEO Foundation 
Sector Charity/Non Profit
Country Denmark
Start 03/2025 
End 09/2027
 
Title DrivR-Base 
Description DrivR-Base is a pipeline for extracting feature information from different databases for single nucleotide variants (SNVs). These features are designed to be inputs for machine learning models, aiding in the prediction of functional impacts of genetic variants in human genome sequencing. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact This is forming the basis of ongoing work for variant effect prediction (in preparation for publication) 
URL https://github.com/amyfrancis97/DrivR-Base
 
Description CUP-Global 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We are collaborating with the Global Cancer Update Programme (CUP-Global) team to automate the systematic review processes used in the CUP project.
Collaborator Contribution The CUP-Global team are providing information on the challenges of information extraction from the literature, and human-curated training datasets.
Impact None yet
Start Year 2023
 
Description CVD-COVID-UK 
Organisation Health Data Research UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Analyses on the potential role of drug targets in COVID-19
Collaborator Contribution This is an HDR-UK consortium with wide contributions from partners in terms of data, expertise, analyses and technologies.
Impact N/A
Start Year 2020
 
Description Genetics of DNA Methylation Consortium 
Organisation CeMM Research Center for Molecular Medicine
Country Austria 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation King's College London
Department Brain Bank
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Leiden University Medical Center
Country Netherlands 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation Newcastle University
Country United Kingdom 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description Genetics of DNA Methylation Consortium 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES)
Collaborator Contribution Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies.
Impact Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: Database of methylation QTL: http://mqtldb.godmc.org.uk/ Publication pending
Start Year 2013
 
Description IEU/UPenn collaboration 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution Mendelian randomization projects: conception, design, analysis and interpretation
Collaborator Contribution Mendelian randomization projects: conception, design, data and compute resources and interpretation
Impact Multi-disciplinary, integrating clinical, epidemiological and informatics expertise. Outputs: doi: 10.1007/s00125-022-05653-1
Start Year 2019
 
Title ASQ 
Description The EpiGraphDB-ASQ (ASQ; /ɑːsk/, i.e. "ask") interface is a natural language interface for querying the integrated epidemiological evidence in the EpiGraphDB data and ecosystem. A query starts either from a short paragraph of text, from which ASQ derives and extracts claim triples, or from claim triples that users supply directly. ASQ then retrieves data from EpiGraphDB, both biomedical entities and evidence from various sources, to facilitate the triangulation of the evidence regarding a specific claim. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact Publication pre-printed and in submission 
URL https://asq.epigraphdb.org/
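
One way to picture the claim triples mentioned in the ASQ description is as (subject, predicate, object) records that are compared against stored evidence. The sketch below illustrates that idea under this assumption only. The `ClaimTriple` and `Evidence` classes, the `group_evidence` function and the placeholder terms are hypothetical and do not reflect ASQ's or EpiGraphDB's actual data model or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClaimTriple:
    """A claim expressed as subject, predicate, object (assumed structure)."""
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Evidence:
    """A stored triple plus a label for where it came from (illustrative)."""
    triple: ClaimTriple
    source: str


def normalise(term: str) -> str:
    """Lower-case a term and collapse whitespace so comparisons are simple."""
    return " ".join(term.lower().split())


def group_evidence(claim: ClaimTriple, records: List[Evidence]) -> Dict[str, List[Evidence]]:
    """Group records on the same subject/object pair by predicate agreement."""
    groups: Dict[str, List[Evidence]] = {"same_predicate": [], "different_predicate": []}
    for record in records:
        t = record.triple
        same_pair = (normalise(t.subject) == normalise(claim.subject)
                     and normalise(t.object) == normalise(claim.object))
        if not same_pair:
            continue  # evidence about other entities is ignored in this sketch
        if normalise(t.predicate) == normalise(claim.predicate):
            groups["same_predicate"].append(record)
        else:
            groups["different_predicate"].append(record)
    return groups


# Placeholder records, not real findings.
claim = ClaimTriple("Exposure A", "increases", "Outcome B")
records = [
    Evidence(ClaimTriple("exposure a", "increases", "outcome b"), "MR"),
    Evidence(ClaimTriple("Exposure A", "decreases", "Outcome B"), "literature"),
    Evidence(ClaimTriple("Exposure C", "increases", "Outcome B"), "literature"),
]
for label, items in group_evidence(claim, records).items():
    print(label, [e.source for e in items])
```

Exact string matching stands in here for the entity and predicate mapping that a real system would need; it is chosen only to keep the example self-contained.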
 
Title CanDrivR-CS 
Description CanDrivR-CS is a cancer-specific machine learning framework for distinguishing recurrent and rare variants 
Type Of Technology Software 
Year Produced 2024 
Open Source License? Yes  
Impact Pre-print published, and in submission for journal publication. The key finding is that cancer-specific predictors of somatic driver mutations perform better than pan-cancer predictors. This is likely to be important to drug discovery research. 
URL https://github.com/amyfrancis97/CanDrivR-CS
 
Description Organising Uganda Hub for Mendelian Randomization Conference 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact We organised an international hub at the MRC/UVRI and LSHTM Uganda Research Unit for participants from Africa to remotely join the international Mendelian Randomization conference hosted in Bristol 19-21 June 2024. This Hub aimed to promote both inclusivity in the global research community and environmental sustainability. In addition, we hope that the success of this Hub will provide the infrastructure support to include more international hubs for future events, allowing for greater opportunities to connect on a global platform.
Year(s) Of Engagement Activity 2024
URL https://www.mendelianrandomization.org.uk/uganda-conference-hub/
 
Description Patient and Public Involvement Workshops 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact Patient and Public Involvement and Engagement (PPIE) Workshop to inform the development of a Cancer Research UK grant application
Year(s) Of Engagement Activity 2024