Data mining epidemiological relationships
Lead Research Organisation:
University of Bristol
Department Name: UNLISTED
Abstract
We aim to develop and use cutting-edge data mining tools to identify risk factors that cause common diseases and potential drug targets that could prevent or treat these diseases. Methods developed within the MRC Integrative Epidemiology Unit use genetic data to help identify lifestyle risk factors that could be modified to reduce the risk or impact of disease, and can also identify potential drug targets. This programme is developing tools and databases to automate this type of analysis and apply it to large-scale population datasets to help us discover new ways to prevent and treat disease. We are also combining the evidence from these analyses with other types of biomedical information in a “knowledge graph” to enable us to investigate the mechanisms underlying disease, identify new targets for treatment or prevention, predict side effects of drugs and identify opportunities to repurpose existing drugs for other diseases. The methods, software and knowledge graph we are developing are made openly available to the research community to maximise their potential to improve population health.
Technical Summary
Background: Mendelian randomization (MR) is typically used to address specific causal hypotheses. Our MR-Base platform and OpenGWAS database now enable more systematic MR analyses of causal relationships between many traits and diseases, whilst our EpiGraphDB knowledge graph integrates these results with other biomedical evidence. Despite successes in identifying intervention targets and repurposing opportunities, such systematic MR analyses still face unsolved challenges in their interpretation and integration with other knowledge.
Aims: We aim to further advance approaches for systematically generating and integrating evidence to identify and prioritize intervention targets for disease prevention and treatment, and make these approaches and data resources widely accessible.
Objectives: (1) Developing and applying knowledge graphs (KGs) to generate hypotheses: we will use EpiGraphDB (and other KGs) for systematic analysis of specific disease outcomes, explore the use of graph embedding/link prediction methods to improve KGs and identify novel hypotheses, and develop natural language KG query interfaces to broaden their applicability. (2) Automating triangulation and evidence synthesis: we will develop new approaches to extracting evidence from the literature, websites and clinical trials databases. We will then systematically integrate this with evidence from MR and observational studies (including target trial emulation) and explore approaches to automating triangulation and synthesis of evidence for intervention targets. (3) Identifying and prioritizing intervention targets: we will use transcriptomic signatures to identify off-target side effects, strengthen the evidence for drug targets by integrating molecular QTL (molQTL) across traits and tissues with literature, coding mutations (including autozygous loss-of-function mutations) and animal knockouts, and implement approaches for identifying interactions. We will further develop trans-ancestry MR to predict the cross-population generalisability of both pharmaceutical and non-pharmaceutical interventions. (4) New software and data resources: we will develop new open data and software resources based on IEU methodological innovations. We will enhance OpenGWAS by integrating non-European GWAS datasets to support multi-ancestry MR, implementing variance GWAS to identify potential interactions, and improving automated phenotype curation and clustering. We will also implement a new curated molQTL catalogue to support drug-target MR.
Importance: This programme will develop and apply systematic approaches to prioritise and validate causal hypotheses, linking methodology developed in the unit with applied epidemiological research. Implementing these approaches in open software/data resources and applying them to emerging datasets will yield new discoveries to improve population health.
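As an illustration of the kind of calculation that systematic MR analyses automate, the following is a minimal sketch of a fixed-effect inverse-variance weighted (IVW) estimate built from per-variant Wald ratios. It is not the MR-Base or OpenGWAS implementation: the `Instrument` record, the function names and the input values are hypothetical, and the sketch assumes independent variants and ignores uncertainty in the variant-exposure effects.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    """Summary statistics for one genetic variant (hypothetical toy record)."""
    beta_exposure: float  # variant effect on the exposure
    beta_outcome: float   # variant effect on the outcome
    se_outcome: float     # standard error of beta_outcome


def wald_ratio(inst: Instrument) -> float:
    """Per-variant causal estimate: outcome effect divided by exposure effect."""
    return inst.beta_outcome / inst.beta_exposure


def ivw_estimate(instruments: list[Instrument]) -> tuple[float, float]:
    """Fixed-effect IVW estimate and its standard error.

    Each Wald ratio is weighted by (beta_exposure / se_outcome) ** 2,
    the inverse of its first-order variance.
    """
    if not instruments:
        raise ValueError("at least one instrument is required")
    weights = [(i.beta_exposure / i.se_outcome) ** 2 for i in instruments]
    total = sum(weights)
    estimate = sum(w * wald_ratio(i) for w, i in zip(weights, instruments)) / total
    return estimate, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from any study.
    toy = [Instrument(0.1, 0.05, 0.01), Instrument(0.2, 0.1, 0.02)]
    estimate, se = ivw_estimate(toy)
    print(f"IVW estimate = {estimate:.3f}, SE = {se:.3f}")
```

Applied across many exposure-outcome pairs, a routine of this shape is what makes hypothesis-free MR scans feasible, although real pipelines also need instrument selection, allele harmonisation and sensitivity analyses that this sketch omits.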
Organisations
- University of Bristol (Lead Research Organisation)
- Leiden University Medical Center (Collaboration)
- University of Pennsylvania (Collaboration)
- Newcastle University (Collaboration)
- CeMM Research Center for Molecular Medicine (Collaboration)
- Health Data Research UK (Collaboration)
- Biogen Idec (Collaboration)
- Imperial College London (Collaboration)
- King's College London (Collaboration)
- University of Exeter (Collaboration)
- University of Bristol (Collaboration)
Publications
Shakt G
(2024)
Major Depressive Disorder Impacts Peripheral Artery Disease Risk Through Intermediary Risk Factors.
in Journal of the American Heart Association
Urquijo H
(2023)
A lifecourse Mendelian randomization study uncovers age-dependent effects of adiposity on asthma risk.
in iScience
Related Projects
Project Reference | Relationship | Related To | Start | End | Award Value |
---|---|---|---|---|---|
MC_UU_00032/1 | | | 31/03/2023 | 30/03/2028 | £3,355,000 |
MC_UU_00032/2 | Transfer | MC_UU_00032/1 | 31/03/2023 | 30/03/2028 | £1,554,000 |
MC_UU_00032/3 | Transfer | MC_UU_00032/2 | 31/03/2023 | 30/03/2028 | £1,559,000 |
MC_UU_00032/4 | Transfer | MC_UU_00032/3 | 31/03/2023 | 30/03/2028 | £171,000 |
MC_UU_00032/5 | Transfer | MC_UU_00032/4 | 31/03/2023 | 30/03/2028 | £1,826,000 |
MC_UU_00032/6 | Transfer | MC_UU_00032/5 | 31/03/2023 | 30/03/2028 | £1,552,000 |
MC_UU_00032/7 | Transfer | MC_UU_00032/6 | 31/03/2023 | 30/03/2028 | £772,000 |
Description | CHECKPOINT |
Amount | £3,499,252 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2024 |
End | 02/2029 |
Description | Causal inference methods to integrate genetics and multi-omics data for target discovery and validation |
Amount | $493,127 (USD) |
Organisation | Biogen Idec |
Sector | Private |
Country | United States |
Start | 11/2023 |
End | 11/2027 |
Title | DrivR-Base |
Description | DrivR-Base is a pipeline for extracting feature information from different databases for single nucleotide variants (SNVs). These features are designed to be inputs for machine learning models, aiding in the prediction of functional impacts of genetic variants in human genome sequencing. |
Type Of Material | Computer model/algorithm |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | This is forming the basis of ongoing work for variant effect prediction (in preparation for publication) |
URL | https://github.com/amyfrancis97/DrivR-Base |
Description | Biogen collaboration |
Organisation | Biogen Idec |
Country | United States |
Sector | Private |
PI Contribution | We are continuing a previous collaboration with Biogen focused on drug target prioritization using Mendelian randomization and genetic colocalization with molecular QTL datasets. |
Collaborator Contribution | Biogen are providing funding, pharmaceutical expertise and datasets relevant to their target areas. |
Impact | N/A |
Start Year | 2021 |
Description | CUP-Global |
Organisation | Imperial College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are collaborating with the Global Cancer Update Programme (CUP-Global) team on methods to automate the systematic review processes used in the CUP project. |
Collaborator Contribution | The CUP-Global team are providing information on the challenges of information extraction from the literature, and human-curated training datasets. |
Impact | None yet |
Start Year | 2023 |
Description | CVD-COVID-UK |
Organisation | Health Data Research UK |
Country | United Kingdom |
Sector | Private |
PI Contribution | Analyses on the potential role of drug targets in COVID-19 |
Collaborator Contribution | This is an HDR UK consortium with wide contributions from partners in terms of data, expertise, analyses and technologies. |
Impact | N/A |
Start Year | 2020 |
Description | Genetics of DNA Methylation Consortium |
Organisation | CeMM Research Center for Molecular Medicine |
Country | Austria |
Sector | Academic/University |
PI Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) |
Collaborator Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies. |
Impact | Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: database of methylation QTL (http://mqtldb.godmc.org.uk/); publication pending. |
Start Year | 2013 |
Description | Genetics of DNA Methylation Consortium |
Organisation | King's College London |
Department | Brain Bank |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) |
Collaborator Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies. |
Impact | Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: database of methylation QTL (http://mqtldb.godmc.org.uk/); publication pending. |
Start Year | 2013 |
Description | Genetics of DNA Methylation Consortium |
Organisation | Leiden University Medical Center |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) |
Collaborator Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies. |
Impact | Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: database of methylation QTL (http://mqtldb.godmc.org.uk/); publication pending. |
Start Year | 2013 |
Description | Genetics of DNA Methylation Consortium |
Organisation | Newcastle University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) |
Collaborator Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies. |
Impact | Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: database of methylation QTL (http://mqtldb.godmc.org.uk/); publication pending. |
Start Year | 2013 |
Description | Genetics of DNA Methylation Consortium |
Organisation | University of Bristol |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) |
Collaborator Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies. |
Impact | Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: database of methylation QTL (http://mqtldb.godmc.org.uk/); publication pending. |
Start Year | 2013 |
Description | Genetics of DNA Methylation Consortium |
Organisation | University of Exeter |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) |
Collaborator Contribution | Contributing to a consortial analysis of methylation quantitative trait loci using data from other studies. |
Impact | Multi-disciplinary collaboration involving molecular epidemiology, statistics and bioinformatics. Outputs: database of methylation QTL (http://mqtldb.godmc.org.uk/); publication pending. |
Start Year | 2013 |
Description | IEU/UPenn collaboration |
Organisation | University of Pennsylvania |
Country | United States |
Sector | Academic/University |
PI Contribution | Mendelian randomization projects: conception, design, analysis and interpretation |
Collaborator Contribution | Mendelian randomization projects: conception, design, data and compute resources and interpretation |
Impact | Multi-disciplinary, integrating clinical, epidemiological and informatics expertise. Outputs: doi: 10.1007/s00125-022-05653-1 |
Start Year | 2019 |
Title | ASQ |
Description | The EpiGraphDB-ASQ (ASQ; /ɑːsk/ i.e. "ask") interface is a natural language interface to query the integrated epidemiological evidence of the EpiGraphDB data and ecosystem. A query starts either from a short paragraph of text, from which ASQ derives claim triples, or from claim triples that users supply directly. ASQ then retrieves data from EpiGraphDB, both biomedical entities and evidence from various sources, to facilitate triangulation of the evidence regarding a specific claim. |
Type Of Technology | Webtool/Application |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Publication pre-printed and in submission |
URL | https://asq.epigraphdb.org/ |
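The claim-triple idea described for ASQ can be sketched as a small data structure plus a lookup. This is an assumption-laden illustration only: `ClaimTriple`, `TOY_EVIDENCE` and `find_evidence` are hypothetical names, the evidence entries are placeholders rather than real EpiGraphDB records, and the actual ASQ service may represent and match claims quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimTriple:
    """A subject-predicate-object claim (hypothetical representation)."""
    subject: str
    predicate: str
    object: str


# Hypothetical in-memory stand-in for an evidence source, keyed by
# (subject, object). The entries are placeholders, not real findings.
TOY_EVIDENCE: dict[tuple[str, str], list[str]] = {
    ("adiposity", "asthma"): ["placeholder MR record", "placeholder literature record"],
}


def find_evidence(triple: ClaimTriple,
                  store: dict[tuple[str, str], list[str]]) -> list[str]:
    """Return the evidence items stored for the claim's subject/object pair.

    Matching is a case-insensitive exact match on the two entity names;
    the predicate is ignored in this simplified sketch.
    """
    key = (triple.subject.strip().lower(), triple.object.strip().lower())
    return list(store.get(key, []))


if __name__ == "__main__":
    claim = ClaimTriple("Adiposity", "CAUSES", "Asthma")
    for item in find_evidence(claim, TOY_EVIDENCE):
        print(item)
```

Keeping the claim as an explicit triple separates the text-extraction step from the evidence-retrieval step, so either can be supplied or replaced independently.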