Understanding disease through environment-wide association studies

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Each individual's disease risk is influenced by both genetic factors and the environment. Environmental factors include external exposures, such as exposure to pesticides, and behavioral exposures, such as smoking or fitness habits. For many diseases, a majority of the variation in individual disease risk is thought to be due to differences in environmental factors. Furthermore, unlike genetic factors, which cannot be controlled, exposure to environmental risks can be moderated, providing a pathway to disease prevention, in particular for individuals with elevated genetic risk. While major environmental risk factors have been identified for some diseases, such as the link between smoking and lung cancer, most environmental factors are likely to have small effects, with many such factors jointly acting on disease risk. This project aims to develop methods and computational tools that will make it possible to identify such small to moderate environmental risk factors in large cohorts. Using over a thousand environmental factors measured in about 500,000 participants of the UK Biobank, we will apply these methods to identify factors that are associated with myocardial infarction (heart attack) and stroke, two of the leading causes of death in the UK.

The results from this project could help in developing concrete recommendations on the environmental factors people are exposed to and on their lifestyle choices, so as to lower their risk of heart attack and stroke. Identified risk factors could also provide information about pathways involved in these diseases, helping us to better understand how they develop. More generally, the methods and tools developed will be widely applicable to other diseases and will allow us to create better individual risk profiles, allowing for better allocation of resources for disease monitoring and prevention.

Technical Summary

Susceptibility to disease is influenced by the genotype, the environment and their interactions. Genome-wide association studies (GWAS) have become widespread and serve as a highly successful hypothesis-free approach to identify genetic variants associated with disease. Hypothesis-free quantification of risk factors, including, for instance, external environmental exposures and behavioral lifestyle choices, has by contrast received comparatively little attention. In this project we will develop and apply methodology to perform Environment-wide Association Studies (EnvWAS) whilst adjusting for large numbers of genetic markers in samples of unprecedented size. We will showcase the methodology using myocardial infarction and stroke as exemplars. The power of EnvWAS could potentially be increased by using generalized linear mixed models (GLMMs) that will allow us to adjust for the genetic make-up of participants, similar to adjusting for a polygenic effect when testing individual variants within a GWAS. Utilizing distributed computation on computer clusters, we will implement algorithms that allow the necessary GLMMs with dense covariance structure to be fitted on sample sizes of hundreds of thousands of individuals. Using incident cases, with the rest of the ~460,000 White-British individuals in the UK Biobank as controls, we will test ~1200 potential risk factors for association with myocardial infarction and stroke. Estimates of effects for significantly associated risk factors will be obtained by fitting them in a joint model. Finally, the shared environmental and genetic burden between the two diseases will be investigated.
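
As a rough illustration of the scan described above, the following Python sketch tests each exposure one at a time for association with a binary disease outcome. It is a simplified stand-in, not the project's method: it uses ordinary logistic regression with a few genetic covariates (for example, principal components) as fixed effects, rather than a GLMM with a dense genetic covariance structure. The function names `fit_logistic` and `envwas_scan`, the Bonferroni threshold, and the simulated data are all assumptions made for this example.

```python
import math
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient estimates and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        hessian = X.T @ (X * w[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def envwas_scan(exposures, covariates, y, alpha=0.05):
    """Test each exposure column separately, adjusting for the covariates.

    Returns one tuple per exposure:
    (index, log odds ratio, two-sided Wald p-value, passes Bonferroni threshold).
    """
    n, m = exposures.shape
    intercept = np.ones((n, 1))
    threshold = alpha / m  # Bonferroni correction (an assumption of this sketch)
    results = []
    for j in range(m):
        X = np.hstack([intercept, exposures[:, [j]], covariates])
        beta, se = fit_logistic(X, y)
        z = beta[1] / se[1]
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        results.append((j, beta[1], p, p < threshold))
    return results


# Simulated data (not UK Biobank): only exposure 0 has a true effect.
rng = np.random.default_rng(0)
n = 5000
exposures = rng.standard_normal((n, 5))
genetic_pcs = rng.standard_normal((n, 2))
logit = -2.0 + 0.5 * exposures[:, 0] + 0.2 * genetic_pcs[:, 0]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

for j, log_or, p, significant in envwas_scan(exposures, genetic_pcs, y):
    print(f"exposure {j}: log OR = {log_or:+.3f}, p = {p:.2e}, significant = {significant}")
```

In a full analysis along the lines of the summary, the fixed genetic covariates would be replaced by a random polygenic effect, and the exposures passing the threshold would then be refitted together in a joint model.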

Planned Impact

In the short term, our research will benefit the scientific community working on the identification of environmental risk factors of disease, especially those working with large genotyped cohorts like the UK Biobank. The tools we will develop can be applied to all diseases and will be able to analyze datasets of unprecedented scale. This will help to identify risk factors of small effect that cannot be identified in much smaller studies. Researchers trying to identify gene-by-environment interactions could also benefit: identifying new environmental risk factors is likely to help in understanding how these are modulated by the genotype, and to reduce the search space of a fully hypothesis-free gene-by-environment interaction search. This could be achieved during the research project or soon after.

The identified risk factors could inform risk prediction models, which could potentially benefit patients and health professionals. They could benefit patients through early diagnosis (through improved risk stratification) and inform their decisions on which exposures to avoid or which preventative measures to take. They could help health professionals by directing interventions to those who need them most. This could be achieved in the medium to long term.

Other fields, such as animal breeding or ecology, would also benefit from the methods, applied to other scientific questions. Similarly, industry (for example, the banking sector) could potentially be interested in the methods, as it generates vast amounts of data that are difficult to analyse without appropriate tools.

Finally, the post-doctoral researcher employed on the grant will benefit from the excellent environment and a multidisciplinary team of academic collaborators, which includes epidemiologists, geneticists and statisticians.
 
Description UK Biobank Research Analysis Platform 
Organisation UK Biobank
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We were invited by Mark Effingham (Deputy CEO of UK Biobank) to be one of the first teams to access the UK Biobank Research Analysis Platform, in order to adapt and deploy some of the tools we have developed for the analysis of genomic data.
Collaborator Contribution We are working with UK Biobank and DNAnexus to set up the compute configuration to allow fast genome-wide association studies with array genotypes, imputed genotypes, whole-exome and whole-genome data.
Impact No outputs yet.
Start Year 2020
 
Company Name OMECU LIMITED 
Description Software development for analysis of big data. 
Year Established 2021 
Impact Received support from the Wellcome iTPA programme, participated in the SETSquared ICURe programme, and received Medical Research Council grants. They also received funding from the University's Data-Driven Entrepreneurship Seed Fund and Fast Track Mentor initiatives, supported by the Scottish Funding Council.
Website https://www.omecu.com
 
Description Maths and biology. James Gillespie's High School 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact 20-30 pupils and 3-4 teachers attended presentations from my lab on how numerical skills (mathematics and computing) are applied in biological settings. One of these students, now at university, has since visited the Roslin Institute to speak to other researchers.
Year(s) Of Engagement Activity 2018
 
Description Seminar - MRC Centre for Neuropsychiatric Genetics and Genomics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Part of the research institution's seminar series.
Year(s) Of Engagement Activity 2017