Development of a multilevel and mixture-model framework for modelling epigenetic changes over time (resubmission)

Abstract

The epigenome sits on top of genes (DNA sequences) and controls whether genes are active or not. It explains, for example, why 'identical' twins differ in their behaviours and in health outcomes such as blood pressure. Scientists are increasingly interested in epigenetics, the study of the epigenome, to better understand the links between behaviours (such as smoking), genes and disease. Epigenetic patterns are known to change over time, which may partly be due to the influence of environmental factors (e.g. pollution), characteristics (such as our blood pressure) and behaviours (like smoking). In addition, epigenetic patterns seem to change as we get older. Being able to understand which parts of the epigenome change over time, and how and when they change, could be important for understanding how risk factors interact with genes to cause disease and the general decline in health as we get older.
At the moment we do not have good statistical methods for doing this research because of how complex epigenetic data are. The first issue is that there is a large number of methylation (epigenetic) sites for each person: 450,000 with one of the common technologies used to measure them. This means that identifying the small number of these sites that are related to a given environmental factor, characteristic or health outcome is difficult. Secondly, identifying how epigenetic sites change over time is not straightforward because of the way in which they are measured, which makes it difficult to know whether change over time reflects large changes at a small number of sites or small changes at a large number of sites. Thirdly, epigenetic sites are clustered (grouped together) within regions of our genome, and thus two sites from the same region may be more similar than two sites from different regions.
In this project, we aim to develop sophisticated statistical methods for identifying sites which show change in methylation over time, and relating those changes to risk factors and later health outcomes. This will ensure the best possible use of this emerging technology in investigating how the environment and lifestyle interact with genes to cause disease. We will make sure our new methods can work in commonly used statistical packages and make them freely available to all scientists.

Technical Summary

Epigenetic modifications (e.g. DNA methylation) are increasingly used to investigate how the environment and lifestyle impact upon disease risk. Recent technological advances have facilitated the large-scale generation of longitudinal epigenetic data, and there is a need to develop statistical tools for use in this field.
There are three main motivations for analyses of epigenetic changes over time: to identify areas of methylation which are more/less stable over time; to examine how exposures (e.g. environmental exposures, or behaviours such as smoking) are associated with the epigenetic changes; and to investigate the potential health consequences of epigenetic changes. Analyses are complicated by the high-dimensionality of the epigenetic space and correlation within CpG regions.
We will develop multilevel and mixture methods for more robust analysis of dynamic epigenetic data. We will use multilevel models (MLMs) to identify sites and regions which vary over time. We will compare three approaches to pre-analysis dimension reduction, including extending DPPCA for application to epigenetic data. We will develop multivariate MLMs for modelling epigenetic change at several sites with respect to exposure/outcome. Current GGMM and Bayesian Mixture Models will be extended and used to derive latent classes of change across several CpG sites, and to identify the small number of CpG sites with evidence of effects which change over time. We will then develop methods to assess whether the effects of genetic variation change over time, for example whether they are specific to particular developmental stages. Power and performance of all methods under a range of study designs will be compared using simulations. Data from ALSPAC and other longitudinal studies available to the co-applicants will be used to benchmark our methods and conclusions. These data will be from 450k and bisulphite sequencing datasets, but the methods will be applicable to any application with a large number of correlated measurements.

Planned Impact

The field of epigenetics is growing at a rapid pace, in many instances in domains of research unfamiliar with advanced statistical methodology. We aim to generate an accessible and widely applicable selection of sophisticated statistical methods, together with software and simulated data, to serve this growing research community.

Methodological research such as that proposed here is unlikely to have immediate direct benefits for those outside the research community. However, the ability to carry out high-quality research clearly has wider societal benefits, and medium-term benefits could be gained by commercial enterprises that have an interest in the development of epigenetic biomarkers. These may include pharmaceutical or biotech companies wishing to develop epigenetic biomarkers of disease risk, of treatment efficacy or of disease prognosis. The proposed work is of considerable interest to social scientists concerned with the biological embodiment of a variety of social and environmental exposures through epigenetic mechanisms, their potential reversibility and how these factors impact across the life course on behaviours as well as health.

The researcher working on the project will gain experience in new methodology, and also experience different working environments through research visits to Cambridge, Dublin, Leiden and Boston. Participation in the wider group of academics interested in this area will enable him to be at the cutting-edge of new methodological developments, and provide an ideal springboard for his development as an independent researcher. The wider MRC IEU will also benefit from this methodological development, and researchers within the MRC IEU will gain exposure to sophisticated statistical methods which will ensure their competitiveness in the research job market.

Funded Value:

£296,416

Funded Period:

Jan 16 - Nov 18

Funder:

MRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

MR/M025020/1

Principal Investigator:

Kate Tilling

Health Category:

Unclassified

Organisations

University of Bristol (Lead Research Organisation)

People	ORCID iD
Kate Tilling (Principal Investigator)
Laura D Howe (Co-Investigator)
Caroline Relton (Co-Investigator)
Frank De Vocht (Co-Investigator)
Vanessa Didelez (Co-Investigator)
Tom Gaunt (Co-Investigator)	http://orcid.org/0000-0003-0924-3247
Debbie Lawlor (Co-Investigator)
George Davey Smith (Co-Investigator)
Oliver Stegle (Co-Investigator)
Andrew Smith (Researcher)

Publications

Kandaswamy R (2020) DNA methylation signatures of adolescent victimization: analysis of a longitudinal monozygotic twin sample in Epigenetics

Mills HL (2019) Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies. in American journal of epidemiology

Staley JR (2022) A robust mean and variance test with application to high-dimensional phenotypes. in European journal of epidemiology

Staley JR (2018) Longitudinal analysis strategies for modelling epigenetic trajectories. in International journal of epidemiology

Research Databases and Models


Title	EWAS catalog
Description	The EWAS Catalog was developed to enable the scientific community to search results from epigenome-wide association studies of DNA methylation. The Catalog currently contains published results from the literature with p < 1×10⁻⁴, and is available at www.ewascatalog.org.
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	N/A
URL	http://www.ewascatalog.org.