Imputation-aided analysis of DNA methylomes

Lead Research Organisation: University College London
Department Name: Structural Molecular Biology

Abstract

Epigenetics is the study of heritable changes in gene and genome function that occur without changes in the DNA sequence itself. It is mediated by chemical marks recorded on top of the DNA sequence; these marks are found throughout nature and affect many biological processes in health and disease. These mitotically heritable yet reversible chemical modifications act upon the chromatin structure and specific DNA bases to regulate gene expression in a cell-specific manner. DNA methylation is one such modification, in which individual cytosine bases are methylated to produce 5-methylcytosine. In mammals, including humans, it occurs predominantly in the context of cytosine-guanine dinucleotides (CpG sites).
Due to its central role in normal human development and numerous diseases, genome-wide DNA methylation (methylome) analysis is of broad interest in medical research. However, generating DNA methylomes on a large scale is expensive, considerably more so than whole-genome sequencing. This is because technical issues inherent to bisulfite sequencing (e.g. DNA fragmentation, incomplete bisulfite conversion, reduced mapping efficiency) require a higher sequencing coverage in order to confidently call the methylation status at a cytosine residue. One potential way to reduce this cost is to explore in silico methodologies, such as imputation, to improve the coverage and quality of the data produced in these experiments.
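The link between coverage and confidence in a methylation call can be illustrated with a small worked example. The sketch below assumes that reads at a CpG site behave like independent binomial draws and uses a Wilson score interval; the function names and the 95% level are illustrative choices, not part of the project.

```python
import math

def wilson_interval(methylated, coverage, z=1.96):
    """Wilson score interval for the methylation level at a single CpG.

    methylated: number of reads supporting a methylated cytosine
    coverage:   total number of reads covering the site
    z:          normal quantile (1.96 corresponds to roughly 95%)
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    p = methylated / coverage
    denom = 1 + z ** 2 / coverage
    centre = (p + z ** 2 / (2 * coverage)) / denom
    half = z * math.sqrt(p * (1 - p) / coverage + z ** 2 / (4 * coverage ** 2)) / denom
    return centre - half, centre + half

def interval_width(methylated, coverage):
    """Width of the interval; smaller means a more confident call."""
    low, high = wilson_interval(methylated, coverage)
    return high - low

# Same observed methylation level (80%) at two hypothetical coverages.
shallow = interval_width(4, 5)    # 5x coverage
deep = interval_width(24, 30)     # 30x coverage
```

Under these assumptions the interval at 5x coverage is roughly twice as wide as at 30x, which is one way to see why shallow methylomes leave the status of individual sites uncertain.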
Imputation is a statistical technique where missing values are substituted with a computed value [1]. The process requires reference data from which the missing information can be extracted and imputed, which boosts data quality and statistical power, helps fine-map associations and facilitates integrative analysis using metadata. Imputation of genotypes as well as haplotypes has become routine and has already proved invaluable for the discovery of many replicated associations for many complex human diseases [2]. In comparison, imputation of epi-genotypes such as DNA methylation states is still in its infancy. Current methylation imputation/recovery techniques include ChromImpute [3], COMETvintage [4] and DeepCpG [5], which impute or recover missing data in Whole-Genome Bisulfite Sequencing (WGBS). ChromImpute utilises regression tree ensemble predictors to impute multiple epigenomic signals (including DNA methylation). COMETvintage uses segmentation to calculate blocks of co-methylation (COMETs) to recover differentially methylated signal that is missing from methylomes not sequenced deeply enough. DeepCpG utilises deep learning techniques to determine associations between DNA sequence patterns and methylation states as well as between neighbouring methylation sites. None of these tools has yet been used to reduce the cost of methylome analysis by reducing sequencing depth while increasing data quality and accuracy.
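As a toy illustration of the general idea of imputation, the sketch below fills in missing methylation levels from nearby covered CpG sites, weighting each neighbour by inverse distance. It is not the method used by ChromImpute, COMETvintage or DeepCpG; the function name, the 1000 bp window and the weighting scheme are assumptions chosen only to keep the example short.

```python
def impute_methylation(positions, levels, window=1000):
    """Fill missing methylation levels (None) from covered neighbouring CpGs.

    positions: genomic coordinates of CpG sites on one chromosome
    levels:    methylation level in [0, 1] per site, or None if not covered
    window:    maximum distance in bp for a neighbour to contribute (assumed)
    """
    imputed = list(levels)
    for i, (pos, level) in enumerate(zip(positions, levels)):
        if level is not None:
            continue  # observed value, nothing to impute
        weighted_sum = 0.0
        weight_total = 0.0
        for other_pos, other_level in zip(positions, levels):
            if other_level is None:
                continue  # only observed sites act as reference data
            distance = abs(other_pos - pos)
            if 0 < distance <= window:
                weight = 1.0 / distance  # closer sites count more
                weighted_sum += weight * other_level
                weight_total += weight
        if weight_total > 0:
            imputed[i] = weighted_sum / weight_total
        # otherwise the site stays None: no reference data within reach
    return imputed

# Hypothetical data: two covered sites flank one uncovered site;
# a fourth uncovered site is too far from any covered neighbour.
positions = [100, 150, 200, 5000]
levels = [0.9, None, 0.7, None]
result = impute_methylation(positions, levels)
```

Here the site at position 150 receives the average of its two equidistant neighbours, while the isolated site at position 5000 is left missing. Real tools draw on much richer reference data, such as other samples, sequence context or co-methylation blocks.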

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/M009513/1                                   01/10/2015  31/03/2024
1759459            Studentship   BB/M009513/1  01/10/2016  30/03/2021  Muhammad Moghul