Development of a system to simultaneously detect mutations and epigenetic marks

Lead Research Organisation: University of Oxford
Department Name: Big Data Institute - NDM

Abstract

DNA is the information storage system of the cell. It consists of four building blocks (A, C, G & T) that form a long chain. Many diseases, most notably cancer, have their origin in changes to the DNA sequence. Being able to read DNA has revolutionised biomedical research and led to new ways to diagnose diseases.
We now know that there are additional "punctuation marks", in the form of chemical modifications on C's, which affect how the instructions in the DNA are executed. These so-called "epigenetic" marks are not inherited, but they profoundly influence the behaviour of cells, and are important for understanding a wide range of diseases.
The existing methods to read these epigenetic marks are more expensive and more difficult to perform than standard DNA sequencing. As a result, there is a lot more data from, and more applications of, DNA sequencing than epigenetic sequencing.
In this project, we are proposing to develop a method to read genetic and epigenetic information at the same time, and at little to no extra cost compared to standard DNA sequencing. To do so, we are making use of a new chemical method for measuring epigenetic marks that we developed recently. We will carefully generate test data which will be used to train machine-learning algorithms to optimise the accuracy of the sequencing method, and to establish the best possible experimental parameters for this technique.
The resulting method will make it possible to routinely query a patient's genetic background, while simultaneously measuring their epigenetic state. This will lead to a much broader understanding of the role of epigenetics in disease.

Technical Summary

Changes to epigenetic marks, in particular cytosine methylation, are a key feature of many diseases. For example, it has been shown that CpG-island hypermethylation is a common occurrence in cancer cells. Despite their well-established importance in many diseases, the number of genome-wide maps of DNA modifications in different healthy and diseased tissue types is small. The reason for this is that bisulfite sequencing, the current state-of-the-art method for reading DNA modifications, is more expensive than standard genome sequencing, and requires more material, due to the DNA-damaging nature of the chemicals involved. Furthermore, bisulfite sequencing results in a much higher error rate per base. Most projects therefore choose not to perform genome-wide measurements of cytosine methylation.
We recently published a new method for sequencing epigenetic marks, called TAPS, that is cheaper and produces data of very similar quality to standard DNA sequencing. Here, we propose to develop algorithms to identify both genetic variation and epigenetic marks from the same sequencing data simultaneously. This would make it possible to perform only one experiment, at similar cost to standard sequencing, and produce information regarding the genotype and gene regulatory state of a sample at once. We believe that this would greatly increase the utility of the TAPS method, but more importantly it would lead to more epigenetic information being generated routinely, and thus to a much better understanding of the role of epigenetic marks in different diseases.

Planned Impact

We recently developed a new sequencing method that greatly facilitates the measurement of so-called epigenetic marks on the genome, which control many aspects of tissue-specific gene regulation. Epigenetic marks are known to be related to many diseases, such as autism, obesity and cancer. The epigenome and the genome are tightly linked, yet we know a lot less about epigenetic than about genetic variation, because it is much easier and cheaper to measure the genome than to measure the epigenome.
Here, we propose to develop a software approach that can be combined with our newly developed sequencing method to read both the genetic and the epigenetic state of a sample at the same time, at a cost comparable to that of genome-only sequencing.
In the immediate future, this will benefit biomedical researchers working on any disease where both genetic and epigenetic state are important. We anticipate that epigenetic changes will be interrogated much more frequently, leading to new discoveries regarding gene regulatory mechanisms in specific tissue types.
There are a number of research groups and commercial companies that are trying to use epigenetic and genetic information, by means of circulating cell-free DNA extracted from blood samples, to detect cancer and other diseases in a much less invasive way. A lack of genome-wide maps of tissue-specific methylation, and the difficulty of reading both genetic and epigenetic information from the very small amounts of DNA extracted from a patient's blood sample, are the main obstacles to the success of these approaches. Our method will improve the generation of the information necessary to develop these methods, but it will also be directly applicable to cell-free DNA sequencing data. In the medium-term future we therefore anticipate that our method will accelerate and improve the development of new diagnostic assays.

Publications
