Modular workflow for the community-led development of custom livestock DNA methylation arrays

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Epigenetic marks are modifications to the genome that do not involve changes to the DNA sequence itself. A commonly studied epigenetic mark is DNA methylation, which affects how genes are expressed and has been linked to diseases and other traits in both humans and animals. DNA methylation can determine whether a particular gene is switched on (expressed) or not in a particular tissue. Research has shown that, unlike the DNA sequence, DNA methylation can also act as a record of the environment we are exposed to. For example, some DNA methylation "sites" can distinguish smokers from never-smokers, or animals that differ in nutritional state or fertility.

Currently, investigating DNA methylation in livestock is only possible through expensive sequencing technologies, and this excessive cost prevents scientists from studying it and how it could inform selection strategies to breed healthy and productive animals effectively and sustainably. The project we propose will develop the tools to enable these studies. We have engaged with leaders in the field of livestock epigenetics, industrial partners and breeders, forming a truly multidisciplinary team with a common goal. As there are multiple livestock species, we propose to develop computational tools to select the most informative DNA methylation "sites" to test in an epigenetic kit, similar to direct-to-consumer genetic testing kits. This epigenetic test, performed using DNA methylation arrays (kits), is substantially cheaper than obtaining DNA methylation information through sequencing. We will select "sites" based on what is already known about where DNA methylation changes are important (for example, at genes and their regulatory regions) and using data from a set of animals whose DNA methylation has already been measured by sequencing. At the moment, cattle is the livestock species with the most sequencing-derived DNA methylation data. We will therefore initially focus on cattle, using our computational workflow to design an array that will facilitate affordable epigenetic research in that species; the same workflow can then be applied to other species as more data become available. We will work with a company that can produce the first DNA methylation kits designed to cover the most informative parts of the genome with respect to trait variation. This work could have a major economic impact by contributing to the breeding of healthier and more efficient livestock, which could in turn reduce the environmental impact of animal production through, for example, better use of resources or lower methane emissions.
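As a rough illustration of the data-driven part of site selection, the sketch below ranks CpG sites by how much their methylation level varies across sequenced animals, placing sites in annotated regions of interest (such as genes) first. It is a minimal sketch under our own assumptions, not the project's method: the input layout (per-animal methylated and unmethylated bisulfite read counts per site), the 10-read coverage threshold, the minimum of three animals and the function names `methylation_level` and `rank_sites` are all hypothetical, and across-animal variance is only one plausible measure of how informative a site is.

```python
"""Hypothetical sketch: rank CpG sites by across-animal methylation variance,
using bisulfite-sequencing read counts (not the project's actual method)."""
from statistics import pvariance


def methylation_level(meth_reads, unmeth_reads, min_coverage=10):
    """Fraction of reads methylated at a CpG; None if coverage is too low.
    The 10-read default is an illustrative assumption."""
    coverage = meth_reads + unmeth_reads
    if coverage < min_coverage:
        return None
    return meth_reads / coverage


def rank_sites(counts, annotated, n_sites, min_animals=3):
    """counts: site id -> list of (methylated, unmethylated) reads per animal.
    annotated: set of site ids in regions of prior interest (e.g. genes).
    Returns up to n_sites ids, annotated sites first, each group ordered
    by descending variance of methylation level across animals."""
    scored = []
    for site, per_animal in counts.items():
        levels = [lvl for lvl in (methylation_level(m, u) for m, u in per_animal)
                  if lvl is not None]
        if len(levels) < min_animals:
            continue  # too few well-covered animals to estimate variability
        scored.append((site in annotated, pvariance(levels), site))
    # True sorts above False, so annotated sites come first; then by variance.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [site for _, _, site in scored[:n_sites]]
```

For example, `rank_sites(counts, gene_sites, 1000)` (with a hypothetical `gene_sites` set) would return up to 1,000 site identifiers with gene-associated sites first. A real array design would also have to respect the manufacturer's probe-design constraints, which this sketch ignores.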

Technical Summary

Epigenetic changes, such as DNA methylation, play poorly understood roles in genome regulation and in variation in complex traits. Despite their importance, progress in the area has been hampered by the cost of quantifying such changes at scale. At present, studying DNA methylation in large livestock cohorts is prohibitively expensive: there are no high-quality DNA methylation arrays available, and investigations rely on sequencing technologies that are costly for academic researchers, funding bodies and industry. Here, we plan to develop a flexible bioinformatics workflow and associated tools to design powerful DNA methylation arrays, and to apply this workflow to develop the first genome-wide, single-base-pair-resolution array for cattle. Cattle have been chosen as an initial exemplar because of high community demand and the availability of a sufficiently large set of bisulfite sequencing data from which to design the array. The array content will be developed with input from the academic and industrial communities and from technology developers, with a view to maximising utility across use cases. It will include CpG sites at all genes in the genome, CpG islands, and differentially methylated regions identified through bioinformatic analysis of the bisulfite sequencing data, and will be annotated with relevant features drawn from the public domain and from collaborators' pre-published data. The pipeline we propose to build will be modular and reusable, allowing arrays to be designed with different focuses (e.g. fertility or disease resistance) or for different species. We will make the bioinformatic resources and cattle array content publicly available and extendable. The availability of a cost-effective DNA methylation array will catapult research into the root causes of health, production and environmental traits and improve breeding programmes for difficult-to-predict traits such as infectious diseases.
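The modular design described above could, for instance, be organised so that each content source is a separate module returning candidate CpG positions, with the pipeline merging them into one manifest that records which modules selected each site. The sketch below assumes that structure; the module interface, the function names and the 200 bp window with a 50 bp step are hypothetical. The CpG-island module applies the classic Gardiner-Garden and Frommer (1987) criteria (length of at least 200 bp, GC content above 50% and an observed/expected CpG ratio above 0.6) purely for illustration. In practice, CpG islands and gene models would more likely be taken from an existing genome annotation, and a differentially-methylated-region module built from bisulfite data would plug in the same way.

```python
"""Hypothetical sketch: modular assembly of array content from CpG sources."""
from typing import Callable, Dict, List, Set, Tuple

# A content module takes a genome sequence and returns candidate CpG positions.
Module = Callable[[str], Set[int]]


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def is_cpg_island(window: str) -> bool:
    """Gardiner-Garden & Frommer (1987): length >= 200 bp, GC > 50%,
    observed/expected CpG ratio > 0.6."""
    w = window.upper()
    n, c, g = len(w), w.count("C"), w.count("G")
    if n < 200 or c == 0 or g == 0:
        return False
    obs_exp = len(cpg_positions(w)) * n / (c * g)
    return (c + g) / n > 0.5 and obs_exp > 0.6


def island_module(seq: str, window: int = 200, step: int = 50) -> Set[int]:
    """CpGs inside any sliding window that meets the island criteria."""
    sites: Set[int] = set()
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        win = seq[start:start + window]
        if is_cpg_island(win):
            sites.update(start + p for p in cpg_positions(win))
    return sites


def interval_module(intervals: List[Tuple[int, int]]) -> Module:
    """Module selecting CpGs whose C lies in half-open (start, end)
    intervals, e.g. hypothetical gene coordinates."""
    def module(seq: str) -> Set[int]:
        return {p for p in cpg_positions(seq)
                if any(s <= p < e for s, e in intervals)}
    return module


def build_manifest(seq: str, modules: Dict[str, Module]) -> Dict[int, List[str]]:
    """Union of all modules' sites, recording which modules chose each one."""
    manifest: Dict[int, List[str]] = {}
    for name, module in modules.items():
        for pos in sorted(module(seq)):
            manifest.setdefault(pos, []).append(name)
    return dict(sorted(manifest.items()))
```

Calling `build_manifest(seq, {"genes": interval_module(gene_intervals), "islands": island_module})`, with a hypothetical `gene_intervals` list, yields one entry per selected CpG, so sites supported by several sources are easy to spot. A trait-focused module (e.g. for fertility) could be added to the dictionary without changing the rest of the pipeline.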
