Ensembl in a new era - deep genome annotation of domesticated animal species and breeds

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Domesticated animals are economically and socially important species, with annual farmed animal outputs alone worth over 15 billion pounds to the UK economy. However, farmed animals in particular are associated with a number of challenges, including their impact on climate change, their role as reservoirs of zoonotic diseases, their use of finite resources and issues around animal welfare. Scientists in the UK and across the globe are meeting these challenges by developing improved breeding strategies, interventions that reduce methane emissions, and advanced welfare approaches. Moreover, domesticated animals are increasingly proving invaluable as biomedical models for better understanding human health. Fundamentally important across these studies is access to highly annotated reference genomes. For example, a better understanding of the location of functional elements in the genome and the potential impact of genetic changes can dramatically accelerate research.

The aim of this project is to create highly detailed maps of the genomes of domesticated animal species and make them freely available to researchers across the globe. Although reference genome sequences have been generated for domesticated animal species such as cattle, goats, sheep, pigs, chickens, ducks, turkeys, dogs and horses, as well as for several important fish species, most have been derived from a single animal and generally represent the diversity across the species poorly. Likewise, the genome sequence alone is of limited use without an understanding of what its different elements do in different tissues and cells and at different life stages.

Recent advances in sequencing technology have made it easier and less costly to generate high-quality genome assemblies. Instead of one representative genome per domesticated animal species, new genomes for breeds and populations within each species are being generated. For example, there are now genome assemblies for at least three new dog breeds in addition to the original Boxer genome. Because cattle are such an economically important livestock species, at least two hundred more genome assemblies from different cattle breeds are to be produced over the coming three years. Capturing all of this genomic diversity is important because it can help determine breeding and conservation strategies. However, for scientists to make sense of these genomes, the expressed and regulatory regions of each genome need to be annotated. Understanding how the genome of a domesticated animal is expressed and regulated can help researchers identify which regions of the genome influence important characteristics of the species of interest. For example, Roslin researchers compared the annotated genome assemblies of water buffalo and domestic cattle to better understand why one species is more susceptible to disease than the other.

Once new genome assemblies are annotated, this information needs to be provided in a way that allows researchers to access it freely, so that its impact is maximised. Ensembl provides a means for researchers to look at or 'browse' the annotated genome information. As part of this project, we will provide training to ensure that researchers can fully utilise Ensembl resources to investigate annotated genomes efficiently and interpret the functional relevance of their analyses. The databases and tools provided by Ensembl have been shown to be a powerful and effective means of annotating the complex genomes of domesticated animal species, and it is essential that they keep pace with the large amounts of new data and the new research questions that are being generated.

Technical Summary

The Ensembl genome browser is a widely used web-based interface that makes deeply annotated reference genomes for domesticated animals available to researchers in a unified way. An explosion in the number of genomes produced for domesticated animals is expected in the coming three years. In this proposal we describe how we will ensure that the Ensembl genome browser keeps pace and continues to provide deep annotation of these genomes.

Domesticated animals are diverse, with many different breeds and populations within each species. Advances in sequencing technologies mean that the recent rise in the number of assembled genomes for domesticated animal species is expected to continue and accelerate. However:

- Current Ensembl resources are primarily focused on reference genomes from one or a small number of representatives per species.
- New ways of storing, comparing, annotating, visualising and making available the diversity of genomes for each domesticated animal species are urgently required.
- Support for efforts to annotate this wealth of genome sequence data in a timely manner is critical to realising the potential impact of these data.

The overarching aim of this proposal is to establish and maintain deeply annotated genomes for domesticated animal species in the Ensembl genome browser. To achieve this aim we will:

- Analyse and annotate domesticated animal genomes as they become available, including alternate assemblies, exploiting the growing volumes of functional data.
- Run comparative genomics analyses both between species and within species.
- Acquire data from re-sequencing projects to characterise genetic variation within species and annotate variants by genomic region.

To ensure that the research community can make the most efficient use of the resource, we will provide training and regularly adjust our priorities based on user feedback.