Ensembl and enabling genetics and genomics research in farmed animal species

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Bioinformatics Division

Abstract

The sequence of almost all genes (a draft genome sequence) has been determined for several farmed and companion animals including cattle, pigs, chickens, turkeys, dogs and horses. Draft genome sequences for several other species such as sheep, ducks and salmon will be completed soon. The strings of billions of bases (symbolised as four letters A, C, G, T) that constitute these genome sequences are not immediately useful to biological research scientists. Annotating these draft genome sequences with features such as the coding and regulatory parts of genes, and bases which differ between individuals within a species (genetic variants), greatly enhances the value and utility of the genome sequence. Visualising the genome sequences complete with annotations in a freely accessible manner further improves the value of the information. The web-mounted Ensembl genome browser, databases and associated annotation tools have been shown to be powerful and effective means of annotating the complex genomes of animal species including humans, mice and more recently farmed and companion animals. This project is concerned with improving the quality of genome annotation for farmed and companion animal genomes. International consortia of scientists are using so-called next generation sequencing technology not only to sequence the genomes of more economically important species, but also to sequence the genomes of multiple individuals for each species of interest and to improve or finish the reference genome sequences for key species. These new sequencing technologies are also being used increasingly in assays, for example, of the extent of gene expression in different cells or under different conditions (transcriptomics) or of the state of the genome (epigenomics). 
Mapping the sequence read-outs from these assays back to the relevant genome sequence not only provides a genome-wide framework for analysis but also provides further information with which to annotate the genome sequence itself. Thus, there is a recurring need to refresh the genome sequence annotation for important animal species. We will use the Ensembl system to annotate the genome sequences of key farmed and companion animal species. The resulting annotated genome sequences will be made freely available as resources mounted on the World Wide Web. Recently developed features within the Ensembl system enable the analysis and visualisation of genetic variation (i.e. sequence differences) between individuals of the same species. This genetic variation explains the differences in traits such as growth, milk yield and susceptibility to disease. We will populate the Ensembl-animal Variation databases with sequence and genotype data acquired from the animal genetics research community. Visualising these variation data and making them accessible to the scientific community and the animal breeding industry will facilitate research to understand the genetic control of complex traits in animals and genetic improvement of farmed animals. A high quality annotated reference genome sequence is a critical bioinformatics resource for the effective prosecution of contemporary research in the biological sciences. The value and utility of such bioinformatics resources are critically dependent upon the currency of the resource. Thus, this project is concerned with delivering high quality up-to-date annotated reference genomes for key farmed and companion animal species to enable research on these economically or socially important animal species.

Technical Summary

A high quality annotated reference genome sequence is critical to contemporary research in the biological sciences. The Ensembl browser and associated annotation tools and database have been shown to be robust and effective means for making genomic information useful to a wide range of users. Draft reference genome sequences have been established for several farmed animal species (chicken, cattle, pig, horse, turkey) and sequencing is well advanced for several others (including sheep, duck, salmon). Annotated assemblies have already been made available through the Ensembl (chicken, cattle, pig, horse) and Pre-Ensembl (duck, turkey, sheep) genome browsers. However, the utility of a bioinformatics resource is critically dependent upon the currency of the resource. Genome sequence assemblies, including the 'finished' human and mouse sequences, are subject to continual revision as new data are acquired and errors corrected. This proposal is concerned with maintaining the currency of Ensembl in respect of farmed and companion animal species, including poultry and farmed fish. Whilst first draft genome sequences have been established for several of the species of interest, improved genome assemblies and increased volumes of ancillary data, including RNAseq and ChIPseq data, are also being generated for these species. Thus, we will use these growing and improving data to develop up-to-date and enhanced annotation for these species. Not only are the genomes of more farmed animal species being sequenced, but also the genomes of multiple individuals within a species. The recently developed Ensembl variation resources allow these additional data to be captured and visualised for the benefit of scientists engaged in genetics, genomics and other lines of research on the target species. We will work with the animal sciences research community to acquire re-sequence data, SNP and CNV genotypes with which to populate the Ensembl-animal variation databases.
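
As an illustration of what a SNP is in the re-sequence data described above, the following minimal Python sketch compares an individual's sequence with a reference sequence and reports the positions at which single bases differ. It is not part of the Ensembl pipeline: the function name find_snps is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length, which real variant calling from sequencing reads does not.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str, start: int = 1) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-base difference.

    Assumes both sequences are already aligned base-for-base.
    Positions are 1-based by default, offset by `start`.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")

    snps = []
    for offset, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        # Only report differences between unambiguous bases; skip N and gaps.
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            snps.append((start + offset, ref_base, alt_base))
    return snps


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(find_snps("ACGTACGT", "ACGAACGT"))
```

Each reported tuple corresponds to the kind of record (position, reference allele, alternative allele) that a variation database stores for a SNP.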

Planned Impact

Who will benefit? The primary beneficiaries from this proposed development and maintenance of Ensembl resources for farmed and companion animals will be researchers in academia and industry in the UK and beyond. The access statistics and citations of Ensembl papers provide evidence of the demand for Ensembl resources from the research community. The world's leading animal breeding and aquaculture breeding companies, of which some of the largest are UK companies, have in-house genetics expertise. Thus, these companies have the expertise to exploit the information captured and disseminated through Ensembl resources. Suppliers of species-specific 'omics tools such as expression arrays, SNP chips and proteomics systems will benefit from access to annotated genome sequences which include links to features (e.g. probes) on their products. There are potential indirect benefits to the wider public through the addressing of the food security agenda as discussed below.

How will they benefit? The proposed enhanced Ensembl resources, especially the genetic variation resources, will enable research to dissect the genetic control of economically important (and complex) traits in farmed animals including feed efficiency and susceptibility to infectious diseases. In companion animals such as dogs these resources will enable the identification of the determinants of inherited diseases. This enabling of genetics research in farmed animals and fish will facilitate advanced genetic improvement for these species. In the past 40+ years, there have been major productivity gains in dairy cattle, pigs and poultry and there have also been significant reductions in the greenhouse gas emissions and global warming potential per tonne of animal product. These gains have been achieved through genetic improvement alone or in combination with better husbandry, nutrition and disease control. 
Genetic improvement of farmed animal species is a key means of addressing the food security agenda for the animal agriculture and aquaculture sectors. In companion animals the benefits will be improved tools for selective breeding to minimise inherited diseases and inbreeding and to improve animal welfare. The utility of 'omics technology products such as expression microarrays and SNP chips is greatly enhanced when the features on these products can be linked to a well-annotated genome sequence and other information sources. For example, probe sets for Affymetrix arrays and SNPs on Illumina chips can be linked to annotated genes and genome locations respectively, thus enabling more effective use of these products. Academic and other researchers will benefit from the ability to link the read-out from sequencing-based assays to an annotated genome sequence. Without such a frame of reference, such assays are of limited value. The impacts on research will be delivered within the timeframe of the proposed project to enhance Ensembl resources for farmed and companion animals and continue thereafter. Maintaining the currency of the genome assemblies and the associated annotation is critical to ensuring that these impacts continue to be effective. The indirect impacts, for example, on the food security agenda and hence the benefits to the agriculture and aquaculture sectors and the wider public, will take longer to be felt. However, the time to impact for genetic tests for susceptibility to inherited or infectious diseases in animals, with their positive impacts on animal welfare, can be short: 1 to 3 years.
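
The linking of chip features to annotation described above can be pictured as a coordinate lookup: a SNP with a known chromosome and position is matched against the intervals of annotated genes. The Python sketch below shows that idea under stated assumptions. The Gene record, the genes_for_snp function and the gene names and coordinates are all hypothetical and invented for illustration; they are not Ensembl data or an Ensembl API.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """A minimal, hypothetical gene annotation record."""
    name: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_for_snp(chromosome: str, position: int, genes: List[Gene]) -> List[str]:
    """Return the names of all genes whose interval contains the SNP position."""
    return [
        gene.name
        for gene in genes
        if gene.chromosome == chromosome and gene.start <= position <= gene.end
    ]


if __name__ == "__main__":
    # Invented annotation for illustration only.
    annotation = [
        Gene("GENE_A", "1", 1_000, 5_000),
        Gene("GENE_B", "1", 4_500, 9_000),
        Gene("GENE_C", "2", 1_000, 5_000),
    ]
    print(genes_for_snp("1", 4_800, annotation))
```

A linear scan is adequate for a toy example. For genome-scale annotation an interval index would normally be used instead, but the lookup answers the same question.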

Publications

Aken BL (2016) The Ensembl gene annotation system. in Database : the journal of biological databases and curation

Aken BL (2017) Ensembl 2017. in Nucleic acids research

ENCODE Project Consortium (2020) Perspectives on ENCODE. in Nature

Flicek P (2013) Ensembl 2013. in Nucleic acids research

Flicek P (2012) Ensembl 2012. in Nucleic acids research

Flicek P (2014) Ensembl 2014. in Nucleic acids research

Frankish A (2019) GENCODE reference annotation for the human and mouse genomes. in Nucleic acids research

Frankish A (2021) GENCODE 2021. in Nucleic acids research

Herrero J (2016) Ensembl comparative genomics resources. in Database : the journal of biological databases and curation

 
Description The output from this grant is improved annotation of farm animal genomes, including, critically, the location of genes in the genomes. Comparative genome alignments between different species have also been generated and updated, making it easier for insights from research in one species to be related to observations in another species. All of these data are organised in a robust software environment with multiple interfaces allowing their use by a wide range of researchers.
Exploitation Route The farm animal genome annotation and genome browser tools generated provide a resource that underpins farm animal research.
Sectors Agriculture, Food and Drink,Healthcare,Pharmaceuticals and Medical Biotechnology

URL http://www.ensembl.org/
 
Description The genome sequence and associated annotation are made available through the Ensembl genome browser. The browser is widely used by farm animal research communities to integrate data they have independently collected, design specific experiments etc. Handling large genomes, generating annotation and providing tools to use these data requires substantial IT and software infrastructure. By organising data centrally through the Sanger/EBI Ensembl project's software, annotation and bioinformatics services, the productivity of farm animal researchers is greatly increased, since they can share data using a common platform and standards and avoid each investing substantially in duplicate bioinformatics analysis.
First Year Of Impact 2012
Sector Agriculture, Food and Drink,Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title Addition of Ensembl-Havana gene set to Pig (Ensembl release 69) 
Description An Ensembl-Havana gene set was added to the annotation. The VEGA manual annotation which had been generated through a community effort was added. 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The annotated reference genome sequences have been delivered through a series of Ensembl releases 
URL http://oct2012.archive.ensembl.org/Sus_scrofa/Info/Index
 
Title Chicken Galgal4 
Description The annotation of the latest chicken genome assembly (Galgal4) went from Pre-Ensembl to the full Ensembl site, including a revised Gene Build 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? Yes  
Impact na 
URL http://apr2013.archive.ensembl.org/Gallus_gallus/Info/WhatsNew?db=core
 
Title New genome assemblies for chicken, cow and horse added (Ensembl release 95 - Jan 19) 
Description The genomes and gene annotation for three important agricultural species were updated in Ensembl, using the new genome assemblies for chicken (GRCg6a), cow (ARS-UCD1.2) and horse (EquCab3.0). There was also a probe mapping update for the cow and chicken genomes in Ensembl. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact These are additions to an existing public database resource that facilitates research by others. 
URL http://www.ensembl.org/
 
Title Pig cross-breed USMARC genome assembly added (Ensembl release 97 - July 19) 
Description The genome of a pig cross-breed, USMARC, was added to Ensembl. This pig is a cross-bred offspring of a dystrophin-deficient line of pigs submitted by the USDA ARS. This assembly is available in addition to the existing pig reference. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact These are additions to an existing public database resource that facilitates research by others. 
URL http://www.ensembl.org/
 
Title Updated Pig Ensembl Website (Ensembl release 77) 
Description Secondary structures of non-coding RNAs are now shown on the gene summary page, using the R2R package. 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? Yes  
Impact The annotated reference genome sequences have been delivered through a series of Ensembl releases. 
URL http://www.ensembl.org/Sus_scrofa/Info/WhatsNew?db=core
 
Title Whole genome assemblies and annotation added for 11 pig breeds (Ensembl release 98 - Sept 19) 
Description Whole genome assemblies and annotation are available for 11 pig breeds: Hampshire, Jinhua, Berkshire, Large White, Landrace, Pietrain, Rongchang, Meishan, Tibetan, Wuzhishan and Bamei. These are each available as separate genomes with their own genes annotated, based on the reference pig genome Sscrofa11.1, which also has updated gene annotation. We have a new EPO multiple genome alignment of all the new pigs, plus the reference pig, the USMARC pig genome which came out in release 97, and the related agricultural species sheep, cow and horse. We have also computed a specific set of gene trees for those genomes. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact These are additions to an existing public database resource that facilitates research by others. 
URL http://www.ensembl.org/
 
Title Yak and Bison genomes added (Ensembl release 96 - April 19) 
Description The genomes of related bovine species Bos mutus (Wild yak) and Bison bison bison (American bison) were added to Ensembl. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact These are additions to an existing public database resource that facilitates research by others. 
URL http://www.ensembl.org/