Ensembl - adding value to animal genomes through high quality annotation

Lead Research Organisation: European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

High quality annotated genomes are essential resources for life sciences research.

Draft reference genome sequences have been established for several farmed and domesticated animals: cattle, goat, pig, sheep; chicken, duck, turkey; dog, horse; rainbow trout, salmon, tilapia. Substantially improved genome assemblies have been established (for goat, pig, cattle, sheep, water buffalo and chicken) using long read sequencing technologies. There are gaps in the annotation of these genomes in terms of transcript complexity, non-coding genes, pseudogenes and regulatory sequences. Moreover, the pseudo-haploid genome sequence of one individual provides an incomplete view of a species' genome.
Scientists are generating more and better genome sequences for additional species and for additional individuals within a species. Researchers, especially in the FAANG and FAASG consortia, are generating functional data for annotation of coding, non-coding and regulatory sequences.

We will analyse and annotate farmed and domesticated animal genomes as they are released, exploiting the growing volumes of functional data (short and long read RNA-seq / transcript sequences; ChIP-seq; ATAC-Seq; CAGE; bisulfite sequencing) to identify coding genes, non-coding genes and regulatory sequences. We will acquire data from re-sequencing projects to characterise genetic variation within species (SNPs, indels, structural variants) and display this variation in its genomic context. We will run comparative genomics analyses both between species and within species.

We will disseminate the resulting richly annotated genome sequences freely via the Ensembl Genome Browser and via an API for power users. These annotated genomes will provide an integrated view of functional sequences (coding, non-coding and regulatory) and sequence variation for a single or multiple individuals for key farmed and domesticated animals.
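As an illustration of programmatic access, the following is a minimal sketch only. The summary does not say which API is meant; the sketch assumes the public Ensembl REST service at rest.ensembl.org and its lookup/symbol endpoint. The helper names, the species name "sus_scrofa" and the gene symbol "MSTN" are illustrative choices, not taken from the project.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Ensembl REST service (not named in the summary above).
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build the URL for the REST 'lookup/symbol' endpoint (hypothetical helper)."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return REST_SERVER + path


def lookup_symbol(species: str, symbol: str, timeout: float = 30.0) -> dict:
    """Fetch basic gene information as a dict. Requires network access."""
    request = urllib.request.Request(
        lookup_symbol_url(species, symbol),
        headers={"Content-Type": "application/json"},  # ask for a JSON response
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative query: a gene symbol in pig; the fields returned depend on the service.
    gene = lookup_symbol("sus_scrofa", "MSTN")
    print(gene.get("id"), gene.get("seq_region_name"), gene.get("start"), gene.get("end"))
```

Keeping URL construction separate from the network call means the request can be inspected or logged without contacting the server.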

To maximise use of this resource we will provide demonstrations and both online and face-to-face training.

Planned Impact

Who will benefit?
We anticipate the beneficiaries of the Ensembl genome portal for farmed and companion animals to be:
(i) the academic research community
The primary beneficiaries from this proposal for development and maintenance of Ensembl resources for farmed and companion animals will be researchers in academia and industry in the UK and across the globe.
(ii) animal breeding companies
The world's leading animal breeding and aquaculture breeding companies, of which some of the largest are UK companies, have in-house genetics expertise. Thus, these companies have the expertise to exploit the information captured and disseminated through Ensembl resources.
(iii) owners of farm and companion animals and other stakeholders
Research on domesticated animals has important socio-economic impacts, including underpinning and accelerating improvements in veterinary research and agriculture and improving animal health and welfare. Suppliers of species-specific 'omics tools such as expression arrays and SNP chips will also benefit from access to annotated genome sequences.
(iv) science infrastructure and capacity
The Ensembl project is one of three systems worldwide concerned with delivering annotated genome sequences for a large number of species to the scientific community. As such the Ensembl genome portal makes a considerable and continuing contribution to maintaining science infrastructure and capacity.
(v) society
The Ensembl portal provides direct benefits to specific demographics of society through the provision of outreach activities and training, as well as to society more widely.

How will they benefit?
(i) the academic research community
The Ensembl genome portal adds value to animal genomes via high quality annotation. High quality annotated reference genome sequences are essential resources for contemporary research in the biological sciences. The Ensembl browser and associated annotation tools and database have been shown to be robust and effective means for making genomic information useful to a wide range of users.
(ii) animal breeding companies
The proposed Ensembl resources, especially the genetic variation resources, will enable researchers to dissect the genetic control of economically important (and complex) traits in farmed animals including feed efficiency and susceptibility to infectious diseases. This enabling of genetics research in farmed animals and aquaculture species will facilitate advanced genetic improvement. Genetic improvement of farmed animal species is a key means of addressing sustainable food production for the animal agriculture and aquaculture sectors.
(iii) owners of farm and companion animals and other stakeholders
In companion animals the benefits will be improved tools for selective breeding to minimise inherited diseases and inbreeding and to improve animal welfare. The utility of 'omics technology products for this purpose such as expression microarrays and SNP chips is greatly enhanced when the features on these products can be linked to a high quality annotated genome sequence and other information sources.
(iv) science infrastructure and capacity
The Ensembl genome portal for farmed and companion animals itself provides a valuable resource underpinning science infrastructure and capacity. In addition, Ensembl has developed a training programme including demonstrations, online tutorials and workshops in the use of the genome portal. This programme trains PhD students, Post Docs and research scientists to develop their skills in genome annotation, genome browsing and importantly how to interpret and understand their own data.
(v) society
The demographic of society most likely to benefit from the training opportunities the Ensembl project can provide is students who are interested in developing skills in bioinformatics. The project will benefit society more widely by providing a resource that contributes to enhancing sustainable food production.

Description A major goal of this grant is to improve both the breadth (number of species) and depth (representing different breeds/strains) of our annotation. In addition to continuing to provide access to many existing farmed and companion animal species, we have expanded our breadth by continually adding new genomes and annotations over the course of the grant. This includes providing new gene annotation, in addition to comparative genomics resources, and variation/regulatory features where data were available. We have also updated the gene annotation (and in some cases the underlying reference assembly) on several key species using the latest software and transcriptomic data. These updated species included cat, dog, horse, pig, rabbit and tilapia (as part of release 98). In release 99 we added variation displays for Atlantic salmon, taking data directly from the European Variation Archive. In terms of expanding the depth of our annotation, as part of release 98 we became the first resource to provide extensive annotation of pig breeds. This covered 11 breeds: Hampshire, Jinhua, Berkshire, Large White, Landrace, Pietrain, Rongchang, Meishan, Tibetan, Wuzhishan and Bamei. Each breed has its own gene set, built using transcriptomic data matching that used for the reference. A custom comparative genomics build was created including the reference, the alternative USMARC assembly and the 11 breed assemblies (in addition to outgroups such as horse and sheep). We have also provided the first annotation on dog breeds (Basenji and Great Dane, release 99) and common carp strains (German mirror, Hebao red and Huanghe).

2020 saw updates to reference assemblies including dog, Atlantic cod, denticle herring, Northern pike, sheep and turkey. In addition, we updated the gene annotation for the reference goat assembly based on the latest code and data. New species added included coho salmon, pike-perch, Yarkand-deer, spotted halibut, alpaca and gaur cattle.
We have expanded the number of farmed and companion animal breeds to include American shorthair (cat), Labrador (dog), Datong yak (wild yak) and Black Bengal (goat). In addition, we have retained the previous reference sheep breed (Texel), so users can still access it in addition to the new reference breed (Rambouillet).

In order to scale with the increasing number of species being sequenced and assembled, in addition to allowing us to get data out into the hands of researchers faster, we have launched Ensembl Rapid Release (https://rapid.ensembl.org/index.html). Rapid Release runs on a two-week release cycle, allowing new data to be continually pushed out. Our new species and breeds are deployed via Rapid Release. Updates to existing reference annotations on www.ensembl.org are also deployed via Rapid Release first.

For farmed and companion animal species on www.ensembl.org we have run additional comparative analyses. These include gene trees and homology classification (updated each release) and clade-specific sets of multiple whole genome alignments using our Enredo, Pecan and Ortheus pipeline. For species with variation data available, we have added GERP scores to our variation pages to provide a guide to change tolerance. We have updated our phenotype resources to use new data from OMIA and the AnimalQTL database and to map phenotypes to the Vertebrate Trait Ontology, where possible, to enable improved querying. For sheep, we remapped variants annotated against the Texel assembly to the Rambouillet assembly.

In 2021 we continued to update reference species to new, higher quality genome assemblies and annotations. This included updates to Atlantic cod, Atlantic salmon, carp, rainbow trout, climbing perch and turkey. We also released annotations for two hybrids of domestic cats with wild species, the Bengal (domestic x Asian leopard cat) and the Safari cat (domestic x Geoffroy's cat), along with the resolved domestic cat haplotype from the Safari cat hybrid. We released all of these annotations via Ensembl Rapid Release, where we added homologies to a set of 40 key reference species (chosen according to the vertebrate clade to which each species belonged), along with other annotations such as protein domains and repeat annotation. For species such as turkey and Atlantic cod, where we had existing references on the main Ensembl website, we also updated the genomes and annotations available there. This included updates to the associated comparative datasets, such as the gene trees and multi-genome alignments. We re-mapped variants on the new Labrador and Boxer dog assemblies. Variant data have been added for tilapia and rabbit from the European Variation Archive.

In 2022, Ensembl 106 saw the release of new assemblies for Atlantic Salmon (Salmo salar), Rainbow Trout (Oncorhynchus mykiss), European Seabass (Dicentrarchus labrax) and Carp (Cyprinus carpio carpio), which are amongst the most commercially important aquaculture species in Europe. The reference assembly for pig was reannotated to include new publicly released tissue and developmental time point specific transcriptomic datasets, and ATAC-Seq regulatory tracks were added for the first time. The chicken assembly GRCg6a was reannotated, and a broiler and a layer assembly (GRCg7w and GRCg7b) were annotated. This coincided with a change in the chicken reference to GRCg7b following community needs. The final significant improvement for the Ensembl project was the inclusion of allele frequency data from the European Variation Archive, which are now displayed on variant pages for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225).
Exploitation Route All our data and code are available via Ensembl and GitHub, respectively. Researchers and other non-academic users are therefore able to download our data and use them for their research. As we are the only major genome browser to specialise in farmed and companion animal breed annotation, we are enabling researchers who want to move from reference-based analyses to breed-specific analyses. There are currently papers under review that utilise the hybrid cattle annotations, and the pig breeds are included in a paper under review on the pig reference genome. We expect that, as we develop and refine our infrastructure related to breed annotation, we will see significant utilisation of the data from both academia and industry.
Sectors Agriculture, Food and Drink

URL http://www.ensembl.org/index.html
 
Description The Ensembl genome browser provides globally the most comprehensive breed-specific farmed and companion animal annotation, providing high-quality repeat, gene and protein annotations and comparative analyses for many genomes. This alone is impactful, as previously research into phenotypic differences among breeds was more limited in terms of looking at variation relative to the reference gene set for a species. Now we have created gene sets directly for different breeds, in addition to breed-specific comparative resources, particularly for pig breeds, allowing an unprecedented window into differences across each genome. This also minimises redundancy of effort, where previously different groups interested in particular breeds may have attempted to create their own annotation. This effectively shortcuts downstream science, helping more rapid translation into socio-economic benefits. We maximise use of the resources we create by continuing to target the most relevant avenues to present our work in relation to farmed animals. In this context we have presented at ISAG and PAG (VGP annotation, FAANG workshop and pig breeds). PAG in particular represents an excellent opportunity in terms of outreach and impact, as it is attended by a wide range of participants from both industry and academia; we presented in person in 2020 and virtually in 2022. Our outreach team provides workshops to groups interested in farmed animals across the globe. Our analysis of data access and visits to the Ensembl website shows that the data we have made available are highly used. In 2022, the views for key species were as follows: Duck - 20117, Cow - 283308, Dog - 106017, Horse - 76305, Cod - 7468, Chicken - 224961, Turkey - 13931, Tilapia - 47095, Rabbit - 59674, Sheep - 51169, Rat - 308179, Pig - 312745, Goat - 52637.
First Year Of Impact 2019
Sector Agriculture, Food and Drink
Impact Types Economic

 
Description Ensembl in a new era - deep genome annotation of domesticated animal species and breeds
Amount £419,170 (GBP)
Funding ID BB/W019108/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2022 
End 10/2025
 
Title New rapid deployment site 
Description In order to scale with the increasing number of species being sequenced and assembled, in addition to allowing us to get data out into the hands of researchers faster, we have launched Ensembl Rapid Release (rapid.ensembl.org). Rapid Release runs on a two-week release cycle, allowing new data to be continually pushed out. Our new species and breeds are deployed via Rapid Release. Updates to existing reference annotations on www.ensembl.org are also deployed via Rapid Release first. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Rapid release runs on a two-week release cycle and allows data to be out in the hands of researchers faster. 
URL https://rapid.ensembl.org/index.html
 
Title Addition of annotations of commercially important aquaculture species 
Description We have released new assemblies for Atlantic Salmon (Salmo salar), Rainbow Trout (Oncorhynchus mykiss), European Seabass (Dicentrarchus labrax) and Carp (Cyprinus carpio carpio), that are amongst the most commercially important aquaculture species in Europe. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Expanded access for users to key aquaculture species. Important for precision breeding in species of economic and environmental importance 
 
Title Addition of more non-reference breeds 
Description We have expanded the number of farmed and companion animal breeds to include American shorthair (cat), Labrador (dog), an updated Boxer (dog) genome, Datong yak (wild yak) and Black Bengal (goat). In addition, we have retained the previous reference sheep breed (Texel), so users can still access it in addition to the new reference breed (Rambouillet).
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? Yes  
Impact Expanded access for users to a number of additional farmed and companion animal breeds 
 
Title Ensembl farmed and companion animal databases 
Description Ensembl provides gene annotation, comparative genomics resources, variation (where available) and regulation (where available) for many farmed and companion animal species. This includes breed/strain resources where the data are available.
Farmed and companion animal species/breeds/strains that continue to be supported through Ensembl (with gene set and assembly annotations when appropriate), but were originally made available prior to this award (first provided in 2006) are: American bison, Alpaca, Cat, Cod, Cow, Dog, Donkey, Goat, Horse, Hybrid cattle - Bos indicus maternal haplotype, Hybrid cattle - Bos taurus paternal haplotype, Pig reference, Pig USMARC crossbreed, Sheep, Tilapia, Wild yak.
Species/breeds/strains that have been added to Ensembl for the first time during this grant (2019) are: Alpaca, Atlantic herring, Atlantic salmon, Arabian camel, Cat breed - Bengal, Cat breed - Safari, Cat breed - American shorthair, Coho salmon, Common carp strain - German mirror, Common carp strain - Hebao red, Common carp strain - Huanghe, Dog breed - Basenji, Dog breed - German shepherd, Dog breed - Great Dane, Dog breed - Labrador, Domestic yak, European seabass, Gaur cattle, Pig breed - Hampshire, Pig breed - Jinhua, Pig breed - Berkshire, Pig breed - Large White, Pig breed - Landrace, Pig breed - Pietrain, Pig breed - Rongchang, Pig breed - Meishan, Pig breed - Tibetan, Pig breed - Wuzhishan, Pig breed - Bamei, Pike-perch, Rainbow trout, Spotted halibut, Turbot, Yarkand-deer.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact We have an ever growing collection of high quality farmed and companion animal genomes, annotations and other downstream analyses. We are currently the only major genome browser providing deep breed-specific farmed and companion animal annotation, which includes tissue and development stage-specific annotation tracks and whole genome alignments of our collection of pig breeds. This alone is impactful as previously research into phenotypic differences among breeds was more limited in terms of looking at variation relative to the reference gene set for a species. We continue to engage heavily with the community through a variety of ongoing projects to ensure we are supporting both their current and future needs. 
URL http://www.ensembl.org
 
Title New and updated species and data types 
Description We have expanded the annotation of the reference assembly for pig to include new publicly released tissue and developmental time point specific transcriptomic datasets, and ATAC-Seq regulatory tracks were added for the first time. The chicken assembly GRCg6a was reannotated, and a broiler and a layer assembly (GRCg7w and GRCg7b) were annotated. We have included allele frequency data from the European Variation Archive, which are now displayed on variant pages for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225).
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Expanded access for users to regulatory and variation data for farmed and companion animal breeds 
 
Title New and updated species annotation 
Description We have updated our reference assemblies including dog (switching from a boxer reference to Labrador), Atlantic salmon, carp, European seabass, rainbow trout, Atlantic cod, denticle herring, donkey, horse, Northern pike, turbot, sheep and turkey. In addition, we updated the gene annotation for the reference goat assembly based on the latest code and data. New species added included coho salmon, pike-perch, Yarkand-deer, spotted halibut, alpaca and gaur cattle. 
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? Yes  
Impact Updated reference assemblies 
 
Description Genome Annotation Resources at the EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop introduced a number of key bioinformatic resources either hosted at, or contributed to by, the European Bioinformatics Institute and made freely available to the user community.
AG gave a talk introducing the data in Ensembl and Ensembl Genomes, highlighting key displays in the browser websites and demonstrating the use of two tools: BioMart to export custom data sets and the VEP for users to analyse their own variation data. Together, these resources cover more than 230 vertebrates, including livestock such as cow, pig, sheep, goat, turkey and chicken. The talk focused on agriculturally important plant and animal species.
TH gave a talk on the new infrastructure created to allow manual annotation to be added to any Ensembl species, with a focus on farmed animal species.
Year(s) Of Engagement Activity 2020
 
Description Harnessing the Ag Genomics Data Torrent: A Community-driven Discussion on Best Practices for Using and Reusing Genomics Data
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop explored current solutions to data visualization and reuse in several genomics communities, as well as discussing as a group the paths to nurture a sustainable set of such tools. The discussion included improving, standardizing and maintaining inputs (data and metadata), as well as creating, enhancing and supporting tools for mining such data. Following the presentations, a roundtable discussion was held on what is next and how to organize to best create data reuse tools. The most important impact was that it led to the launch of an international working group on data reuse in collaboration between AG2PI (https://www.ag2pi.org/) and AgBioData (https://www.agbiodata.org/) consortia.
Year(s) Of Engagement Activity 2022
URL https://www.ag2pi.org/workshops-and-activities/community-workshop-2022-02-09/
 
Description Innovation and SME Forum: Data-Driven Innovation in the Agritech Sector 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An exploration of data-driven innovation with a mix of high-level keynote presentations and hands-on sessions to discuss and interact with other companies, academics and ELIXIR partners. This led towards closer integration with ELIXIR standards and communities.
Year(s) Of Engagement Activity 2021
URL https://elixir-europe.org/events/sme-agritech-2021
 
Description Poster at PAG 2020 - pig breeds 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact TH presented a poster on pig breeds at PAG 2020, San Diego. The conference is a forum on recent developments and future plans for plant and animal genome projects. Although the conference has a primarily academic audience, members of industry and policymakers also attend. Therefore this presentation may have had impact outside the academic sector.
Year(s) Of Engagement Activity 2020
 
Description VGP annotation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact FM, the coordinator for vertebrate annotation, gave an update on Ensembl annotation efforts at the VGP meeting, which was part of the Plant and Animal Genome (PAG) conference, held in San Diego. The conference is attended by researchers working in plant and animal genome projects as well as by members of industry and policymakers. Therefore this presentation may have had impact outside the academic sector.
Year(s) Of Engagement Activity 2020