Ensembl in a new era - deep genome annotation of domesticated animal species and breeds

Lead Research Organisation: European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

The Ensembl genome browser is a widely used web-based interface that makes deeply annotated reference genomes for domesticated animals available in a unified way to researchers. An explosion in the number of genomes produced for domesticated animals is expected in the coming three years. In this proposal we describe how we will ensure that the Ensembl genome browser can keep pace to provide deep annotation of these genomes.

Populations of domesticated animals are diverse, including many different breeds and populations within each species. Advances in sequencing technologies mean that the recent rise in the number of assembled genomes for domesticated animal species is expected to continue and accelerate. However:

- Current Ensembl resources are primarily focused around individual reference genomes for a single or a small number of representatives per species.
- New ways of storing, comparing, annotating, visualising and making available the diversity of genomes for each domesticated animal species are urgently required.
- Support for efforts to annotate this wealth of genome sequence data in a timely manner is critical to realising the potential impact of these data.

The overarching aim of this proposal is to establish and maintain deeply annotated genomes for domesticated animal species in the Ensembl genome browser. To achieve this aim we will:

- Analyse and annotate domesticated animal genomes as they become available, including alternate assemblies, exploiting the growing volumes of functional data.
- Run comparative genomics analyses both between species and within species.
- Acquire data from re-sequencing projects to characterise genetic variation within species and annotate variants by genomic region.

To ensure that the research community can make the most efficient use of the resource we will provide training and ensure we regularly adjust our priorities based on user feedback.

Publications

Dyer SC (2025) Ensembl 2025. in Nucleic acids research

Harrison PW (2024) Ensembl 2024. in Nucleic acids research

Martin FJ (2023) Ensembl 2023. in Nucleic acids research

 
Description A raw DNA sequence representing an individual genome is of little use without finding and highlighting important genomic features such as genes and the parts of genes that encode proteins. The process of finding these features is called genome annotation, and it is difficult and computationally expensive. The major achievement of this grant is that many species of socio-economic importance, particularly with regard to food security, have been annotated to a high level of quality and released through the Ensembl Genome Browser, a very popular platform for viewing and analysing genomes. While the number of species represented in Ensembl has expanded, the provision of multiple annotated genomes per species is of greater importance. For example, instead of providing a single reference sheep genome, we now provide over 20 sheep genomes representing different breeds. We have done this for several species, so that we now represent not just the genome but the pangenome. Pangenomes are very important to the future of applications such as precision breeding, where the aim is to improve a particular trait or traits. Pangenomes show in fine detail the commonalities and differences between the genomes of different breeds, or even of different individuals within a breed, allowing significantly more powerful exploration of what drives different traits, diseases and characteristics. To assist with further exploring these differences, we have also produced pangenome alignment graphs for some species, in which we model the variation between the different genomes as an alignment graph. We carry out these difficult and computationally intensive analyses in a standardised way and to a high standard, to help accelerate downstream science.
Exploitation Route Much of the work on this award has been to make genome annotations and associated data available to enable downstream research. An annotated genome is essential to translating analyses of the genome into real effects on traits such as yield and disease resistance. Our focus on supporting pangenomes in particular will be of use in precision breeding. Pangenome annotation resources allow researchers to look at variation in gene structure and expression between different breeds, or even individual organisms, at a level of detail that was not previously possible. We now provide these resources for a large variety of farmed and companion animals, with sheep and pig in particular having extensive pangenomic data available through Ensembl.
Sectors Agriculture

Food and Drink

Healthcare

Pharmaceuticals and Medical Biotechnology

 
Description Annotated genomes and associated comparative analyses are the main output of this work. We have released many livestock and companion animal genomes through the Ensembl Genome Browser over the duration of the award, particularly genomes for different breeds of animal, which are key to translating research into precision breeding. The Ensembl Genome Browser is used by hundreds of thousands of unique users each year, including many companies. We know from the requests to add commercial breeds of livestock and companion animal species that these genomes and their annotations are used in both academic and industrial contexts. At conferences such as PAG, we have received significant positive feedback from both academia and industry on the resources available in Ensembl that have been released as a direct result of this funding.
First Year Of Impact 2022
Sector Agriculture, Food and Drink, Environment
Impact Types Economic

 
Title Addition of annotations of commercially important aquaculture species 
Description We have released new assemblies for Atlantic Salmon (Salmo salar), Rainbow Trout (Oncorhynchus mykiss), European Seabass (Dicentrarchus labrax) and Carp (Cyprinus carpio carpio), which are among the most commercially important aquaculture species in Europe.
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Expanded access for users to key aquaculture species. Important for precision breeding in species of economic and environmental importance 
 
Title Ensembl Variant Effect Predictor (VEP) Farmed Animal Annotation Updates 
Description Over the course of the grant, we have collaborated with the European Variation Archive (EVA) to synchronise our supported assemblies where possible and have improved our methods for identifying compatible variant data and importing it to Ensembl. In addition to updates to key species including cow, pig, horse, chicken and sheep, we have recently made variation data available for 4 new farmed/food species: domestic yak, greater amberjack, mallard and rainbow trout. As EVA currently supports one assembly per species, we remap variant data to secondary assemblies used in the community. We now annotate pig and chicken variants which lie in regulatory elements and display these data on variant-specific pages. We have updated Ensembl VEP to annotate user-input variants with respect to these regulatory elements. This option is currently available via the REST and command line interfaces and will be available via the web interface in the next Ensembl release for pig, chicken, turbot and European seabass. To aid interpretation of missense variants, we calculate SIFT scores for all transcripts in cow, pig, horse, chicken, goat, sheep, cat and dog. We now also display CADD scores for pig variants, and in the next Ensembl release these will also be available via Ensembl VEP. We display population allele frequencies for variants from public sources, and in the next release the Ensembl VEP web tool will report frequencies from sheep, goat, dog and chicken population studies. We have also continued to import the latest phenotype association data from OMIA and AnimalQTL for each Ensembl release, as well as citations mentioning RefSNP variant identifiers, as mined from the literature by EuropePMC. An example: http://www.ensembl.org/Gallus_gallus/Variation/Mappings?db=core;r=1:33025-34025;v=rs3387277637;vdb=variation;vf=16911667
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact New, improved assemblies have been generated for many key livestock species since large-scale variant calling efforts were completed, leaving gaps in variant coverage over newer regions. We are piloting variant calling against the latest pig genome, with the aim of identifying novel variants in the novel regions. Once this is successful, we will extend it to other species, providing a more complete view of genomic variation.
URL http://www.ensembl.org/
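
The REST access to Ensembl VEP mentioned in the record above can be illustrated with a short Python sketch. It assumes the public Ensembl REST server at https://rest.ensembl.org and its `vep/:species/id/:id` endpoint; the helper names `build_vep_id_url` and `fetch_vep_by_id` are hypothetical and introduced here only for illustration. This is a minimal example under those assumptions, not the project's own tooling.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the public Ensembl REST server; a mirror or archive may differ.
REST_SERVER = "https://rest.ensembl.org"


def build_vep_id_url(species: str, variant_id: str) -> str:
    """Build the URL for a VEP lookup of a known variant identifier."""
    return f"{REST_SERVER}/vep/{quote(species)}/id/{quote(variant_id)}"


def fetch_vep_by_id(species: str, variant_id: str, timeout: float = 30.0):
    """Request VEP annotation as JSON (hypothetical helper; needs network access)."""
    request = urllib.request.Request(
        build_vep_id_url(species, variant_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The endpoint returns a JSON list with one record per input variant.
        return json.load(response)
```

For instance, `fetch_vep_by_id("gallus_gallus", "rs3387277637")` would request annotation for the chicken variant named in the example URL above; whether that identifier resolves depends on the Ensembl release being served.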
 
Title Improved Ensembl annotations and annotation of additional breeds 
Description CpG islands added to chicken, pig and horse reference annotations in Ensembl. Re-annotation with new transcriptomic data for horse (EquCab3.0 - GCA_002863925.1). Annotation of breeds for chicken (2), sheep (18), pig (8), goat (2), buffalo (1) and warthog (1).
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Improved Ensembl annotation available to the research community to improve and accelerate an array of downstream scientific discoveries in these species 
URL https://www.ensembl.org/
 
Title Manual annotation of key pig/cattle genes 
Description We completed the manual annotation of the SLA in pig and the MHCs in cattle and sheep. This work was based on discussions regarding the annotation of the OBSCN gene in pig with Dutch and Norwegian pig researchers who are keen on using the new pig T2T genome and are generating new long-read transcriptomic data from diverse tissues. These regions are difficult to annotate automatically; manual annotation led to a significant improvement in the regions covered.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact MHC genes are immune genes, and the SLA locus is a key regulator of signalling and development. By improving the annotation we enable more accurate downstream research on genes that are relevant to disease in livestock and that can also be used when researching the equivalent pathways in human.
URL https://beta.ensembl.org
 
Title New Ensembl reference assembly/annotations for chicken, cow and donkey 
Description New Ensembl reference assembly/annotations for chicken (ARS-UI_Ramb_v2.0 - GCA_016772045.1), cow (ARS-UCD1.3 - GCA_002263795.3), Donkey (ASM1607732v2 - GCA_016077325.2). All made available through Ensembl. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact New reference annotations mark a step change for these communities enabling improved downstream analyses that require a genomic context. 
URL https://www.ensembl.org/
 
Title New and updated genome annotations 
Description We have annotated several new assemblies from different species, including pig, sheep, cattle, bison, horse, donkey, and some camelids. These assemblies include different breeds, some T2T genomes, and important updates to the references (for cattle and sheep, specifically). We also annotated the new cat reference genome, making the old one available as a breed. We worked on the release coordination to publish the annotations of almost 20 pig breeds and around 10 sheep breeds, as well as other past annotations, including some commercial fish. We worked on several fixes to sheep and goat to improve the usability of their data in our browser.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact Annotated genomes are key to downstream research of livestock and companion animal species for both academia and industry 
URL https://beta.ensembl.org
 
Title New and updated species and data types 
Description We have expanded the reference assembly for pig to include new publicly released tissue and developmental time point specific transcriptomic datasets, and ATAC-Seq regulatory tracks were added for the first time. The chicken assembly GRCg6a was reannotated, and the broiler and layer assemblies (GRCg7b and GRCg7w) were annotated. We have included allele frequency data from the European Variation Archive that is now displayed on variant pages for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225).
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Expanded access for users to regulatory and variation data for farmed and companion animal breeds 
 
Title Regulatory annotation for cattle and chicken 
Description During 2024, we processed functional genomics data from Kern et al (2021) to produce our first regulatory annotation for cow (Bos taurus), which was released in Ensembl 113. The annotation includes promoters, enhancers and open chromatin regions. We also processed additional data for chicken (Gallus gallus) from ENA Study PRJEB55656 to extend our regulatory annotation to further epigenomes. Finally, we have continued to refine our farmed animal regulatory annotation over the past year to keep it consistent and aligned with the improvements we have made to our regulatory build for human and mouse. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact Regulatory information is key to understanding how the functional elements of cells and tissues interact with one another. It is particularly important for giving context on which genes are and are not expressed in different conditions, and can help in understanding how the expression of a particular gene correlates with or alters the expression of other genes.
URL https://www.ensembl.org
 
Title Variation annotation for chicken, pig, cattle, sheep, Atlantic salmon 
Description We added support for regulatory feature variant consequences on the main Ensembl website and in Ensembl VEP for pig, chicken, cattle and Atlantic salmon in release e113. We imported the latest releases of variation data from the EVA for pig, cow, chicken and sheep, along with associated phenotype updates from the OMIA and AnimalQTL databases. We carried out variant calling in pig using an adapted version of the SAREK pipeline (https://nf-co.re/sarek/3.5.1), and reads from pig variant strains on ENA, to address variant deserts. Results will also be uploaded to EVA.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact Variation data is crucial for understanding differences within a species, whether at the breed, population or individual level. By providing variation resources for these species in a standardised manner, we make it possible for downstream researchers to analyse the genome annotations in the context of breed- or population-specific genomic variation, which can then be traced back to phenotypic variation and help with selection for traits such as increased yield or disease resistance.
URL https://www.ensembl.org
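
The regulatory feature consequences described in the record above can be tallied from VEP JSON output with a few lines of Python. The sketch below assumes records shaped like Ensembl VEP JSON, where each variant may carry a `regulatory_feature_consequences` list whose entries hold `consequence_terms`. The function name `regulatory_consequence_counts` and the `SAMPLE` records, including their identifiers, are hypothetical and made up for illustration; they are not real Ensembl data.

```python
from collections import Counter


def regulatory_consequence_counts(vep_records):
    """Count consequence terms reported against regulatory features."""
    counts = Counter()
    for record in vep_records:
        # Variants outside regulatory features simply lack this key.
        for consequence in record.get("regulatory_feature_consequences", []):
            counts.update(consequence.get("consequence_terms", []))
    return dict(counts)


# Hypothetical records for illustration only; identifiers are invented.
SAMPLE = [
    {
        "id": "variant_1",
        "regulatory_feature_consequences": [
            {
                "regulatory_feature_id": "REG_0001",
                "biotype": "promoter",
                "consequence_terms": ["regulatory_region_variant"],
            }
        ],
    },
    {
        "id": "variant_2",
        "transcript_consequences": [
            {"consequence_terms": ["missense_variant"]}
        ],
    },
]
```

Only the first sample record overlaps a regulatory feature, so the tally ignores the transcript-level consequence on the second; a real analysis would typically examine both kinds of consequence.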
 
Description Unveiling intriguing diversity of African pigs and wild suids through epigenetics 
Organisation University of Evora
Country Portugal 
Sector Academic/University 
PI Contribution Advice on establishing reference genome sequences based on long-read sequencing data.
Collaborator Contribution Leadership of the project, acquisition of samples and data generation.
Impact No data as yet. Funding application submitted.
Start Year 2024
 
Description AQUA-FAANG Final Conference Panel Discussion 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact On the final day there was a panel discussion focused on functional genomics and future perspectives for the aquaculture sector, in which Peter Harrison (EMBL-EBI), Garth Ilsley (EMBL-EBI), Gabriela Merino (EMBL-EBI) and Emily Clark (Roslin Institute) participated. The panel discussed the accessibility and usability of functional annotation information, its usefulness for genomic selection, and the route to application of the data, particularly in the context of genome editing. The panel discussion provided very useful feedback on development and priorities for the Ensembl Genome Browser. Audience members asked many questions of the panel and plans were made for future related activity.
Year(s) Of Engagement Activity 2023
URL https://www.aqua-faang.eu/final-conference.html
 
Description AQUA-FAANG Final conference, AQUA-FAANG relevance to industry session, Peter Harrison talk on Ensembl gene annotation, regulation and variant effect prediction for aquaculture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact At the Horizon 2020 AQUA-FAANG project final conference, as part of the AQUA-FAANG relevance to industry session on the industry engagement day of the conference, Peter Harrison gave a talk on Ensembl gene annotation, regulation and variant effect prediction for aquaculture. The audience asked questions, in particular about Ensembl's Variant Effect Predictor tooling.
Year(s) Of Engagement Activity 2023
URL https://www.aqua-faang.eu/final-conference.html
 
Description BovReg Final Conference - Peter Harrison gave a talk on EuroFAANG Data Infrastructure: standardizing and presenting BovReg and global FAANG data
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the BovReg Final Conference, Peter Harrison gave a talk on EuroFAANG Data Infrastructure: standardizing and presenting BovReg and global FAANG data. This included the Ensembl cattle annotation and the need to improve gene and regulatory annotation in the coming years.
Year(s) Of Engagement Activity 2024
URL https://bovreg.eu/bovreg-final-conference/
 
Description BovReg Final Conference - Future of Functional Annotation beyond BovReg panel discussion
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact During the final conference for the Horizon 2020 project BovReg, Peter Harrison contributed to panel discussions in the Future of Functional Annotation beyond BovReg. The discussion included the future annotation and regulatory builds for cattle, and how this award could support those efforts going forward to ensure this becomes available to the community through Ensembl.
Year(s) Of Engagement Activity 2024
URL https://bovreg.eu/bovreg-final-conference/
 
Description ISAG 2023 - FAANG workshop Panel discussion 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the International Society for Animal Genetics Conference in Cape Town, Garth Ilsley (EMBL-EBI) and Emily Clark (Roslin) were involved in an open discussion on the implementation of the next phase of FAANG. The discussion included efforts to make functional annotation more accessible, including to other spaces such as industry and animal breeders, and whether it was possible to outsource annotation efforts to the community. The discussion was very relevant to the development of priorities for the Ensembl Genome Browser for farmed and domestic animals, particularly as the community saw additional regulatory builds for more species as a priority.
Year(s) Of Engagement Activity 2023
URL https://www.isag.us/Docs/Proceedings/ISAG_2023_Abstracts.pdf
 
Description PAG 31 - Workshop on Genome Annotation Resources at the EBI: Adam Frankish (on behalf of Jane Loveland) gave a talk on Vertebrate Genomes in Ensembl
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the PAG 31 conference in the Workshop on Genome Annotation Resources at the EBI Adam Frankish gave a talk on Vertebrate Genomes in Ensembl. This was given on behalf of Jane Loveland who could not attend for personal reasons. The talk covered Ensembl (www.ensembl.org) infrastructure for accessing genomic information covering over 300 vertebrate species, including cattle, pig, sheep, horse and chicken. We generate automatic, evidence-based genome annotation from multiple lines of evidence. The audience asked questions and requested more information on how to access these resources.
Year(s) Of Engagement Activity 2024
URL https://pag.confex.com/pag/31/meetingapp.cgi/Paper/52768
 
Description PAG31 - Panel Discussion Implementing the Next Phase of FAANG 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the global FAANG workshop at the Plant and Animal Genomes conference (PAG31), Emily Clark (Roslin Institute) and Peter Harrison (EMBL-EBI) were involved in an open discussion on the implementation of the next phase of FAANG. The discussion included the task forces and efforts to make functional annotation more accessible, including to other spaces such as industry and animal breeders. The discussion was very relevant to the development of priorities for the Ensembl Genome Browser for farmed and domestic animals.
Year(s) Of Engagement Activity 2024
URL https://plan.core-apps.com/pag_2024/abstract/a6603333-8741-4c5d-a45f-f5d49ee1d01c