Ensembl in a new era - deep genome annotation of domesticated animal species and breeds
Lead Research Organisation:
European Bioinformatics Institute
Department Name: Genome Assembly and Annotation
Technical Summary
The Ensembl genome browser is a widely used web-based interface that makes deeply annotated reference genomes for domesticated animals available in a unified way to researchers. An explosion in the number of genomes produced for domesticated animals is expected in the coming three years. In this proposal we describe how we will ensure that the Ensembl genome browser can keep pace to provide deep annotation of these genomes.
Populations of domesticated animals are diverse, including many different breeds and populations within each species. Advances in sequencing technologies mean that the recent rise in the number of assembled genomes for domesticated animal species is expected to continue and accelerate. However:
- Current Ensembl resources are primarily focused around individual reference genomes for a single or a small number of representatives per species.
- New ways of storing, comparing, annotating, visualising and making available the diversity of genomes for each domesticated animal species are urgently required.
- Support for efforts to annotate this wealth of genome sequence data in a timely manner is critical to realising the potential impact of these data.
The overarching aim of this proposal is to establish and maintain deeply annotated genomes for domesticated animal species in the Ensembl genome browser. To achieve this aim we will:
- Analyse and annotate domesticated animal genomes as they become available, including alternate assemblies, exploiting the growing volumes of functional data.
- Run comparative genomics analyses both between species and within species.
- Acquire data from re-sequencing projects to characterise genetic variation within species and annotate variants by genomic region.
To ensure that the research community can make the most efficient use of the resource we will provide training and ensure we regularly adjust our priorities based on user feedback.
Publications
Dyer SC (2025) Ensembl 2025. Nucleic Acids Research.
Harrison PW (2024) Ensembl 2024. Nucleic Acids Research.
Martin FJ (2023) Ensembl 2023. Nucleic Acids Research.
| Description | A raw DNA sequence representing an individual genome is of little use until important genomic features, such as genes and the parts of genes that encode proteins, have been found and highlighted. Finding these features is called genome annotation, and it is a difficult and computationally expensive process. The major achievement of this grant is that many species of socio-economic importance, particularly with regard to food security, have been annotated to a high level of quality and released through the Ensembl Genome Browser, a very popular platform for viewing and analysing genomes. While the number of species represented in Ensembl has expanded, the provision of multiple annotated genomes per species is of greater importance. For example, instead of providing a single reference sheep genome, we now provide over 20 sheep genomes representing different breeds. We have done this for several species, so that we now represent not just the genome but the pangenome. Pangenomes are very important to the future of applications such as precision breeding, where the aim is to improve a particular trait or traits. Pangenomes show in fine detail the commonalities and differences between the genomes of different breeds, or even different individuals within a breed, allowing significantly more powerful exploration of what drives different traits, diseases and characteristics. To assist with further exploring these differences, we have also produced pangenome alignment graphs for some species, which model the variation between the different genomes in the form of an alignment graph. We carry out these difficult and computationally intensive analyses in a standardised way and to a high standard, to help accelerate downstream science. |
| Exploitation Route | Much of the work on this award has been to make genome annotations and associated data available to enable downstream research. An annotated genome is essential to translating analyses of the genome into real effects on outcomes such as yield and disease resistance. Our focus on supporting pangenomes in particular will be of use in precision breeding. Pangenome annotation resources allow researchers to look at variation in gene structure and expression between different breeds, or even individual organisms, at a level of detail that was not previously possible. We now provide these resources for a large variety of farmed and companion animals, with sheep and pig in particular having extensive pangenomic data available through Ensembl. |
| Sectors | Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology |
| Description | Annotated genomes and associated comparative analyses are the main output of this work. Over the duration of the award we have released many livestock and companion animal genomes through the Ensembl Genome Browser, particularly genomes for different breeds of animal, which are key to translating this work into precision breeding. The Ensembl Genome Browser is used by hundreds of thousands of unique users each year, including many companies. We know from requests to add commercial breeds of livestock and companion animal species that these genomes and their annotations are used in both academic and industrial contexts. At conferences such as PAG, we have received significant positive feedback from both academia and industry on the resources in Ensembl that have been released as a direct result of this funding. |
| First Year Of Impact | 2022 |
| Sector | Agriculture, Food and Drink; Environment |
| Impact Types | Economic |
| Title | Addition of annotations of commercially important aquaculture species |
| Description | We have released new assemblies for Atlantic Salmon (Salmo salar), Rainbow Trout (Oncorhynchus mykiss), European Seabass (Dicentrarchus labrax) and Carp (Cyprinus carpio carpio), which are among the most commercially important aquaculture species in Europe. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | Expanded access for users to key aquaculture species. Important for precision breeding in species of economic and environmental importance |
| Title | Ensembl Variant Effect Predictor (VEP) Farmed Animal Annotation Updates |
| Description | Over the course of the grant, we have collaborated with the European Variation Archive (EVA) to synchronise our supported assemblies where possible, and have improved our methods for identifying compatible variant data and importing it into Ensembl. In addition to updates to key species including cow, pig, horse, chicken and sheep, we have recently made variation data available for 4 new farmed/food species: domestic yak, greater amberjack, mallard and rainbow trout. As EVA currently supports one assembly per species, we remap variant data to secondary assemblies used in the community. We now annotate pig and chicken variants that lie in regulatory elements and display these data on variant-specific pages. We have updated Ensembl VEP to annotate user-input variants with respect to these regulatory elements. This option is currently available via the REST and command line interfaces and will be available via the web interface in the next Ensembl release for pig, chicken, turbot and European seabass. To aid interpretation of missense variants, we calculate SIFT scores for all transcripts in cow, pig, horse, chicken, goat, sheep, cat and dog. We now also display CADD scores for pig variants, and in the next Ensembl release these will also be available via Ensembl VEP. We display population allele frequencies for variants from public sources, and in the next release the Ensembl VEP web tool will report frequencies from sheep, goat, dog and chicken population studies. We have also continued to import the latest phenotype association data from OMIA and AnimalQTL for each Ensembl release, as well as citations mentioning RefSNP variant identifiers, as mined from the literature by EuropePMC. An example: http://www.ensembl.org/Gallus_gallus/Variation/Mappings?db=core;r=1:33025-34025;v=rs3387277637;vdb=variation;vf=16911667 |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | New, improved assemblies have been generated for many key livestock species since large-scale variant calling efforts were completed, leaving gaps in variant coverage over newer regions. We are piloting variant calling against the latest pig genome, with the aim of identifying novel variants in these new regions. Once this is successful, we will extend the approach to other species, providing a more complete view of genomic variation. |
| URL | http://www.ensembl.org/ |
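The REST route to Ensembl VEP mentioned in the record above can be illustrated with a minimal sketch. It assumes the public Ensembl REST server at https://rest.ensembl.org and its `vep/:species/id/:id` endpoint, and it reuses the chicken variant identifier from the example URL. The helper names (`vep_id_url`, `fetch_vep`, `most_severe`) are hypothetical and not part of Ensembl. Whether this particular identifier resolves in the current release has not been checked here, and the options for regulatory annotation are not shown.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: the public Ensembl REST server; swap in another host if needed.
REST_SERVER = "https://rest.ensembl.org"


def vep_id_url(species: str, variant_id: str) -> str:
    """Build the URL of a VEP lookup by variant identifier (hypothetical helper)."""
    return (
        f"{REST_SERVER}/vep/{quote(species, safe='')}"
        f"/id/{quote(variant_id, safe='')}"
    )


def fetch_vep(species: str, variant_id: str, timeout: float = 30.0) -> list:
    """Request VEP annotation as JSON; needs network access."""
    request = Request(
        vep_id_url(species, variant_id),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def most_severe(records: list) -> list:
    """Pull the 'most_severe_consequence' field from each VEP record, if present."""
    return [record.get("most_severe_consequence") for record in records]


if __name__ == "__main__":
    # Variant identifier taken from the example URL in the record above.
    print(most_severe(fetch_vep("gallus_gallus", "rs3387277637")))
```

Running the script prints the most severe predicted consequence for each returned record. The URL construction is kept separate from the network call so that it can be checked offline.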
| Title | Improved Ensembl annotations and annotation of additional breeds |
| Description | CpG islands added to chicken, pig and horse reference annotations in Ensembl. Re-annotation with new transcriptomic data for horse (EquCab3.0 - GCA_002863925.1). Annotation of breeds for chicken (2), sheep (18), pig (8), goat (2), buffalo (1) and warthog (1). |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | Improved Ensembl annotation available to the research community to improve and accelerate an array of downstream scientific discoveries in these species |
| URL | https://www.ensembl.org/ |
| Title | Manual annotation of key pig/cattle genes |
| Description | We completed the manual annotation of the SLA in pig and the MHCs in cattle and sheep. This work was based on discussions regarding the annotation of the OBSCN gene in pig with Dutch and Norwegian pig researchers, who are keen to use the new pig T2T genome and are generating new long-read transcriptomic data from diverse tissues. These regions are difficult to annotate automatically, and manual annotation led to a significant improvement in the regions covered. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | MHC genes are immune genes, and the SLA locus is a key regulator of signalling and development. By improving the annotation we enable more accurate downstream research on genes that are relevant to disease in livestock and that can also be used when researching the equivalent pathways in human. |
| URL | https://beta.ensembl.org |
| Title | New Ensembl reference assembly/annotations for chicken, cow and donkey |
| Description | New Ensembl reference assembly/annotations for chicken (ARS-UI_Ramb_v2.0 - GCA_016772045.1), cow (ARS-UCD1.3 - GCA_002263795.3), Donkey (ASM1607732v2 - GCA_016077325.2). All made available through Ensembl. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | New reference annotations mark a step change for these communities enabling improved downstream analyses that require a genomic context. |
| URL | https://www.ensembl.org/ |
| Title | New and updated genome annotations |
| Description | We have annotated several new assemblies from different species, including pig, sheep, cattle, bison, horse, donkey and some camelids. These assemblies include different breeds, some T2T genomes, and important updates to the references (specifically for cattle and sheep). We also annotated the new cat reference genome, making the old one available as a breed. We coordinated the release of annotations for almost 20 pig breeds and around 10 sheep breeds, as well as other past annotations, including some commercial fish. We also made several fixes to sheep and goat to improve the usability of their data in our browser. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Annotated genomes are key to downstream research of livestock and companion animal species for both academia and industry |
| URL | https://beta.ensembl.org |
| Title | New and updated species and data types |
| Description | We have expanded the reference assembly for pig to include new publicly released tissue-specific and developmental time point-specific transcriptomic datasets, and ATAC-Seq regulatory tracks were added for the first time. The chicken assembly GRCg6a was reannotated, and a broiler and a layer assembly (GRCg7w and GRCg7b) were annotated. We have included allele frequency data from the European Variation Archive, which are now displayed on variant pages for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225). |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | Expanded access for users to regulatory and variation data for farmed and companion animal breeds |
| Title | Regulatory annotation for cattle and chicken |
| Description | During 2024, we processed functional genomics data from Kern et al. (2021) to produce our first regulatory annotation for cow (Bos taurus), which was released in Ensembl 113. The annotation includes promoters, enhancers and open chromatin regions. We also processed additional data for chicken (Gallus gallus) from ENA Study PRJEB55656 to extend our regulatory annotation to further epigenomes. Finally, we have continued to refine our farmed animal regulatory annotation over the past year to keep it consistent with the improvements we have made to our regulatory build for human and mouse. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Regulatory information is key to understanding how the functional elements of cells and tissues interact with one another. It is particularly important for giving context on which genes are and are not expressed in different conditions, and can help in understanding how the expression of a particular gene correlates with or alters the expression of other genes. |
| URL | https://www.ensembl.org |
| Title | Variation annotation for chicken, pig, cattle, sheep, Atlantic salmon |
| Description | We added support for regulatory feature variant consequences on the main Ensembl website and in Ensembl VEP for pig, chicken, cattle and Atlantic salmon in release e113. We imported the latest releases of variation data from the EVA for pig, cow, chicken and sheep, along with associated phenotype updates from the OMIA and AnimalQTL databases. To address variant deserts, we carried out variant calling in pig using an adapted version of the SAREK pipeline (https://nf-co.re/sarek/3.5.1) and reads from pig variant strains on ENA. Results will also be uploaded to EVA. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Variation data is crucial for understanding differences within a species, whether at the breed, population or individual level. Providing variation resources for these species in a standardised manner allows downstream researchers to analyse the genome annotations in the context of breed-specific or population-specific genomic variation, which can then be traced back to phenotypic variation and support selection for traits such as increased yield or disease resistance. |
| URL | https://www.ensembl.org |
| Description | Unveiling intriguing diversity of African pigs and wild suids through epigenetics |
| Organisation | University of Evora |
| Country | Portugal |
| Sector | Academic/University |
| PI Contribution | Advice on establishing reference genome sequences based on long-read sequencing data. |
| Collaborator Contribution | Leadership of the project, acquisition of samples and data generation. |
| Impact | No data as yet. Funding application submitted. |
| Start Year | 2024 |
| Description | AQUA-FAANG Final Conference Panel Discussion |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | On the final day there was a panel discussion focused on functional genomics and future perspectives for the aquaculture sector, in which Peter Harrison (EMBL-EBI), Garth Ilsley (EMBL-EBI), Gabriela Merino (EMBL-EBI) and Emily Clark (Roslin Institute) participated. The panel discussed the accessibility and usability of functional annotation information, its usefulness for genomic selection, and the route to application of the data, particularly in the context of genome editing. The discussion provided very useful feedback on development and priorities for the Ensembl Genome Browser. Audience members asked many questions of the panel and plans were made for future related activity. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.aqua-faang.eu/final-conference.html |
| Description | AQUA-FAANG Final conference, AQUA-FAANG relevance to industry session, Peter Harrison talk on Ensembl gene annotation, regulation and variant effect prediction for aquaculture |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | At the Horizon 2020 AQUA-FAANG project final conference, as part of the AQUA-FAANG relevance to industry session on the industry engagement day, Peter Harrison gave a talk on Ensembl gene annotation, regulation and variant effect prediction for aquaculture. The audience asked questions, in particular about Ensembl's Variant Effect Predictor tooling. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.aqua-faang.eu/final-conference.html |
| Description | BovReg Final Conference - Peter Harrison gave a talk on EuroFAANG Data Infrastructure: standardizing and presenting BovReg and global FAANG data |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | At the BovReg Final Conference, Peter Harrison gave a talk on EuroFAANG Data Infrastructure: standardizing and presenting BovReg and global FAANG data. This included the Ensembl cattle annotation and the need to improve gene and regulatory annotation in the coming years. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://bovreg.eu/bovreg-final-conference/ |
| Description | BovReg Final Conference - Future of Functional Annotation beyond BovReg panel discussion |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | During the final conference for the Horizon 2020 project BovReg, Peter Harrison contributed to panel discussions in the Future of Functional Annotation beyond BovReg. The discussion included the future annotation and regulatory builds for cattle, and how this award could support those efforts going forward to ensure this becomes available to the community through Ensembl. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://bovreg.eu/bovreg-final-conference/ |
| Description | ISAG 2023 - FAANG workshop Panel discussion |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | At the International Society for Animal Genetics Conference in Cape Town, Garth Ilsley (EMBL-EBI) and Emily Clark (Roslin) were involved in an open discussion on the implementation of the next phase of FAANG. The discussion covered efforts to make functional annotation more accessible, including to other sectors such as industry and animal breeders, and whether annotation efforts could be outsourced to the community. The discussion was very relevant to the development of priorities for the Ensembl Genome Browser for farmed and domestic animals, particularly because the community saw additional regulatory builds for more species as a priority. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.isag.us/Docs/Proceedings/ISAG_2023_Abstracts.pdf |
| Description | PAG 31 - Workshop: Genome Annotation Resources at the EBI. Adam Frankish (on behalf of Jane Loveland) gave a talk on Vertebrate Genomes in Ensembl |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | At the PAG 31 conference, in the Workshop on Genome Annotation Resources at the EBI, Adam Frankish gave a talk on Vertebrate Genomes in Ensembl. This was given on behalf of Jane Loveland, who could not attend for personal reasons. The talk covered the Ensembl (www.ensembl.org) infrastructure for accessing genomic information for over 300 vertebrate species, including cattle, pig, sheep, horse and chicken, and how we generate automatic, evidence-based genome annotation from multiple lines of evidence. The audience asked questions and requested more information on how to access these resources. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://pag.confex.com/pag/31/meetingapp.cgi/Paper/52768 |
| Description | PAG31 - Panel Discussion Implementing the Next Phase of FAANG |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | At the global FAANG workshop at the Plant and Animal Genomes conference (PAG31), Emily Clark (Roslin Institute) and Peter Harrison (EMBL-EBI) were involved in an open discussion on the implementation of the next phase of FAANG. The discussion covered the task forces and efforts to make functional annotation more accessible, including to other sectors such as industry and animal breeders. The discussion was very relevant to the development of priorities for the Ensembl Genome Browser for farmed and domestic animals. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://plan.core-apps.com/pag_2024/abstract/a6603333-8741-4c5d-a45f-f5d49ee1d01c |