Ensembl in a new era - deep genome annotation of domesticated animal species and breeds

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

The Ensembl genome browser is a widely used web-based interface that makes deeply annotated reference genomes for domesticated animals available in a unified way to researchers. An explosion in the number of genomes produced for domesticated animals is expected in the coming three years. In this proposal we describe how we will ensure that the Ensembl genome browser can keep pace to provide deep annotation of these genomes.

Populations of domesticated animals are diverse, including many different breeds and populations within each species. Advances in sequencing technologies mean that the recent rise in the number of assembled genomes for domesticated animal species is expected to continue and accelerate. However:

- Current Ensembl resources focus primarily on individual reference genomes, with one or a small number of representatives per species.
- New ways of storing, comparing, annotating, visualising and making available the diversity of genomes for each domesticated animal species are urgently required.
- Support for efforts to annotate this wealth of genome sequence data in a timely manner is critical to realising the potential impact of these data.

The overarching aim of this proposal is to establish and maintain deeply annotated genomes for domesticated animal species in the Ensembl genome browser. To achieve this aim we will:

- Analyse and annotate domesticated animal genomes as they become available, including alternate assemblies, exploiting the growing volumes of functional data.
- Run comparative genomics analyses both between species and within species.
- Acquire data from re-sequencing projects to characterise genetic variation within species and annotate variants by genomic region.

To ensure that the research community can make the most efficient use of the resource we will provide training and ensure we regularly adjust our priorities based on user feedback.

Publications

Harrison PW (2024) Ensembl 2024. in Nucleic acids research

Martin FJ (2023) Ensembl 2023. in Nucleic acids research

 
Title Addition of annotations of commercially important aquaculture species 
Description We have released new assemblies for Atlantic Salmon (Salmo salar), Rainbow Trout (Oncorhynchus mykiss), European Seabass (Dicentrarchus labrax) and Carp (Cyprinus carpio carpio), which are among the most commercially important aquaculture species in Europe.
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Expanded access for users to key aquaculture species. Important for precision breeding in species of economic and environmental importance 
 
Title Ensembl Variant Effect Predictor (VEP) Farmed Animal Annotation Updates 
Description Over the course of the grant, we have collaborated with the European Variation Archive (EVA) to synchronise our supported assemblies where possible and have improved our methods for identifying compatible variant data and importing them into Ensembl. In addition to updates to key species including cow, pig, horse, chicken and sheep, we have recently made variation data available for four new farmed/food species: domestic yak, greater amberjack, mallard and rainbow trout. As EVA currently supports one assembly per species, we remap variant data to secondary assemblies used in the community. We now annotate pig and chicken variants that lie in regulatory elements and display these data on variant-specific pages. We have updated Ensembl VEP to annotate user-input variants with respect to these regulatory elements. This option is currently available via the REST and command line interfaces and will be available via the web interface in the next Ensembl release for pig, chicken, turbot and European seabass. To aid interpretation of missense variants, we calculate SIFT scores for all transcripts in cow, pig, horse, chicken, goat, sheep, cat and dog. We now also display CADD scores for pig variants, and in the next Ensembl release these will also be available via Ensembl VEP. We display population allele frequencies for variants from public sources, and in the next release the Ensembl VEP web tool will report frequencies from sheep, goat, dog and chicken population studies. We have also continued to import the latest phenotype association data from OMIA and AnimalQTL for each Ensembl release, as well as citations mentioning RefSNP variant identifiers, as mined from the literature by EuropePMC. An example: http://www.ensembl.org/Gallus_gallus/Variation/Mappings?db=core;r=1:33025-34025;v=rs3387277637;vdb=variation;vf=16911667
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact New, improved assemblies have been generated for many key livestock species since large-scale variant calling efforts were completed, leaving gaps in variant coverage over newer regions. We are piloting variant calling against the latest pig genome, with the aim of identifying novel variants in these regions. Once this is successful, we will extend the approach to other species, providing a more complete view of genomic variation.
URL http://www.ensembl.org/
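
The record above notes that Ensembl VEP can be reached through the REST interface. As a minimal sketch of what such a lookup might look like, the Python below builds a request for the public Ensembl REST endpoint `GET vep/:species/id/:id`, using the chicken variant from the example link. The helper names (`vep_id_url`, `vep_request_headers`, `summarise_consequences`) are hypothetical and chosen only for illustration, and the sketch does not assume that regulatory annotation is included in the default output.

```python
from urllib.parse import quote

# Public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def vep_id_url(species: str, variant_id: str) -> str:
    """Build the URL for the REST endpoint `GET vep/:species/id/:id`."""
    return f"{REST_SERVER}/vep/{quote(species)}/id/{quote(variant_id)}"


def vep_request_headers() -> dict:
    """Headers asking the REST server to return JSON."""
    return {"Content-Type": "application/json", "Accept": "application/json"}


def summarise_consequences(vep_records: list) -> dict:
    """Map each input variant to the most severe consequence reported for it.

    Assumes each record is a dict carrying the `input` (or `id`) and
    `most_severe_consequence` keys found in VEP JSON output.
    """
    return {
        rec.get("input", rec.get("id")): rec.get("most_severe_consequence")
        for rec in vep_records
    }


# Species name and variant identifier taken from the example link above.
url = vep_id_url("gallus_gallus", "rs3387277637")
print(url)

# To query the live service (requires network access):
# import json, urllib.request
# request = urllib.request.Request(url, headers=vep_request_headers())
# with urllib.request.urlopen(request) as response:
#     print(summarise_consequences(json.load(response)))
```

The network call is left commented out so that the sketch runs offline; the same URL can also be opened directly in a browser or fetched with any HTTP client.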
 
Title Improved Ensembl annotations and annotation of additional breeds 
Description CpG islands added to chicken, pig and horse reference annotations in Ensembl. Re-annotation with new transcriptomic data for horse (EquCab3.0 - GCA_002863925.1). Annotation of breeds for chicken (2), sheep (18), pig (8), goat (2), buffalo (1) and warthog (1).
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Improved Ensembl annotation available to the research community to improve and accelerate an array of downstream scientific discoveries in these species 
URL https://www.ensembl.org/
 
Title New Ensembl reference assembly/annotations for chicken, cow and donkey 
Description New Ensembl reference assembly/annotations for chicken (ARS-UI_Ramb_v2.0 - GCA_016772045.1), cow (ARS-UCD1.3 - GCA_002263795.3) and donkey (ASM1607732v2 - GCA_016077325.2). All made available through Ensembl.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact New reference annotations mark a step change for these communities enabling improved downstream analyses that require a genomic context. 
URL https://www.ensembl.org/
 
Title New and updated species and data types 
Description We have expanded the reference assembly for pig to include new publicly released tissue and developmental time point specific transcriptomic datasets, and ATAC-Seq regulatory tracks were added for the first time. The chicken assembly GRCg6a was reannotated, and the broiler and layer assemblies (GRCg7b and GRCg7w) were also annotated. We have included allele frequency data from the European Variation Archive that is now displayed on variant pages for chicken (PRJEB44919), dog (PRJEB24066) and salmon (PRJEB34225).
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Expanded access for users to regulatory and variation data for farmed and companion animal breeds 
 
Description Unveiling intriguing diversity of African pigs and wild suids through epigenetics 
Organisation University of Evora
Country Portugal 
Sector Academic/University 
PI Contribution Advice on establishing reference genome sequences based on long-read sequencing data.
Collaborator Contribution Leadership of the project, acquisition of samples and data generation.
Impact No data as yet. Funding application submitted.
Start Year 2024
 
Description AQUA-FAANG Final Conference Panel Discussion 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact On the final day there was a panel discussion focused on functional genomics and future perspectives for the aquaculture sector, in which Peter Harrison (EMBL-EBI), Garth Ilsley (EMBL-EBI), Gabriela Merino (EMBL-EBI) and Emily Clark (Roslin Institute) participated. The panel discussed the accessibility and usability of functional annotation information, its usefulness for genomic selection, and the route to application of the data, particularly in the context of genome editing. The panel discussion provided very useful feedback for development and priorities for the Ensembl Genome Browser. Audience members asked many questions of the panel and plans were made for future related activity.
Year(s) Of Engagement Activity 2023
URL https://www.aqua-faang.eu/final-conference.html
 
Description AQUA-FAANG Final conference, AQUA-FAANG relevance to industry session, Peter Harrison talk on Ensembl gene annotation, regulation and variant effect prediction for aquaculture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact At the Horizon 2020 AQUA-FAANG project final conference, as part of the AQUA-FAANG relevance to industry session on the industry engagement day of the conference, Peter Harrison gave a talk on Ensembl gene annotation, regulation and variant effect prediction for aquaculture. The audience asked questions, in particular about Ensembl's Variant Effect Predictor tooling.
Year(s) Of Engagement Activity 2023
URL https://www.aqua-faang.eu/final-conference.html
 
Description BovReg Final Conference - Peter Harrison gave a talk on EuroFAANG Data Infrastructure: standardizing and presenting BovReg and global FAANG data
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the BovReg Final Conference, Peter Harrison gave a talk on EuroFAANG Data Infrastructure: standardizing and presenting BovReg and global FAANG data. This included the Ensembl cattle annotation, and the need for gene and regulatory annotation to be improved in the coming years.
Year(s) Of Engagement Activity 2024
URL https://bovreg.eu/bovreg-final-conference/
 
Description BovReg Final Conference - Future of Functional Annotation beyond BovReg panel discussion
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact During the final conference for the Horizon 2020 project BovReg, Peter Harrison contributed to the panel discussion on the Future of Functional Annotation beyond BovReg. The discussion included the future annotation and regulatory builds for cattle, and how this award could support those efforts going forward to ensure these become available to the community through Ensembl.
Year(s) Of Engagement Activity 2024
URL https://bovreg.eu/bovreg-final-conference/
 
Description ISAG 2023 - FAANG workshop Panel discussion 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the International Society for Animal Genetics Conference in Cape Town, Garth Ilsley (EMBL-EBI) and Emily Clark (Roslin) were involved in an open discussion on the implementation of the next phase of FAANG. The discussion included efforts to make functional annotation more accessible, including to other spaces such as industry and animal breeders, and whether it was possible to outsource annotation efforts to the community. The discussion was very relevant to the development of priorities for the Ensembl Genome Browser for farmed and domestic animals, particularly because the community saw additional regulatory builds for more species as a priority.
Year(s) Of Engagement Activity 2023
URL https://www.isag.us/Docs/Proceedings/ISAG_2023_Abstracts.pdf
 
Description PAG 31 - Workshop: Genome Annotation Resources at the EBI - Adam Frankish (on behalf of Jane Loveland) gave a talk on Vertebrate Genomes in Ensembl
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the PAG 31 conference, in the Workshop on Genome Annotation Resources at the EBI, Adam Frankish gave a talk on Vertebrate Genomes in Ensembl. This was given on behalf of Jane Loveland, who could not attend for personal reasons. The talk covered Ensembl (www.ensembl.org) infrastructure for accessing genomic information covering over 300 vertebrate species, including cattle, pig, sheep, horse and chicken. It also described how Ensembl generates automatic genome annotation based on multiple lines of evidence. The audience asked questions and requested more information on how to access these resources.
Year(s) Of Engagement Activity 2024
URL https://pag.confex.com/pag/31/meetingapp.cgi/Paper/52768
 
Description PAG31 - Panel Discussion Implementing the Next Phase of FAANG 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact At the global FAANG workshop at the Plant and Animal Genomes conference (PAG31), Emily Clark (Roslin Institute) and Peter Harrison (EMBL-EBI) were involved in an open discussion on the implementation of the next phase of FAANG. The discussion included the task forces and efforts to make functional annotation more accessible, including to other spaces such as industry and animal breeders. The discussion was very relevant to the development of priorities for the Ensembl Genome Browser for farmed and domestic animals.
Year(s) Of Engagement Activity 2024
URL https://plan.core-apps.com/pag_2024/abstract/a6603333-8741-4c5d-a45f-f5d49ee1d01c