BBSRC-NSF/BIO Next generation collaborative annotation of genomes and synteny

Lead Research Organisation: University of Glasgow
Department Name: College of Medical, Veterinary, Life Sci

Abstract

The number of species with sequenced genomes is rising rapidly, and will continue to do so with projects underway to sequence all eukaryotic species in the UK (Darwin Tree of Life project) and on the planet (Earth BioGenome Project). To make sense of assembled genome data, important features, such as protein-coding and non-coding genes, need to be identified and described; this general process is called annotation. Despite major advances in methods to automatically annotate genomes, the most accurate annotations require human assessment. However, the prohibitive cost usually prevents manual annotation (with curated updates) from being performed on individual species. A scalable alternative is to direct manual effort towards reference datasets and to harvest contributions from the broader research community. The resulting high-quality annotations can then be projected across species based on inferred homology. It is essential that the software used for annotation is fast, flexible and easy to use by different communities of annotators (professional curators, bench biologists, or curious non-experts).

Of the currently available software platforms to annotate genomes, Artemis and Apollo are the two most popular and have been in wide use for 20 years. Artemis, developed at the Sanger Institute, has been used primarily for viewing, annotating and analysing the genomes of prokaryotic and eukaryotic microbes. A major strength of Artemis is its companion the 'Artemis Comparison Tool' (ACT) that allows gene structures to be created or edited in the context of discovering and exploring genome conservation. A major limitation of both Artemis and ACT is that the software performs badly on sequences larger than a few tens of megabases. Like Artemis, Apollo started as a desktop tool, but was redesigned as a web-based tool and now runs on a shared server so that multiple users can browse and create annotations across the same genome simultaneously. Apollo comfortably handles any size genome and scales well with multiple concurrent users.

Development of Artemis and Apollo software has run in parallel for almost 20 years. The Berkeley-based Apollo team and the Sanger-based Artemis team have, in some cases, found alternative ways to view and annotate genome data; but more often, have found convergence in purpose and approach. The proposed application will integrate the best of Artemis and Apollo to create a single higher performance annotation platform. The new Apollo will benefit from modern and modular architecture, for collaborative development and improved sustainability. Apollo will also be enhanced with new data interfaces, developed in collaboration with the EMBL-EBI group, so that genome comparison data can be accessed across servers, and annotation performed in the context of exploring synteny.

The new generation of annotation tool will replace the existing Artemis and Apollo projects and be integrated into major genome annotation projects, while retaining its usability by individual small-scale users.

Technical Summary

Development of the genome annotation tools Artemis and Apollo has run in parallel for almost 20 years. The Berkeley-based Apollo team and the Sanger-based Artemis team have in some cases produced clear alternative paradigms for viewing and annotating genome data, but the predominant theme has been convergence in purpose and approach. A particular strength of the Apollo system is its performance, scalability, and interoperability. We will build upon Apollo infrastructure to include components that have been essential to Artemis users and have been frequently requested by the Apollo community. These will include developing "snap-to-grid" functionality that auto-aligns exons to reading frames during interactive editing; support for small-scale users to load draft annotated genomes from a file, make changes and store them in the same file (a highly used feature of Artemis); and parallel viewing of the same sequence at two zoom levels. The Artemis Comparison Tool is built upon Artemis software and enables comparative genomics data and synteny to be explored in the context of genome annotation. We will include ACT-influenced views in the new Apollo. Moreover, adaptors will be created that allow Apollo to present genome annotation and comparative views directly from Ensembl databases and APIs. This will enable multiple users to remotely perform fast synteny-guided annotation of multiple genomes, in a way that was previously possible only for small genomes using locally edited flat files.
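
The "snap-to-grid" idea can be illustrated with a small sketch. The summary does not describe how Apollo implements it, so the following is only one possible reading: it assumes that snapping means moving a dragged exon edge to the nearest coordinate that lies a whole number of codons away from a chosen frame anchor (for example, the first base of the coding sequence). The function name and parameters are hypothetical and are not taken from the Apollo or Artemis code.

```python
def snap_to_frame(position: int, frame_anchor: int, codon_size: int = 3) -> int:
    """Return the coordinate nearest to `position` that is in frame.

    Hypothetical helper: "in frame" here means the distance from
    `frame_anchor` is an exact multiple of `codon_size`.
    """
    # Python's % is non-negative for a positive modulus, so this also
    # works for positions upstream of the anchor.
    offset = (position - frame_anchor) % codon_size
    if offset * 2 <= codon_size:
        # Nearer to the in-frame coordinate below (or already in frame).
        return position - offset
    # Nearer to the in-frame coordinate above.
    return position + (codon_size - offset)


# A dragged edge at 101 or 102, with the frame anchored at 100.
print(snap_to_frame(101, 100))
print(snap_to_frame(102, 100))
```

In an interactive editor a function like this would be called on each drag event, so the edge appears to jump between in-frame positions rather than moving one base at a time. Real exon boundaries need not fall on codon boundaries, so an editor would presumably let the user override the snap.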

The new Apollo will replace the existing Artemis and Apollo projects and lead to collaborative development and long term sustainability. This tool will support scalable short-term annotation projects as well as long-term curation, enabling non-experts to annotate and curate across the full range of sequenced genomes.

Planned Impact

The primary beneficiaries of this work will be genomic scientists in academic and industrial research and in education. In a research context, the tool will be used by professional annotators ("biocurators") during genome projects, to edit automated annotation of gene models, and to update and improve those annotations based on new evidence. Due to the increasing number of species that require curatorial attention, it is vitally important that expertise is shared as broadly as possible; combining the two major annotation tools into a single new generation tool will allow convergence in the way genomes are annotated and curated, helping to solidify and establish best practices for professional biocuration. The software will also encourage participation in genome annotation by researchers who are interested in a particular species (and may be domain experts in that species) but are not professional biocurators; for example, members of the research communities that work with that species and so are downstream users of the gene models being produced. Because the tool is based on a popular browser with a well-established "look and feel", most users should find creating or editing annotation intuitive and satisfying.

The current Artemis and Apollo tools are used extensively for teaching purposes, as well as research. Several projects in the US have incorporated Apollo into undergraduate teaching and Artemis has been an integral part of bioinformatics training workshops for several thousand junior researchers around the world. Artemis has also been used in an engagement project involving more than 70 UK schools. For many students at all levels, experiencing genome annotation first-hand is an eye-opening way to understand common concepts in genomics, genetics and molecular biology. In the proposed work a new generation annotation tool will be produced, merging existing annotation paradigms. This will enable a greater convergence in teaching approaches: it will no longer be necessary to train two independent communities. This will simplify the field from a student perspective and bring two communities of genome scientists together.

 
Description A beta version of the software has been released and is now being tested.
Exploitation Route The software is intended for use by anyone involved in genome analysis. It will also be used for teaching genomics to school pupils, undergraduates and Masters students.
Sectors Education, Other

URL https://apollo.jbrowse.org/
 
Description Collaboration with VEuPathDB 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution VEuPathDB is an NIAID funded project that provides access to genomics and functional genomics data for a broad range of eukaryotic pathogens. Genome annotation by curators and the community is performed using the existing tool Apollo. The BBSRC/NSF project will create a replacement tool.
Collaborator Contribution As part of the development, VEuPathDB staff participate in monthly user-feedback meetings and will be performing beta testing.
Impact Use of Apollo is enabling ongoing updates to several hundred genome projects
Start Year 2022
 
Description Ensembl Havana - Eukaryotic Annotation team 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution In the new tool, Apollo 3, functionality has been included to allow vertebrate genome annotators to navigate the complexities of alternative splice forms.
Collaborator Contribution Representatives from the group attend a monthly 'stakeholder' meeting to contribute ideas, specify details and contribute to overall prioritisation. They also attended a user 'summit' hosted at EMBL-EBI in July 2023.
Impact No outputs yet
Start Year 2023
 
Description WormBase consortium 
Organisation WormBase (Biology and Genome of C.Elegans)
Country United States 
Sector Charity/Non Profit 
PI Contribution WormBase Consortium is led by Paul Sternberg of CalTech, Kevin Howe of the EBI, Matt Berriman of the Wellcome Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research. The consortium runs a model organism database containing data from research on C. elegans and other nematodes. WormBase Parasite provides searching and data access capabilities that are not available through the WormBase website
Collaborator Contribution WormBase curates reference genomes which are then imported into WormBase Parasite and provide important functional information for understanding the genomes of comparator species.
Impact Provision of annotated genomes for C. elegans and Brugia malayi
Start Year 2014
 
Description i5k 
Organisation U.S. Department of Agriculture USDA
Department Beltsville Agricultural Research Center
Country United States 
Sector Academic/University 
PI Contribution A technical specialist from the Knowledge Services Division has joined our monthly stakeholder meeting. They have highlighted numerous high-priority IT security requirements that we need to comply with in order for Apollo 3 to be used at US Government sites.
Collaborator Contribution We have included new layers of access control to enable beta testing of the platform on USDA (US Government) sites.
Impact No outputs yet
Start Year 2023
 
Title Apollo 3, beta version 
Description Apollo is a tool for collaborative, customizable, and scalable graphical genome annotation. 
Type Of Technology Software 
Year Produced 2024 
Open Source License? Yes  
Impact None - currently in beta testing 
URL https://apollo.jbrowse.org/blog/2024/12/17/beta-release
 
Description Virtual training workshop at BGA24 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Virtual workshop introducing the new Apollo 3 software at BioDiversity Genomics Academy 2024
Year(s) Of Engagement Activity 2024
URL https://thebgacademy.org/BGA24/sessions/apollo-24