BBSRC-NSF/BIO Next generation collaborative annotation of genomes and synteny

Lead Research Organisation: University of Glasgow
Department Name: College of Medical, Veterinary, Life Sci

Abstract

The number of species with sequenced genomes is rising rapidly, and will continue to do so now that projects to sequence all eukaryotic species in the UK (Darwin Tree of Life project) and on the planet (Earth Biogenome Project) are underway. To make sense of assembled genome data, important features, such as protein-coding and non-coding genes, need to be identified and described; this general process is called annotation. Despite major advances in methods to automatically annotate genomes, the most accurate annotations require human assessment. However, the prohibitive cost usually prevents manual annotation (with curated updates) from being performed on individual species. A scalable alternative is to direct manual effort towards reference datasets and to harvest contributions from the broader research community. The resulting high-quality annotations can then be projected across species based on inferred homology. It is essential that the software used for annotation is fast, flexible and easy to use by different communities of annotators (professional curators, bench biologists, or curious non-experts).

Of the currently available software platforms to annotate genomes, Artemis and Apollo are the two most popular and have been in wide use for 20 years. Artemis, developed at the Sanger Institute, has been used primarily for viewing, annotating and analysing the genomes of prokaryotic and eukaryotic microbes. A major strength of Artemis is its companion the 'Artemis Comparison Tool' (ACT) that allows gene structures to be created or edited in the context of discovering and exploring genome conservation. A major limitation of both Artemis and ACT is that the software performs badly on sequences larger than a few tens of megabases. Like Artemis, Apollo started as a desktop tool, but was redesigned as a web-based tool and now runs on a shared server so that multiple users can browse and create annotations across the same genome simultaneously. Apollo comfortably handles any size genome and scales well with multiple concurrent users.

Development of Artemis and Apollo software has run in parallel for almost 20 years. The Berkeley-based Apollo team and the Sanger-based Artemis team have, in some cases, found alternative ways to view and annotate genome data; but more often, they have converged in purpose and approach. The proposed application will integrate the best of Artemis and Apollo to create a single higher-performance annotation platform. The new Apollo will benefit from a modern, modular architecture, for collaborative development and improved sustainability. Apollo will also be enhanced with new data interfaces, developed in collaboration with the EMBL-EBI group, so that genome comparison data can be accessed across servers, and annotation performed in the context of exploring synteny.

The new generation of annotation tool will replace the existing Artemis and Apollo projects and be integrated into major genome annotation projects while retaining its usability for individual small-scale users.

Technical Summary

Development of the genome annotation tools Artemis and Apollo has run in parallel for almost 20 years. The Berkeley-based Apollo team and the Sanger-based Artemis team have in some cases produced clear alternative paradigms for viewing and annotating genome data, but the predominant theme has been convergence in purpose and approach. A particular strength of the Apollo system is its performance, scalability, and interoperability. We will build upon Apollo infrastructure to include components that have been essential to Artemis users and have been frequently requested by the Apollo community. These will include developing "snap-to-grid" functionality that auto-aligns exons to reading frames during interactive editing; support for small-scale users to load draft annotated genomes from a file, make changes and store them in the same file (a highly used feature of Artemis); and parallel viewing of the same sequence at two zoom levels. The Artemis Comparison Tool is built upon Artemis software and enables comparative genomics data and synteny to be explored in the context of genome annotation. We will include ACT-influenced views in the new Apollo. Moreover, adaptors will be created that allow Apollo to present genome annotation and comparative views directly from Ensembl databases and APIs. This will enable multiple users to remotely perform fast synteny-guided annotation of multiple genomes, in a way that was previously possible only for small genomes using locally edited flat files.
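
As a rough illustration of what "snap-to-grid" editing could mean, the sketch below moves a dragged exon boundary to the nearest coordinate that stays in frame with a chosen anchor position. It is a minimal example under stated assumptions, not the Artemis or Apollo implementation: the function name `snap_to_frame`, the parameter `frame_anchor`, and the rule of snapping to the nearest multiple of three from the anchor are all hypothetical choices made for illustration.

```python
def snap_to_frame(position: int, frame_anchor: int) -> int:
    """Return the coordinate nearest to `position` whose distance from
    `frame_anchor` is a multiple of 3 (one codon).

    Both arguments are integer sequence coordinates. `frame_anchor` is a
    hypothetical reference point, for example the first base of a codon
    in the reading frame the curator is editing against.
    """
    # Python's % always returns a value in {0, 1, 2} for a positive
    # modulus, so this also works when position is left of the anchor.
    offset = (position - frame_anchor) % 3
    if offset == 0:
        return position       # already on a codon boundary
    if offset == 1:
        return position - 1   # nearest boundary is one base to the left
    return position + 1       # offset == 2: one base to the right


if __name__ == "__main__":
    # A boundary dragged to 101 or 102 snaps to the closest in-frame base.
    for dragged in (100, 101, 102, 98, 99):
        print(dragged, "->", snap_to_frame(dragged, frame_anchor=100))
```

A real editor would also need to handle strand, exon phase and splice-site evidence; this sketch covers only the frame arithmetic.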

The new Apollo will replace the existing Artemis and Apollo projects and lead to collaborative development and long-term sustainability. This tool will support scalable short-term annotation projects as well as long-term curation, enabling non-experts to annotate and curate across the full range of sequenced genomes.

Planned Impact

The primary beneficiaries of this work will be genomic scientists in academic and industrial research and in education. In a research context, the tool will be used by professional annotators ("biocurators") during genome projects, to edit automated annotation of gene models, and to update and improve those annotations based on new evidence. Due to the increasing number of species that require curatorial attention, it is vitally important that expertise is shared as broadly as possible; combining the two major annotation tools into a single new generation tool will allow convergence in the way genomes are annotated and curated, helping to solidify and establish best practices for professional biocuration. The software will also encourage participation in genome annotation by researchers who are interested in a particular species (and may be domain experts in that species) but are not professional biocurators; for example, members of the research communities that work with that species and so are downstream users of the gene models being produced. Because the tool is based on a popular browser with a well-established "look and feel", most users should find creating or editing annotation intuitive and satisfying.

The current Artemis and Apollo tools are used extensively for teaching purposes, as well as research. Several projects in the US have incorporated Apollo into undergraduate teaching, and Artemis has been an integral part of bioinformatics training workshops for several thousand junior researchers around the world. Artemis has also been used in an engagement project involving more than 70 UK schools. For many students at all levels, experiencing genome annotation first-hand is an eye-opening way to understand common concepts in genomics, genetics and molecular biology. In the proposed work a new generation annotation tool will be produced, merging existing annotation paradigms. This will enable a greater convergence in teaching approaches - it will no longer be necessary to train two independent communities. This will simplify the field from a student perspective and bring two communities of genome scientists together.

Publications

Description: Collaboration with VEuPathDB
Organisation: University of Pennsylvania
Country: United States
Sector: Academic/University
PI Contribution: VEuPathDB is an NIAID funded project that provides access to genomics and functional genomics data for a broad range of eukaryotic pathogens. Genome annotation by curators and the community is performed using the existing tool Apollo. The BBSRC/NSF project will create a replacement tool.
Collaborator Contribution: As part of the development, VEuPathDB staff participate in monthly user-feedback meetings and will be performing beta testing.
Impact: Use of Apollo is enabling ongoing updates to several hundred genome projects
Start Year: 2022