BBSRC-NSF/BIO: Next generation collaborative annotation of genomes and synteny

Lead Research Organisation: European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Development of the genome annotation tools Artemis and Apollo has run in parallel for almost 20 years. The Berkeley-based Apollo team and the Sanger-based Artemis team have in some cases produced clear alternative paradigms for viewing and annotating genome data, but a more predominant theme has been convergence in purpose and approach. A particular strength of the Apollo system is its performance, scalability, and interoperability. We will build upon Apollo infrastructure to include components that have been essential to Artemis users and have been frequently requested by the Apollo community. These will include developing "snap-to-grid" functionality that auto-aligns exons to reading frames during interactive editing; support for small-scale users to load draft annotated genomes from a file, make changes and store them in the same file (a highly used feature of Artemis); and parallel viewing of the same sequence at two zoom levels. The Artemis Comparison Tool (ACT) is built upon Artemis software and enables comparative genomics data and synteny to be explored in the context of genome annotation. We will include ACT-influenced views in the new Apollo. Moreover, adaptors will be created that allow Apollo to present genome annotation and comparative views directly from Ensembl databases and APIs. This will enable multiple users to remotely perform fast synteny-guided annotation of multiple genomes, in a way that was previously possible only for small genomes using locally edited flat files.

The new Apollo will replace the existing Artemis and Apollo projects and lead to collaborative development and long-term sustainability. This tool will support scalable short-term annotation projects as well as long-term curation, enabling non-experts to annotate and curate across the full range of sequenced genomes.

Planned Impact

The primary beneficiaries of this work will be genomic scientists in academic and industrial research and in education. In a research context, the tool will be used by professional annotators ("biocurators") during genome projects, to edit automated annotation of gene models, and to update and improve those annotations based on new evidence. Due to the increasing number of species that require curatorial attention, it is vitally important that expertise is shared as broadly as possible; combining the two major annotation tools into a single new generation tool will allow convergence in the way genomes are annotated and curated, helping to solidify and establish best practices for professional biocuration. The software will also encourage participation in genome annotation by researchers who are interested in a particular species (and may be domain experts in that species) but are not professional biocurators; for example, members of the research communities that work with that species and so are downstream users of the gene models being produced. Because the tool is based on a popular browser with a well-established "look and feel", most users should find creating or editing annotation intuitive and satisfying.

The current Artemis and Apollo tools are used extensively for teaching purposes, as well as research. Several projects in the US have incorporated Apollo into undergraduate teaching, and Artemis has been an integral part of bioinformatics training workshops for several thousand junior researchers around the world. Artemis has also been used in an engagement project involving more than 70 UK schools. For many students at all levels, experiencing genome annotation first-hand is an eye-opening way to understand common concepts in genomics, genetics and molecular biology. In the proposed work, a new generation annotation tool will be produced, merging existing annotation paradigms. This will enable a greater convergence in teaching approaches - it will no longer be necessary to train two independent communities. This will simplify the field from a student perspective and bring two communities of genome scientists together.

Publications

Harrison PW (2023) Ensembl 2024. in Nucleic acids research

Martin FJ (2023) Ensembl 2023. in Nucleic acids research

 
Description The development work for integrating Artemis curation tools into an easy-to-use Apollo interface is well underway. The first phase has delivered new functionality to extract data from common file formats (GFF3 and FASTA), allowing the Apollo Collaboration Server to display numerous views, including Synteny, Linear, Dotplot and the 6-frame view from Artemis. Bringing these views together in a single, friendly Apollo interface combines the annotation and curation strengths of both tools: the views allow different interpretations and insights to be drawn from the data, while the features within them remain editable. The project has been developed to be easily deployable using Docker, the industry-recognised containerisation technology. This has enabled test deployments to be made on EMBL-EBI Cloud infrastructure and will ensure future deployments are straightforward. This is an important early goal for the integration of Apollo into the annotation of the key resources at EMBL, including Ensembl and WormBase.
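
As an illustration of the kind of GFF3 extraction described above, the following is a minimal Python sketch, not the Apollo implementation. It relies only on the published GFF3 layout (nine tab-separated columns, `ID` and `Parent` attributes, an optional `##FASTA` section); the `Feature` class, the `parse_gff3` function and the sample records are hypothetical names and data introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import unquote


@dataclass
class Feature:
    """One GFF3 record plus any child features that name it as Parent."""
    seqid: str
    type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]
    children: List["Feature"] = field(default_factory=list)


def parse_gff3(text: str) -> List[Feature]:
    """Parse GFF3 text and return the top-level features with children attached."""
    features: List[Feature] = []
    by_id: Dict[str, Feature] = {}
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # annotation records end where the embedded FASTA section begins
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        attrs: Dict[str, str] = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key] = value
        feat = Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs)
        features.append(feat)
        if "ID" in attrs:
            by_id[unquote(attrs["ID"])] = feat
    roots: List[Feature] = []
    for feat in features:
        # Parent may list several IDs separated by commas.
        parent_ids = [unquote(p) for p in feat.attributes.get("Parent", "").split(",") if p]
        parents = [by_id[p] for p in parent_ids if p in by_id]
        for parent in parents:
            parent.children.append(feat)
        if not parents:
            roots.append(feat)
    for feat in features:
        # Decode URL-escaped attribute values once the hierarchy is built.
        feat.attributes = {k: unquote(v) for k, v in feat.attributes.items()}
    return roots


# Hypothetical sample data: one gene with one mRNA and one exon.
SAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "example", "gene", "100", "900", ".", "+", ".", "ID=gene1;Name=demo%20gene"]),
    "\t".join(["chr1", "example", "mRNA", "100", "900", ".", "+", ".", "ID=mrna1;Parent=gene1"]),
    "\t".join(["chr1", "example", "exon", "100", "300", ".", "+", ".", "ID=exon1;Parent=mrna1"]),
    "##FASTA",
    ">chr1",
    "ACGT",
])

roots = parse_gff3(SAMPLE)
```

Building the parent-child hierarchy in a second pass means child records may appear before their parents in the file, which GFF3 permits.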

The development team has added the ability for people to log in using common internet accounts (such as Google or Microsoft). Once logged in, people can see where other users who are logged in at the same time are in the genomic space, providing an experience similar to reading a Google Doc alongside others. To ensure that relationships between genomic entities in Apollo are "valid", the permitted feature hierarchies can now be controlled by ontologies. For a user-chosen ontology, only options that fit within it are offered when a person tries to create a child feature of another feature. In addition, a "plugin infrastructure" has been put in place to allow organisations to create custom validations that enforce any specific local data rules.
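
The ontology-controlled hierarchy and plugin validation described above can be pictured with a small Python sketch. This is an assumption-laden illustration and not Apollo's actual API: the `ALLOWED_CHILDREN` table is a hand-written excerpt in the style of the Sequence Ontology, and `allowed_child_types`, `register_validator`, `validate` and `check_coordinates` are hypothetical names.

```python
from typing import Callable, Dict, List, Set

# Hypothetical, hand-written excerpt of parent/child rules in the style of the
# Sequence Ontology; a real system would load these from the chosen ontology.
ALLOWED_CHILDREN: Dict[str, Set[str]] = {
    "gene": {"mRNA"},
    "mRNA": {"exon", "CDS", "five_prime_UTR", "three_prime_UTR"},
}


def allowed_child_types(parent_type: str) -> List[str]:
    """Return the child types that could be offered for a given parent type."""
    return sorted(ALLOWED_CHILDREN.get(parent_type, set()))


# A validator takes one feature (a plain dict here) and returns error messages.
Validator = Callable[[dict], List[str]]
_validators: List[Validator] = []


def register_validator(check: Validator) -> Validator:
    """Register a custom check, standing in for a plugin-supplied validation."""
    _validators.append(check)
    return check


def validate(feature: dict) -> List[str]:
    """Run custom checks and ontology checks on a feature and its descendants."""
    errors: List[str] = []
    for check in _validators:
        errors.extend(check(feature))
    allowed = ALLOWED_CHILDREN.get(feature["type"], set())
    for child in feature.get("children", []):
        if child["type"] not in allowed:
            errors.append(f"{child['type']} is not a valid child of {feature['type']}")
        errors.extend(validate(child))
    return errors


@register_validator
def check_coordinates(feature: dict) -> List[str]:
    """Example local rule (an assumption): start must not exceed end."""
    if feature["start"] > feature["end"]:
        return [f"{feature['type']} has start {feature['start']} after end {feature['end']}"]
    return []


# Hypothetical feature that breaks both the ontology rule and the local rule.
bad_gene = {
    "type": "gene", "start": 1, "end": 10,
    "children": [{"type": "exon", "start": 5, "end": 2}],
}
errors = validate(bad_gene)
```

Keeping the ontology lookup separate from the registered validators mirrors the split in the text between ontology-driven options and organisation-specific rules.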

The beta release of Apollo is planned for May 2023. Following this, a user summit is planned for June to engage with multiple user groups and establish the requirements for future roadmaps, in addition to the features outlined in the grant. This further supports the already well-established User Experience design process to combine the best aspects of Apollo and Artemis into the new Apollo service. To set the scene for the latter, there have been several meetings with teams at EMBL-EBI, establishing the current state of the art and feeding key user experiences into the design sprints.
Exploitation Route The WebApollo interface, complete with Artemis curation tools, will support a wide variety of community annotation and curation projects. We expect that the refined, user-led design will lead to significant utilisation of the software in both academia and industry. It will also be the main curation software for Ensembl Havana and WormBase manual annotation efforts. The tool will be used across a wide range of species and industries, including healthcare settings for human and mouse, and agricultural species for annotations relating to food security and biodiversity.
Sectors Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology

 
Title Apollo 
Description Apollo is a web-based tool that enables collaborative on-line editing of genome annotations. The BBSRC/NSF award supports further development of the tool. 
Type Of Technology Webtool/Application 
Year Produced 2013 
Open Source License? Yes  
Impact None so far from the development supported by the award 
URL https://github.com/GMOD/Apollo
 
Title Apollo3 (WebApollo) 
Description Apollo3 is the latest iteration of the web-based tool that enables collaborative on-line editing of genome annotations. The BBSRC/NSF award supports further development of the tool. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact None, not yet beta released 
URL https://github.com/GMOD/Apollo3