Further development of the QuickGO web interface for browsing and retrieving Gene Ontology Annotation data

Lead Research Organisation: European Bioinformatics Institute
Department Name: Sequence Database Group

Abstract

The Gene Ontology (GO) Consortium has developed an ontology for describing genes and gene products in a standardised format. GO consists of three structured vocabularies that describe molecular function, biological process and cellular component. GO has become the gold standard for annotating gene products because it facilitates the efficient retrieval and comparison of gene products from the same or multiple species. Currently, ~20,000 GO terms are used to describe gene products in many model organism and genome annotation databases. In support of standardised nomenclature, the UniProt group joined the GO annotation effort and initiated the Gene Ontology Annotation (GOA) project to assign GO terms to proteins, particularly those of the human proteome. In addition to manual annotation, GOA provides automated in silico annotations for entries from over 100,000 species, and supplements the GOA-UniProtKB data with annotations from GO Consortium members and associates. GOA is the largest and most comprehensive open-source contributor to the GO Consortium annotation effort. GO annotations can be downloaded from the EBI or GO FTP sites, or queried through various GO browsers. Yet although many GO tools and browsers are available, none of them performs all the tasks most frequently requested by GO users (GO Consortium Survey, Oct 2005). Biologists with little dry-lab experience prefer simple web-based interfaces but may want to retrieve GO annotations for lists of genes and link to other databases, while bioinformaticians prefer to download bulk data in specific formats. However, batch retrieval is not possible through existing interfaces, and mapping between identifiers from different databases is problematic. QuickGO was one of the first web-based GO browsers and is well known and widely used.
QuickGO was initially designed as an annotation aid for UniProtKB curators and was developed further when others also found it useful. Currently, it only enables users to search core GO data and the annotations for single UniProtKB accessions, InterPro IDs or Enzyme Commission (EC) numbers. With user requests increasing and new functionality required, it needs further development to keep up with the growing demands of the user community. In addition, the GOA-UniProtKB gene association file is constantly growing and becoming unwieldy, especially for users interested only in a subset of annotations. Retrieving specific data from this file is becoming tedious, and new methods for single or bulk retrieval of GO and GOA data are essential. User requirements collected via e-mails and surveys indicate a need for a simple web-based tool that can perform batch queries for GO annotations with any identifier, and view and download either all annotations or selected sets of them. Our objectives are therefore to extend the QuickGO browser to support the following requests:
- Single or batch search and extraction of data, in GOA association file, FASTA or UniProtKB format, for one or more genes or proteins queried by UniProtKB ID or accession number
- GO term search and extraction of all genes or proteins annotated to a GO term
- Single or batch searches using alternative IDs, e.g. from UniGene, DDBJ/EMBL/GenBank, Entrez, Ensembl, International Protein Index (IPI), RefSeq, etc.
The GO ontologies and GOA data have proven their popularity through the number of citations and the scale of web requests and FTP downloads. The use of GO supports the communication of biological information in a standardised way, and we should provide the tools necessary to support and encourage these activities.
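To make the subset-extraction problem concrete, the following Python sketch shows how a batch of UniProtKB accessions might be pulled out of a gene association file. It assumes the tab-separated GAF layout, in which header lines start with `!`, column 2 holds the DB Object ID (here, the UniProtKB accession) and column 5 holds the GO ID. The function name `filter_gaf` and the sample rows are hypothetical illustrations, not part of QuickGO.

```python
from collections.abc import Iterable, Iterator

# 0-based column positions in a tab-separated GAF annotation line.
DB_OBJECT_ID = 1  # GAF column 2, e.g. a UniProtKB accession
GO_ID = 4         # GAF column 5, e.g. "GO:0005515"


def filter_gaf(lines: Iterable[str], accessions: set[str]) -> Iterator[str]:
    """Yield the annotation lines whose DB Object ID is in `accessions`.

    Header lines (starting with '!') and blank lines are skipped, so the
    output contains only annotation rows, unchanged and still in GAF layout.
    """
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.rstrip("\n").split("\t")
        # Ignore malformed rows that are too short to hold a GO ID.
        if len(columns) > GO_ID and columns[DB_OBJECT_ID] in accessions:
            yield line


if __name__ == "__main__":
    # Made-up rows for illustration only (not real annotations),
    # truncated to the first nine GAF columns.
    sample = [
        "! illustrative header line\n",
        "UniProtKB\tP00001\tGENEA\t\tGO:0005515\tPMID:1\tIPI\t\tF\n",
        "UniProtKB\tP00002\tGENEB\t\tGO:0005634\tPMID:2\tIDA\t\tC\n",
    ]
    for row in filter_gaf(sample, {"P00002"}):
        print(row, end="")
```

Because `filter_gaf` is a generator, it can stream through an open file object line by line instead of loading the whole association file into memory, which matters for a file of the size described above.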
By extending the already popular QuickGO tool to support better data retrieval and manipulation, we will benefit the large scientific community that uses GO and give bench scientists and bioinformaticians efficient access to the data.

Technical Summary

This proposal was developed in response to user requests to extend the existing capabilities of GO tools. The QuickGO browser is a popular tool for searching and retrieving GO terms and the annotations for specific genes or proteins. Currently the tool supports searching by:
- GO term, which retrieves the relevant term, its synonyms, description and child terms, and displays the term within the GO hierarchy
- UniProtKB accession number/ID, which retrieves the GO annotations for that protein
- InterPro ID or Enzyme Commission (EC) number, which retrieves the GO terms mapped to those IDs/numbers.
The objective of this proposal is to extend the functionality of QuickGO to support single or batch retrieval of GO annotations for UniProtKB or other accession numbers, with output in a user-defined format:
- Batch searching: A QuickGO web service will be developed to provide programmatic access to GOA data and facilitate batch retrieval of annotations. Given a list of accession numbers or IDs, the web service will extract the corresponding GO annotations and return the results in GOA association file, FASTA or UniProtKB format.
- Annotations to a given term: The current search engine will be extended to allow users to retrieve all annotations to a given GO term. Some GO terms may have more than 1,000 annotated genes or proteins, so the results display page will need careful design to handle large datasets.
- Searching with alternative IDs: In a separate project, UniProtKB accession numbers are being mapped to their corresponding identifiers in other databases, such as IPI, Ensembl and DDBJ/EMBL/GenBank. QuickGO will use these mapping files so that researchers searching with, e.g., Ensembl accession numbers can retrieve the GO annotations linked to the corresponding UniProtKB identifiers. This will apply to both the single and batch query options.
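The planned batch, term-based and alternative-ID searches can be sketched in Python as operations over an in-memory set of annotations. This is a minimal illustration under stated assumptions, not the QuickGO implementation: the `Annotation` record, all helper names, loading the mapping files as (external ID, UniProtKB accession) pairs and the default page size are hypothetical, and `annotations_to_term` matches only annotations made directly to the given term, ignoring its child terms.

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping, Sequence
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical minimal record holding three GAF columns."""
    accession: str  # UniProtKB accession (GAF column 2)
    go_id: str      # GO term ID (GAF column 5)
    evidence: str   # evidence code (GAF column 7)


def build_id_map(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Index (external ID, UniProtKB accession) pairs by external ID.

    One external ID (e.g. an Ensembl identifier) may map to several accessions.
    """
    id_map: defaultdict[str, set[str]] = defaultdict(set)
    for external_id, accession in pairs:
        id_map[external_id].add(accession)
    return dict(id_map)


def index_by_accession(annotations: Iterable[Annotation]) -> dict[str, list[Annotation]]:
    """Group annotations by UniProtKB accession for fast batch lookup."""
    index: defaultdict[str, list[Annotation]] = defaultdict(list)
    for ann in annotations:
        index[ann.accession].append(ann)
    return dict(index)


def batch_lookup(
    query_ids: Sequence[str],
    id_map: Mapping[str, set[str]],
    by_accession: Mapping[str, list[Annotation]],
) -> dict[str, list[Annotation]]:
    """Return the annotations for each query ID.

    IDs found in `id_map` are translated to UniProtKB accessions first;
    any other ID is assumed to be a UniProtKB accession already.
    """
    results: dict[str, list[Annotation]] = {}
    for qid in query_ids:
        accessions = id_map.get(qid, {qid})
        results[qid] = [
            ann for acc in sorted(accessions) for ann in by_accession.get(acc, [])
        ]
    return results


def annotations_to_term(
    annotations: Sequence[Annotation], go_id: str, page: int, page_size: int = 100
) -> tuple[list[Annotation], int]:
    """Return one page (0-based) of annotations made directly to `go_id`,
    plus the total number of matches, so large result sets can be paged."""
    matches = [ann for ann in annotations if ann.go_id == go_id]
    start = page * page_size
    return matches[start:start + page_size], len(matches)
```

Returning the total count alongside each page lets a results page report how many annotations exist for a heavily used term while transferring only one page at a time.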

Publications

Binns D (2009) QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics (Oxford, England)

Huntley RP (2009) QuickGO: a user tutorial for the web-based Gene Ontology browser. Database: The Journal of Biological Databases and Curation

 
Description: QuickGO, a new, fast web-based browser for Gene Ontology and Gene Ontology annotation data, was developed.
Exploitation Route: The QuickGO browser is used by hundreds of thousands of scientists every year as part of their research activities.
Sectors: Agriculture, Food and Drink; Education; Energy; Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

URL: http://www.ebi.ac.uk/QuickGO/

Description: QuickGO, a new, fast web-based browser for Gene Ontology and Gene Ontology annotation data, is used by hundreds of thousands of scientists every year as part of their research activities.
Sector: Agriculture, Food and Drink; Education; Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types: Economic