Continued development of the ChEBI database and ontology for improved interoperability with biomedical resources

Lead Research Organisation: European Bioinformatics Institute
Department Name: Chemoinformatics and Metabolism

Abstract

Today, the biological sciences are generating an enormous amount of data aimed at tackling fundamental questions such as 'What is the molecular basis for life?', 'How do organisms work?' and 'How does disease arise and how can it be treated?'. While research in the past has often followed a reductionist approach - studying the parts to understand the whole - today it increasingly follows a systems approach, integrating insights from the past into a holistic model of an organism (systems biology). The goal is to use such a model to perform a computer simulation of the organism and to use this to answer questions such as 'What effect will the addition of compound X (for example a drug) have on the organism?'. Isolated approaches to organizing the data in particular fields in the molecular sciences - information on genes (the code of life), proteins (an organism's chemical factories) and small molecules such as sugars or drugs - will hamper synergistic insights. Instead, databases in the biological sciences, which are used to parametrize such simulations, need to be interlinked and interoperable, allowing seamless movement amongst them. Because biological databases are generated on a worldwide basis by diverse communities, their integration creates obvious challenges. Besides simple technical questions of interoperability, the bioscience community is therefore working on common data models and standards. Scientists create rules on how to name and encode scientific information in a computer (semantics) and how a particular piece of such information relates to the scientific concepts in its surroundings (ontologies). This results in ontological chains such as 'A fox is_a mammal, which again is_a animal', from which the computer automatically reasons that a fox is an animal, even if this is not explicitly stated. 
Besides the so-called 'is_a' relationship between entities in ontologies, there exists a whole range of other relationships such as 'is_part_of', although these may be relevant only in certain fields of knowledge. Since ontologies can be complex and those of neighbouring fields may be interlinked, they allow machines to reason about the world. The database Chemical Entities of Biological Interest (ChEBI) provides for the bioscientific community semantic and ontological information as well as stable identifiers for small chemical compounds (as are most drugs). Areas such as drug discovery or systems biology bring together information about the morphology of cells, genes and proteins, as well as the small molecules that act on these. The interlinking between these bits of information in databases is typically performed through stable identifiers assigned to entities such as single genes, proteins or small molecules by standardization bodies and database providers. In addition, formal and so-called 'trivial' names are assigned and associated with both the entity and the stable identifier in the database. ChEBI acts as a resource for such names and stable identifiers in the area of small molecules of biological interest. For this purpose it is widely used in the bioscience community, who send formal requests for the assignment of identifiers for particular small molecule entities to the ChEBI team, who then perform the assignment, publish the information into the public domain and inform the requesting party that the request has been fulfilled. Also acting as an ontology, ChEBI puts small molecule structures and their structural properties into an ontological context. It makes statements such as 'D-Glucose is_a D-aldohexose, which is_a ... [various is_a relationships omitted] ... which is_a monosaccharide, which is_a sugar.' 
Again, ontological chains such as the one above allow computers to make statements about the world (of chemistry in this case), which have not been explicitly coded elsewhere. This is useful, for example, in the field of text mining, the computer-based re-discovery of knowledge in the printed literature.
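
The kind of inference described above can be illustrated with a minimal sketch. It assumes a toy is_a graph held in a Python dictionary; the entries, the shortened D-glucose chain and the function names `ancestors` and `entails_is_a` are illustrative assumptions, not an extract of ChEBI or of any ChEBI software.

```python
# Toy is_a graph. The entries are illustrative only and the D-glucose
# chain is shortened; this is not an extract of the ChEBI ontology.
IS_A = {
    "fox": ["mammal"],
    "mammal": ["animal"],
    "D-glucose": ["D-aldohexose"],
    "D-aldohexose": ["monosaccharide"],  # intermediate classes omitted
    "monosaccharide": ["sugar"],
}


def ancestors(term, graph=IS_A):
    """Return every class reachable from `term` by following is_a links."""
    seen = set()
    stack = list(graph.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


def entails_is_a(term, candidate, graph=IS_A):
    """True if `term` is_a `candidate`, directly or through a chain."""
    return candidate in ancestors(term, graph)


if __name__ == "__main__":
    # Neither statement is stored explicitly; both follow from the chain.
    print(entails_is_a("fox", "animal"))
    print(entails_is_a("D-glucose", "sugar"))
```

The graph stores only direct links; the longer-range statements are derived on demand, which is the sense in which the computer reasons beyond what has been explicitly coded.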

Technical Summary

ChEBI is a freely available dictionary of molecular entities focused on 'small' chemical compounds. The primary motivation behind ChEBI was to provide a high-quality, thoroughly annotated controlled vocabulary to promote the correct and consistent use of unambiguous biochemical terminology throughout the molecular biology databases at the EBI and worldwide. However, this aim could not be achieved outside of a wider context of general chemistry and chemical nomenclature. The scope of ChEBI encompasses not only 'biochemical compounds' but also pharmaceuticals, agrochemicals, laboratory reagents, isotopes and subatomic particles. ChEBI is designed as a relational database, which is implemented in an Oracle database server. A number of utility applications, implemented mainly in Java and Unix scripts, provide additional functionality around the database, such as the loading of data from external sources. Specialized web-based interfaces provide for both public access to the data and restricted access to the annotation tool. ChEBI stores 2D or 3D structural diagrams as connection tables in MDL molfile format. One entity can have one or more connection tables. One-dimensional strings such as IUPAC InChI, IUPAC InChIKey and SMILES are automatically derived from the default connection tables. Every ChEBI entry contains a list of parent and child entries and the names of the relationships between them. ChEBI can be accessed via the web and Web Services. The entire ChEBI data set is available for download in four different formats from the FTP server. The structures of molecular entities from ChEBI are made available to the PubChem database. Feedback to the ChEBI team is provided via a SourceForge Forum. A tool for user-driven direct deposition of chemical data to ChEBI is under development.

Publications

 
Description The classification system used for the entries in the ChEBI chemistry database has been extensively developed and modified to be compatible with those used by biologists and is now used extensively by the bioinformatics community to enable chemistry to be included within biology databases.
The display of the classification of any particular entity has been significantly improved, making the relationships between entities much easier to see.
Exploitation Route Realisation that chemistry can be linked in with biology classification means that more and more biology data sources are relying on ChEBI to handle the associated chemistry. ChEBI is an increasingly important resource for the bioinformatics and metabolomics communities.
Sectors Agriculture, Food and Drink, Chemicals, Education, Pharmaceuticals and Medical Biotechnology, Other

 
Description All developments have been incorporated in the ChEBI database and website and communicated to database users in a variety of ways, including training sessions at various locations in the UK.
First Year Of Impact 2011
 
Title Additional RO relationships annotated in ChEBI, including disjoint_from and alignment to BFO 
Description Additional Relation Ontology (RO) relationships have been annotated in ChEBI, including the disjoint_from relationship. These have been made available as modular extensions to the ChEBI ontology, downloadable from: ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/. The ChEBI alignment to the Basic Formal Ontology (BFO) has also been made available in the same folder. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? Yes  
Impact The ChEBI Ontology is now widely used by other biological ontologies to handle terms relating to chemical entities. 
URL ftp://ftp.ebi.ac.uk/pub/databases/chebi/ontology/
 
Title Annotation of the ChEBI Ontology after the adoption of Basic Formal Ontology (BFO) and Relation Ontology (RO) ontologies: alkaloids 
Description There has been considerable discussion as to whether, like "natural product", the term "alkaloid" should be treated as a "role". However, to do so would result in problems arising from the subclassification of alkaloids into "indole alkaloid", "isoquinoline alkaloid", "pyridine alkaloid", etc., which would then combine the role of alkaloid with the structural descriptors "indole", "isoquinoline", "pyridine", etc., for which is_a links would be required. Such conflation of is_a and has_role class is not permitted within the BFO. Following consultation with users and ontologists, we decided to class "alkaloid" as a structural feature, and hence use the is_a relationship to link relevant entities to the term "alkaloid". The manual annotation required has been completed. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? Yes  
Impact Following the adoption of BFO and RO in 2011, the ChEBI Ontology is now used by numerous biological ontologies for handling all chemistry-related terms. Probably the largest ontology to make use of the ChEBI ontology in this way is the Gene Ontology (GO). 
URL http://www.ebi.ac.uk/chebi/
 
Title Annotation of the ChEBI Ontology after the adoption of Basic Formal Ontology and Relation Ontology ontologies: "natural product" 
Description The classification of the term "natural product" as a "role" causes problems. However, as there are no real constraints on what the structure of a natural product can be, and no way to tell based on looking at a structure that it is a natural product, the term does not belong in the chemical structure branch of the ontology. Furthermore, the term "natural product" can have different meanings to different user groups, ranging from "any compound produced by natural means" to much more restrictive definitions invoking processes occurring within cells. We found that most users regarded the term "natural product" to be a synonym for "metabolite". We have therefore reassigned compounds previously assigned within the ontology as natural products to "has_role metabolite". 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact Following the adoption of Basic Formal Ontology and Relation Ontology, the ChEBI ontology is now widely used by biological ontologies to handle all of their chemistry (small molecule) terms. 
URL http://www.ebi.ac.uk/chebi/
 
Title Annotation of the ChEBI Ontology after the adoption of Basic Formal Ontology and Relation Ontology ontologies: is_a completeness 
Description A large number of ChEBI entities were originally classified in the ontology using one or more of the permitted relationships (has_part; is_conjugate_base_of; is_conjugate_acid_of; is_tautomer_of; is_enantiomer_of; has_functional_parent; has_parent_hydride; is_substituent_group_from; or has_role), but without any is_a relationship being present. It subsequently became apparent that, to be compatible with Basic Formal Ontology (BFO), an is_a relationship would be essential for every classified entry. This was achieved by the manual annotation of over a thousand ChEBI entities. 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The ChEBI ontology is now used by a number of biological ontologies, such as the Gene Ontology (GO), to handle all of their chemical-related terms. 
URL http://www.ebi.ac.uk/chebi/
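
The is_a completeness requirement described in this record amounts to a simple check: find every entity that carries at least one relationship but no is_a link. The following is a minimal sketch of such a check, assuming relationships are available as (subject, relationship, object) triples. The function name `missing_is_a` and the example entities are hypothetical; the actual work reported here was manual annotation, not this script.

```python
def missing_is_a(relations):
    """Return the sorted subjects that have relationships but no is_a link.

    `relations` is an iterable of (subject, relationship, object) triples.
    """
    subjects = set()
    with_is_a = set()
    for subject, relationship, _obj in relations:
        subjects.add(subject)
        if relationship == "is_a":
            with_is_a.add(subject)
    return sorted(subjects - with_is_a)


if __name__ == "__main__":
    # Hypothetical triples, for illustration only.
    example = [
        ("entity A", "is_a", "class X"),
        ("entity A", "has_role", "role R"),
        ("entity B", "has_role", "role R"),
        ("entity C", "is_conjugate_base_of", "entity A"),
    ]
    # Entities B and C would need an is_a link added by a curator.
    print(missing_is_a(example))
```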
 
Description ChEBI and Gene Ontology collaboration 
Organisation Gene Ontology Consortium
Country Global 
Sector Charity/Non Profit 
PI Contribution As an outcome of this grant, ChEBI has been adopted as the chemical ontology behind the scenes for chemical representation and reasoning in the Gene Ontology. Chemicals appearing in Gene Ontology entities are explicitly linked to ChEBI, and the ChEBI hierarchy is used to automatically arrange the Gene Ontology hierarchy for the corresponding processes and functions. This achievement and the ongoing collaboration were enabled through the greater interoperability facilitated by the work done in the context of this grant. This collaboration was initiated in February 2010; work on backlog terms was essentially completed in December 2012, although work to include new terms continues as and when required.
Collaborator Contribution Regular meetings, correspondence and discussions between the groups to decide on appropriate new ontology terms, align hierarchies, agree definitions of terms, etc.
Impact The ChEBI ontology is now used for all of the Gene Ontology terms involving a chemical entity (e.g. xyz metabolic process; xyz transport process, etc., where xyz is the name of a chemical entity). The collaboration involves chemists, biologists and ontologists.
Start Year 2010
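
The idea of using the ChEBI hierarchy to arrange the corresponding Gene Ontology terms can be sketched in simplified form: each is_a link between two chemicals suggests a matching link between the processes named after them. The sketch below is an assumption-laden illustration, not the Gene Ontology's actual tooling; the function name `derive_process_hierarchy` and the example chemical pairs are hypothetical.

```python
def derive_process_hierarchy(chemical_is_a, suffix="metabolic process"):
    """Map chemical is_a links onto links between the matching process terms.

    `chemical_is_a` is a list of (child, parent) chemical names. For each
    pair, the result holds ('<child> <suffix>', '<parent> <suffix>').
    """
    return [
        (f"{child} {suffix}", f"{parent} {suffix}")
        for child, parent in chemical_is_a
    ]


if __name__ == "__main__":
    # Hypothetical chemical links, for illustration only.
    links = [("D-glucose", "monosaccharide"), ("monosaccharide", "sugar")]
    for child_term, parent_term in derive_process_hierarchy(links):
        print(f"{child_term} is_a {parent_term}")
```

The same function could be called with `suffix="transport process"` to produce the analogous arrangement for transport terms.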
 
Title Ontology visualisation software 
Description In the development of a new ontology visualisation for the ChEBI website, a novel ontology visualisation library was developed. Following initial deployment of the new visualisation on the ChEBI website in May 2012, minor changes to reflect user feedback were made in June 2012. The software was made available as open source in December 2012 and can be accessed at https://github.com/muthuvenkat/ChebiOntoViz/. The library includes a fully functioning ready-to-deploy standalone implementation of the ChEBI visualisation, but is easily modifiable to work with alternative bio-ontologies. 
Type Of Technology Software 
Year Produced 2012 
Impact No actual Impacts realised to date 
URL https://github.com/muthuvenkat/ChebiOntoViz/
 
Description Extension of our outreach and training capabilities to the UK biomedical community. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact While members of the ChEBI team have provided training courses both internationally throughout Europe and at the European Bioinformatics Institute in the UK for several years, the grant enabled this to be both extended and updated through the creation of a new one-day hands-on training course, tailored to the UK biomedical community. The course includes an introduction to ChEBI, browsing and searching the database, an introduction to the ChEBI ontology, and browsing and searching in the ontology. Other sources of information on small molecules are also introduced and compared.
The course was initially run at the NIMR-MRC, London on April 27 2012, and subsequently at various other UK locations, as follows:
University College, London (May 15)
The University of Sheffield (July 6)
The Nutrition Society, London (September 13)


Feedback from attendees has been very positive. A regular repeat of this training was requested by the Nutrition Society (the next course will be in early 2015).
Year(s) Of Engagement Activity 2012, 2013
 
Description Improved ontology visualisation in ChEBI website 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The ChEBI ontology visualisation has been redeveloped to introduce a new user-friendly, interactive visualisation, which first appeared on the ChEBI website in May 2012. For example, the visualisation for the caffeine molecule as placed in the ChEBI ontology is available at http://www.ebi.ac.uk/chebi/chebiOntology.do?chebiId=CHEBI:27732. Similar pages are available for every entry in ChEBI. The design was shown to users at a user group meeting and introduced with tweaks resulting from user feedback.

Subsequent feedback has been very positive.
Year(s) Of Engagement Activity 2012
URL http://www.ebi.ac.uk/chebi/chebiOntology.do?chebiId=CHEBI:27732
 
Description MetaboLights and ChEBI at the Nutrition Society (London) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We gave a full-day MetaboLights and ChEBI presentation (including Rhea, Reactome and the Enzyme Portal) and hands-on workshop at the Nutrition Society (London). There were 12-15 participants.
Year(s) Of Engagement Activity 2015
URL http://www.nutritionsociety.org/training-and-education/bioinformatics-nutritionists
 
Description Poster and presentation at the MetaboMeeting in Cambridge, 2015 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact At the MetaboMeeting we presented posters for ChEBI and MetaboLights. We also gave a presentation on metabolomics that covered both these resources.
We get quite a lot of interest from these activities, and the frequency of study submissions normally increases as a result.
Year(s) Of Engagement Activity 2015
URL http://thempf.org/mpf_cms3/conferences/forthcoming-meetings/metabomeeting-2015
 
Description Presentations at the 8th International Biocuration Conference in China 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We presented MetaboLights and ChEBI at the 8th International Biocuration Conference in China. We presented MetaboLights in Mandarin, and we saw more submissions coming in from China as a result of this.
Year(s) Of Engagement Activity 2015
URL http://biocuration2015.big.ac.cn