Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community

Lead Research Organisation: University of Manchester
Department Name: Computer Science


After a century of studying nature in greater and greater detail, generating the "parts list" of the molecular components within the cell, the biological sciences have undergone a paradigm shift in the last decade, moving towards putting together these individual molecular pieces to understand their interactions in a holistic context. It is these interactions which give rise to overall cellular processes, and their study has been termed systems biology.

Systems biology brings together a wide range of information about cells, genes and proteins, as well as the small molecules that act on and within these biological structures. In the service of its application areas, such as drug discovery and industrial biotechnology, it gives a holistic perspective aiming to track and eventually simulate the entire functioning of biological systems. In order to build up such holistic models from such a vast collection of diverse data, integration of individual units of information from many diverse databases needs to be performed. This integration of such a high volume of data can only feasibly be performed computationally. To facilitate smooth integration, individual molecular components within the cellular system require stable and unique identifiers. These identifiers are assigned to entities such as genes, proteins or small molecules by standardization bodies and database providers, and effectively allow the molecular parts list to be catalogued. In addition to this, human-relevant information such as names and chemical and biological structures, relationships and properties are also associated with the various entities in the databases, providing resources that are useable by both software tools and researchers themselves.

The database Chemical Entities of Biological Interest (ChEBI) acts as a resource for such information and stable identifiers in the area of small molecules of biological interest. ChEBI provides for the bioscientific community semantic, biological and chemical information as well as stable identifiers for small chemical compounds relevant in biology, including the so-called metabolites. Metabolites are small molecules in organisms that are implicated in diverse processes including supplying the body with energy, serving as building blocks for tissue, and acting as a defence or as a signal within the organism or between organisms. For these purposes, ChEBI is widely used in the bioscience community, which sends formal requests for the assignment of identifiers for particular small molecule entities to the ChEBI team, who then perform the assignment, publish the information into the public domain and inform the requesting party that the request has been fulfilled.
The aim of the current proposal is to further develop the ChEBI resource and create surrounding tools towards comprehensively addressing the chemical informatics (software and data) needs of the systems biology and metabolic modelling communities, so that they in turn can further their objective to create meaningful simulations and models that enable whole-systems research into pressing public health and energy challenges. In order to facilitate this use, we propose to:

1. Develop a comprehensive software library for accessing ChEBI programmatically which will work across all major available operating systems;
2. Extend the ChEBI database resource to enhance stability, increase community involvement, add additional biologically relevant relationships, and provide a new powerful visualisation for the biological context of molecular entities;
3. Curate into ChEBI all known metabolites across important organisms in systems biology studies: human, mouse, E. coli and yeast;
4. Create new training materials and deliver training courses to the community.

Technical Summary

The development of genome-scale metabolic reconstructions -- all-encompassing interlinked maps of all known metabolic reaction pathways for a given organism -- requires integrated computable knowledge about the biochemical entities involved, such as macromolecules and metabolites. The use of unambiguous, semantically typed, publicly available and perennial identifiers for model components is becoming increasingly recognised as being essential if systems biology models are to be shared, reused and developed by communities, maximising their benefit. Such annotation allows a wealth of chem- and bioinformatics data, such as chemical structures and protein sequences, to be immediately extracted from these resources via web service interfaces. As the BioModels Database currently contains thousands of models that use ChEBI identifiers, improvements to and extensions of the ChEBI resource will have an immediate benefit to the systems biology modelling community.
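As an illustration of what a stable, unambiguous identifier means in practice for a model annotation, the following is a minimal Python sketch. It assumes that a ChEBI identifier has the shape "CHEBI:" followed by an integer, and that annotations are written as resolver URIs; the resolver base URL and the function names are assumptions made for this example and are not part of ChEBI or of any library named in this proposal.

```python
import re

# Assumed identifier shape: optional "CHEBI:" prefix followed by digits.
_CHEBI_PATTERN = re.compile(r"^(?:CHEBI:)?(\d+)$", re.IGNORECASE)

# Assumed resolver base; substitute whichever resolver your models use.
RESOLVER_BASE = "https://identifiers.org/"


def normalise_chebi_id(raw):
    """Return the canonical 'CHEBI:<number>' form, or raise ValueError."""
    match = _CHEBI_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {raw!r}")
    # int() drops any leading zeros so equivalent spellings compare equal.
    return f"CHEBI:{int(match.group(1))}"


def annotation_uri(raw):
    """Build a resolver URI (hypothetical helper) for a model annotation."""
    return RESOLVER_BASE + normalise_chebi_id(raw)
```

Normalising identifiers before comparison means that two models annotated with differently spelled forms of the same identifier can still be matched computationally.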

We propose here to undertake three key improvements. Firstly, we will develop a comprehensive cross-platform API library for accessing ChEBI programmatically. The API, libChEBI, will be made publicly available as an open source library in Java and Python. It will include facilities such as extracting biologically relevant groups of compounds (such as tautomers), calculating additional physicochemical properties and semantic reasoning over model annotations. Secondly, we will extend the ChEBI database to enhance stability, increase community involvement and provide a new powerful visualisation for the biological context of molecular entities. Thirdly, we will curate into ChEBI all known metabolites across human, mouse, E. coli and S. cerevisiae.
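To give a flavour of what semantic reasoning over model annotations could look like, the sketch below walks "is_a" relationships in a small in-memory hierarchy to find every model species annotated with a given class or any of its subclasses. It is not the libChEBI API: the hierarchy is toy data using plain labels rather than real ChEBI identifiers, and the function names are hypothetical.

```python
from collections import deque

# Toy "is_a" edges (child -> parents). The terms are illustrative labels,
# not ChEBI identifiers or an extract of the real ontology.
IS_A = {
    "D-glucose": ["glucose"],
    "glucose": ["hexose"],
    "hexose": ["monosaccharide"],
    "monosaccharide": ["carbohydrate"],
    "ethanol": ["alcohol"],
}


def ancestors(term, is_a=IS_A):
    """Return every class reachable from `term` by following is_a edges."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def annotated_with(model_annotations, target_class, is_a=IS_A):
    """Return species whose annotation is target_class or a subclass of it."""
    return sorted(
        species
        for species, term in model_annotations.items()
        if term == target_class or target_class in ancestors(term, is_a)
    )


# Hypothetical model: species identifiers mapped to their annotation terms.
annotations = {"s1": "D-glucose", "s2": "ethanol", "s3": "hexose"}
monosaccharides = annotated_with(annotations, "monosaccharide")
```

In this toy model the query for "monosaccharide" selects s1 and s3, even though neither species is annotated with that exact term.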

Planned Impact

The programme of work described in this proposal will continue developing and maintaining ChEBI, an extensive chemical data resource already widely used by the chemistry and biology communities. The work focuses on the extension of the scope of the resource, and on a significant improvement in the computational usability of the database through the proposed libChEBI API. While a number of proposed improvements are focused towards the systems biology modelling communities, generic improvements to the resource's coverage of the chemical space, expanded definition of the relationships between chemicals, and the introduction of a robust programming library to access this rich ontology will produce a wide range of benefits for researchers in any field with an interest in (bio)chemistry and the development of software that underpins research in the area. The models and simulations of biological systems being developed by the systems biology and modelling community that will directly benefit from the proposed work filter into all life-science related industry, such as biotechnology, pharmaceutical, consumer goods, nutrition and health technology. All of these industries are increasingly relying on systems approaches and computational modelling in their research strategy, as documented in our supporting letters from Syngenta, Novartis and Unilever. In agriculture, the issue of efficient food production is also likely to benefit from modelling and simulation, for example by using models to understand how crop yield can be maximised. Designing crop yield for food security relies on metabolic analysis and therefore this resource has a direct impact on that activity.

ChEBI is becoming a more and more complete "textbook" for metabolism for the wider public. It has recently extended the usefulness of its pages by incorporating Wikipedia links together with descriptive textual extracts into the main entity pages. We provide an Entity of the Month with our monthly release as a method for engaging the wider public with the broader reach of high-quality annotated data. Annotated data and modelling are also becoming crucial for a personalised approach to healthcare, where models will be developed and calibrated for each person as a basis for rationalising therapies, nutrition and exercise regimes, etc. Improving the accuracy of modelling and simulation approaches as will be enabled by the software library and semantic reasoning we propose here will be a major factor in reducing the number of animal experiments carried out, which will be a generic benefit to a humane society. Therefore beneficiaries will be:
a) individual companies in the life-science area and their employees;
b) non-profit and commercial agricultural industry and their employees who use modelling as a research tool in their business;
c) health-related industries and practicing clinics applying personalised medicine; and finally
d) the public at large who will benefit from the products of all those organizations mentioned.

Thus this software resource will have significant consequences to society and human well-being. To facilitate the delivery of these benefits to industries and to the public at large, we will publicise our research widely in open access publications, create online impact using Twitter and by writing about our developments in grants, and deliver training courses to the UK community as an integral part of the achievement of our objectives.


O Hagan S (2015) A 'rule of 0.5' for the metabolite-likeness of approved pharmaceutical drugs. in Metabolomics : Official journal of the Metabolomic Society

Description This research grant supports the bioinformatics resource ChEBI and its development towards aiding systems biology. The resource has been heavily used by the research community in developing maps of human metabolism and also by those developing whole-cell models. The resource is thus increasingly becoming essential for systems biology efforts.
Exploitation Route The ChEBI resource is of great importance in referring to molecules and defining exactly what they are. ChEBI is being used heavily in reconstructions of metabolism. However, it is also being used more widely: for example, all molecule entries in Wikipedia have a reference back to that molecule's page in ChEBI, so it has a wider impact on society as a whole.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

Description ChEBI is a resource widely used by several industries, such as healthcare, pharma, consumer product industry, etc. ChEBI is also used by Wikipedia to provide an authoritative reference to chemicals of biological interest.
Sector Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Education; Energy; Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Other
Impact Types Cultural


Title Additional file 1: of Impact of kinetic isotope effects in isotopic studies of metabolic systems 
Description R scripts used to construct the models, perform the simulations and generate the figures. (TAR 137 kb) 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Title Additional file 2: of Modeling the dynamics of mouse iron body distribution: hepcidin is necessary but not sufficient 
Description ZIP archive with model files in SBML and COPASI formats, and data files in TSV format. (ZIP 103 kb) 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Title ChEBI 
Description ChEBI is a database of chemical entities of biological interest and an ontology. 
Type Of Material Database/Collection of data 
Year Produced 2006 
Provided To Others? Yes  
Impact This is one of the most widely used sources of data on biochemical molecules. It has become the main source of annotation of such molecules in systems biology models. The improvements carried out this year include 1) improved pre-loading of data, with partial automation of structure-based classification, and 2) an improved release cycle: ChEBI is no longer on a monthly release but in a "near live" mode, with both new entries and updates to existing entries appearing on the website within a second of them having been curated. The web service (for programmatic access to the data) has also been moved to near-live updates, while for those users who prefer stable monthly releases the downloadable files are still created monthly. 
Title MOESM1 of Analysis of drug–endogenous human metabolite similarities in terms of their maximum common substructures 
Description Additional file 1. Workflow of Fig. 2 used to generate the data shown in Fig. 1. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Title MOESM3 of Analysis of drug–endogenous human metabolite similarities in terms of their maximum common substructures 
Description Additional file 3. Comparison of endogenites with endogenites in terms of their maximum common substructures. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Title MOESM4 of Analysis of drug–endogenous human metabolite similarities in terms of their maximum common substructures 
Description Additional file 4. Comparison of marketed drugs with marketed drugs in terms of their maximum common substructures. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Title MOESM5 of Analysis of drug–endogenous human metabolite similarities in terms of their maximum common substructures 
Description Additional file 5. Comparison of endogenites with marketed drugs in terms of their maximum common substructures. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Title Metabolites data for key species 
Description Chemical data for over 10,000 metabolites from four key species (human, mouse, yeast, and E. coli) have been curated into the ChEBI database, enhancing the value of ChEBI not just to its established user base in the molecular biology community but also to the rapidly growing metabolomics research community. All of the data is freely available. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Significant increase in the number of users of ChEBI from the metabolomics and natural products research communities. 
Title Recon 2.2 
Description Comprehensive update to the highly cited, high-impact consensus human metabolic reconstruction, Recon 2. Many of the model updates were built on newly introduced algorithms, which we have made publicly available ( 
Type Of Material Computer model/algorithm 
Year Produced 2016 
Provided To Others? Yes  
Impact Following its release in June 2016, Recon 2.2 has been cited 15 times. It has also received user feedback via its GitHub project ( The paper describing the model and its development won Best Paper 2016 in the journal Metabolomics, based on the highest number of accesses during the year. 
Title biochem4j 
Description Graph database covering biochemistry, incorporating data from a number of sources including ChEBI. libChEBI was used in its population. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The database has been recently made publicly available. More impact is expected upon publication (the manuscript is under review with PLOS Computational Biology). 
Title MOESM2 of Analysis of drug–endogenous human metabolite similarities in terms of their maximum common substructures 
Description Additional file 2. Python code used to generate substructures. 
Type Of Technology Software 
Year Produced 2017 
Title libChEBI 
Description Library and API for accessing the ChEBI database and ontology programmatically. Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact None yet, software in testing 
Description A hands-on training course at the Nutrition Society to introduce bioinformatics resources to the nutrition research community (London, 2014) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A 2-hour hands-on training session on searching and using the ChEBI database and ontology was given as part of a workshop to introduce bioinformatics resources to nutritionists.
Year(s) Of Engagement Activity 2014
Description COMBINE 2015 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The "Computational Modeling in Biology" Network (COMBINE) is an initiative to coordinate the development of the various community standards and formats in systems biology and related fields. COMBINE is a workshop-style event with oral presentations, posters, and breakout sessions.
Year(s) Of Engagement Activity 2015
Description Chemical Ontologies Meeting, Basel Oct 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Two separate presentations (one by Co-Investigator Christoph Steinbeck, and the other by researcher Gareth Owen) at the Chemical Ontologies meeting, Basel 2/10/2015.
Year(s) Of Engagement Activity 2015
Description Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation discussing the use of ChEBI in metabolic modelling annotation, and the introduction of the libChEBI programming API.
Year(s) Of Engagement Activity 2014
Description Course at Nutrition Society 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Training course about the ChEBI resource given at the Nutrition Society (their headquarters in London) by Dr. Gareth Owen (from the Steinbeck group) on 9/6/2015.
Year(s) Of Engagement Activity 2015
Description EBI Open day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Poster presentation about the ChEBI resource given at the 2015 EBI Open day.
Year(s) Of Engagement Activity 2015
Description Invited talk at Westphalian University of Applied Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited presentation about the ChEBI resource given by Co-Investigator Christoph Steinbeck. This was a university seminar.
Year(s) Of Engagement Activity 2015
Description MetaboMeeting 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster presentation about ChEBI at MetaboMeeting 2015, which took place in Cambridge, 7-9 Dec. 2015.
Year(s) Of Engagement Activity 2015
Description OpenMinTed Kick-Off Meeting, Rhodes 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation by Co-Investigator Christoph Steinbeck about the ChEBI resource, given at the kick-off meeting of the EC H2020 consortium OpenMinTed. This took place in Rhodes on 18/6/2015.
Year(s) Of Engagement Activity 2015
Description Poster presentation at the 5th UK Ontology Network Meeting (Newcastle upon Tyne, April 2016). 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A poster presentation at the major UK ontology meeting describing the ChEBI database and ontology
Year(s) Of Engagement Activity 2016
Description Presentation of developments in the ChEBI Ontology at the American Chemical Society Spring Meeting (March 2016) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Follow-up session to the Chemical Ontologies Workshop in Basel (October 2015)
Year(s) Of Engagement Activity 2016
Description Standardisation of stoichiometric models: how and why 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation in the workshop, Stoichiometric modelling (SM) of microbial metabolism, performed at the Isaac Newton Institute, Cambridge, on 4 November 2014. Introduction to a broad audience on the importance of model annotation, especially in the context of microbial community modelling. For many, this was their first introduction to the subject.
Year(s) Of Engagement Activity 2014
Description libChEBI poster presentation at ICSB 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster presentation of the software libChEBI at the International Conference on Systems Biology, Singapore, November 2015
Year(s) Of Engagement Activity 2015