Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community
Lead Research Organisation:
University of Manchester
Department Name: Computer Science
Abstract
After a century of studying nature in greater and greater detail, generating the "parts list" of the molecular components within the cell, the biological sciences have undergone a paradigm shift in the last decade, moving towards putting together these individual molecular pieces to understand their interactions in a holistic context. It is these interactions which give rise to overall cellular processes, and their study has been termed systems biology.
Systems biology brings together a wide range of information about cells, genes and proteins, as well as the small molecules that act on and within these biological structures. In the service of its application areas, such as drug discovery and industrial biotechnology, it gives a holistic perspective aiming to track and eventually simulate the entire functioning of biological systems. In order to build up such holistic models from such a vast collection of diverse data, integration of individual units of information from many diverse databases needs to be performed. This integration of such a high volume of data can only feasibly be performed computationally. To facilitate smooth integration, individual molecular components within the cellular system require stable and unique identifiers. These identifiers are assigned to entities such as genes, proteins or small molecules by standardization bodies and database providers, and effectively allow the molecular parts list to be catalogued. In addition to this, human-relevant information such as names and chemical and biological structures, relationships and properties are also associated with the various entities in the databases, providing resources that are useable by both software tools and researchers themselves.
The database Chemical Entities of Biological Interest (ChEBI) acts as a resource for such information and stable identifiers in the area of small molecules of biological interest. ChEBI provides the bioscientific community with semantic, biological and chemical information as well as stable identifiers for small chemical compounds relevant in biology, including the so-called metabolites. Metabolites are small molecules in organisms that are implicated in diverse processes including supplying the body with energy, serving as building blocks for tissue, and acting as a defence or as a signal within the organism or between organisms. For these purposes, ChEBI is widely used in the bioscience community, which sends formal requests for the assignment of identifiers for particular small molecule entities to the ChEBI team, who then perform the assignment, publish the information into the public domain and inform the requesting party that the request has been fulfilled.
The aim of the current proposal is to further develop the ChEBI resource and create surrounding tools towards comprehensively addressing the chemical informatics (software and data) needs of the systems biology and metabolic modelling communities, so that they in turn can further their objective to create meaningful simulations and models that enable whole-systems research into pressing public health and energy challenges. In order to facilitate this use, we propose to:
1. Develop a comprehensive software library for accessing ChEBI programmatically which will work across all major available operating systems;
2. Extend the ChEBI database resource to enhance stability, increase community involvement, add additional biologically relevant relationships, and provide a new powerful visualisation for the biological context of molecular entities;
3. Curate into ChEBI all known metabolites across important organisms in systems biology studies: human, mouse, E. coli and yeast;
4. Create new training materials and deliver training courses to the community.
Technical Summary
The development of genome-scale metabolic reconstructions -- all-encompassing interlinked maps of all known metabolic reaction pathways for a given organism -- requires integrated computable knowledge about the biochemical entities involved, such as macromolecules and metabolites. The use of unambiguous, semantically typed, publicly available and perennial identifiers for model components is becoming increasingly recognised as being essential if systems biology models are to be shared, reused and developed by communities, maximising their benefit. Such annotation allows a wealth of chem- and bioinformatics data, such as chemical structures and protein sequences, to be immediately extracted from these resources via web service interfaces. As the BioModels Database currently contains thousands of models that use ChEBI identifiers, improvements to and extensions of the ChEBI resource will have an immediate benefit to the systems biology modelling community.
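As an illustration of why unambiguous identifiers matter for model annotation, the following minimal Python sketch normalises differently written ChEBI accessions to one canonical form and builds an annotation URI from it. It is not part of libChEBI or of the ChEBI web services: the function names are hypothetical, and the identifiers.org compact-identifier URI form is an assumption about how a model might reference the entry.

```python
import re

# Accepts "15377", "CHEBI:15377" or "chebi:15377" (prefix is case-insensitive).
_CHEBI_PATTERN = re.compile(r"^(?:CHEBI:)?(\d+)$", re.IGNORECASE)


def normalise_chebi_id(raw: str) -> str:
    """Return the canonical 'CHEBI:<number>' form of a ChEBI accession.

    Hypothetical helper for illustration only.
    """
    match = _CHEBI_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"Not a ChEBI identifier: {raw!r}")
    # int() drops any leading zeros so equal accessions compare equal.
    return f"CHEBI:{int(match.group(1))}"


def annotation_uri(raw: str) -> str:
    """Build an annotation URI (assumed identifiers.org compact form)."""
    return f"https://identifiers.org/{normalise_chebi_id(raw)}"


if __name__ == "__main__":
    # CHEBI:15377 is the ChEBI entry for water.
    for accession in ("15377", "chebi:15377", " CHEBI:15377 "):
        print(accession.strip(), "->", annotation_uri(accession))
```

Because every spelling collapses to the same string, two models annotated by different groups can be compared or merged on the identifier alone, which is the property the paragraph above relies on.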
We propose here to undertake three such key improvements. Firstly, we will develop a comprehensive cross-platform API library for accessing ChEBI programmatically. The API, libChEBI, will be made publicly available as an open source library in Java and Python. It will include facilities such as extracting biologically relevant groups of compounds (such as tautomers), calculating additional physicochemical properties and semantic reasoning over model annotations. Secondly, we will extend the ChEBI database to enhance stability, increase community involvement and provide a new powerful visualisation for the biological context of molecular entities. Thirdly, we will curate into ChEBI all known metabolites across human, mouse, E. coli and S. cerevisiae.
Planned Impact
The programme of work described in this proposal will continue developing and maintaining ChEBI, an extensive chemical data resource already widely used by the chemistry and biology communities. The work focuses on the extension of the scope of the resource, and on a significant improvement in the computational usability of the database through the proposed libChEBI API. While a number of proposed improvements are focused towards the systems biology modelling communities, generic improvements to the resource's coverage of the chemical space, expanded definition of the relationships between chemicals, and the introduction of a robust programming library to access this rich ontology will produce a wide range of benefits for researchers in any field with an interest in (bio)chemistry and the development of software that underpins research in the area. The models and simulations of biological systems being developed by the systems biology and modelling community that will directly benefit from the proposed work filter into all life-science related industry, such as biotechnology, pharmaceutical, consumer goods, nutrition and health technology. All of these industries are increasingly relying on systems approaches and computational modelling in their research strategy, as documented in our supporting letters from Syngenta, Novartis and Unilever. In agriculture, the issue of efficient food production is also likely to benefit from modelling and simulation, for example by using models to understand how crop yield can be maximised. Designing crop yield for food security relies on metabolic analysis and therefore this resource has a direct impact on that activity.
ChEBI is becoming a more and more complete "textbook" for metabolism for the wider public. It has recently extended the usefulness of its pages by incorporating Wikipedia links together with descriptive textual extracts into the main entity pages. We provide an Entity of the Month with our monthly release as a method for engaging the wider public with the broader reach of high-quality annotated data. Annotated data and modelling are also becoming crucial for a personalised approach to healthcare, where models will be developed and calibrated for each person as a basis for rationalising therapies, nutrition and exercise regimes, etc. Improving the accuracy of modelling and simulation approaches as will be enabled by the software library and semantic reasoning we propose here will be a major factor in reducing the number of animal experiments carried out, which will be a generic benefit to a humane society. Therefore beneficiaries will be:
a) individual companies in the life-science area and their employees;
b) non-profit and commercial agricultural industry and their employees who use modelling as a research tool in their business;
c) health-related industries and practicing clinics applying personalised medicine; and finally
d) the public at large who will benefit from the products of all those organizations mentioned.
Thus this software resource will have significant consequences to society and human well-being. To facilitate the delivery of these benefits to industries and to the public at large, we will publicise our research widely in open access publications, create online impact using Twitter and by writing about our developments in grants, and deliver training courses to the UK community as an integral part of the achievement of our objectives.
Publications

O'Hagan S
(2015)
A 'rule of 0.5' for the metabolite-likeness of approved pharmaceutical drugs.
in Metabolomics : Official journal of the Metabolomic Society

Swainston N
(2013)
An analysis of a 'community-driven' reconstruction of the human metabolic network.
in Metabolomics : Official journal of the Metabolomic Society

O'Hagan S
(2017)
Analysis of drug-endogenous human metabolite similarities in terms of their maximum common substructures.
in Journal of cheminformatics

Moreno P
(2015)
BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.
in BMC bioinformatics

Swainston N
(2017)
biochem4j: Integrated and extensible biochemical knowledge through graph databases.
in PloS one

Hastings J
(2016)
ChEBI in 2016: Improved services and an expanding collection of metabolites.
in Nucleic acids research

Mendes P
(2015)
Fitting Transporter Activities to Cellular Drug Concentrations and Fluxes: Why the Bumblebee Can Fly.
in Trends in pharmacological sciences

Millard P
(2015)
Impact of kinetic isotope effects in isotopic studies of metabolic systems.
in BMC systems biology

Swainston N
(2016)
libChEBI: an API for accessing the ChEBI database.
in Journal of cheminformatics

Millard P
(2017)
Metabolic regulation is sufficient for global and robust coordination of glucose uptake, catabolism, energy production and growth in Escherichia coli.
in PLoS computational biology

O'Hagan S
(2016)
MetMaxStruct: A Tversky-Similarity-Based Strategy for Analysing the (Sub)Structural Similarities of Drugs and Endogenous Metabolites.
in Frontiers in pharmacology

Swainston N
(2016)
Recon 2.2: from reconstruction to model of human metabolism.
in Metabolomics : Official journal of the Metabolomic Society

Stanford NJ
(2015)
RobOKoD: microbial strain design for (over)production of target compounds.
in Frontiers in cell and developmental biology

Swainston N
(2018)
STRENDA DB: enabling the validation and sharing of enzyme kinetics data.
in The FEBS journal
Description | This research grant supports the bioinformatics resource ChEBI and its development towards aiding systems biology. The resource has been heavily used by the research community in developing maps of human metabolism and also by those developing whole-cell models. The resource is thus increasingly becoming essential for systems biology efforts. |
Exploitation Route | The ChEBI resource is of great importance for referring to molecules and defining exactly what they are. ChEBI is being used heavily in reconstructions of metabolism. It is also being used more widely: for example, all molecule entries in Wikipedia link back to the corresponding molecule page in ChEBI, so the resource has a wider impact on society as a whole. |
Sectors | Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Education,Energy,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology |
Description | ChEBI is a resource widely used by several industries, such as healthcare, pharma, consumer product industry, etc. ChEBI is also used by Wikipedia to provide an authoritative reference to chemicals of biological interest. |
Sector | Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Education,Energy,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Other |
Impact Types | Cultural,Economic |
Title | ChEBI |
Description | ChEBI is a database of chemical entities of biological interest and an ontology. |
Type Of Material | Database/Collection of data |
Year Produced | 2006 |
Provided To Others? | Yes |
Impact | This is one of the most widely used sources of data on biochemical molecules. It has become the main source of annotation of such molecules in systems biology models. The improvements carried out this year include 1) improved pre-loading of data with partial automation of structure-based classification and 2) an improved release cycle: it is no longer on a monthly release but in a "near live" mode, with both new entries and updates to existing entries appearing on the website within a second of them having been curated. The web service (for programmatic access to the data) has also been updated to near live updates, while for those users who prefer stable monthly releases the downloadable files are still created monthly. |
URL | http://www.ebi.ac.uk/chebi/ |
Title | Metabolites data for key species |
Description | Chemical data for over 10,000 metabolites from four key species (human, mouse, yeast, and E. coli) have been curated into the ChEBI database, enhancing the value of ChEBI not just to its established user base in the molecular biology community but also to the rapidly growing metabolomics research community. All of the data is freely available. |
Type Of Material | Database/Collection of data |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Significant increase in the number of users of ChEBI from the metabolomics and natural products research communities. |
URL | http://www.ebi.ac.uk/chebi/ |
Title | Recon 2.2 |
Description | Comprehensive update to the highly-cited, high-impact consensus human metabolic reconstruction, Recon 2. Much of the model updates were built on newly introduced algorithms, which we have made publicly available (https://github.com/mcisb/mcisb-recon). |
Type Of Material | Computer model/algorithm |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Following its release in June 2016, Recon 2.2 has been cited 15 times. It has also received user feedback via its GitHub project (https://github.com/mcisb/mcisb-recon). The paper describing the model and its development won Best Paper 2016 in the journal Metabolomics, based on the highest number of accesses during the year. |
URL | http://link.springer.com/article/10.1007/s11306-016-1051-4 |
Title | biochem4j |
Description | Graph database covering biochemistry, incorporating data from a number of sources including ChEBI. libChEBI was used in its population. |
Type Of Material | Database/Collection of data |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | The database has been recently made publicly available. More impact is expected upon publication (the manuscript is under review with PLOS Computational Biology). |
URL | http://biochem4j.org |
Title | libChEBI |
Description | Library and API for accessing the ChEBI database and ontology programmatically. Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | None yet, software in testing |
URL | http://svn.code.sf.net/p/mcisb/code/libChEBI/ |
Description | A hands-on training course at the Nutrition Society to introduce bioinformatics resources to the nutrition research community (London, 2014) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | A 2-hour hands-on training session on searching and using the ChEBI database and ontology was given as part of a workshop to introduce bioinformatics resources to nutritionists. |
Year(s) Of Engagement Activity | 2014 |
Description | COMBINE 2015 |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The "Computational Modeling in Biology" Network (COMBINE) is an initiative to coordinate the development of the various community standards and formats in systems biology and related fields. COMBINE is a workshop-style event with oral presentation, posters, and breakout sessions. |
Year(s) Of Engagement Activity | 2015 |
URL | http://co.mbine.org/events/COMBINE_2015 |
Description | Chemical Ontologies Meeting, Basel Oct 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two separate presentations (one by Co-Investigator Christoph Steinbeck, and the other by researcher Gareth Owen) at the Chemical Ontologies meeting, Basel 2/10/2015. |
Year(s) Of Engagement Activity | 2015 |
Description | Continued development of ChEBI towards better usability for the systems biology and metabolic modelling community |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presentation discussing the use of ChEBI in metabolic modelling annotation, and the introduction of the libChEBI programming API. |
Year(s) Of Engagement Activity | 2014 |
Description | Course at the Nutrition Society 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Training course about the ChEBI resource given at the Nutrition Society (their headquarters in London) by Dr. Gareth Owen (from the Steinbeck group) on 9/6/2015. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.nutritionsociety.org/training-and-education/bioinformatics-nutritionists/programme |
Description | EBI Open day |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | Poster presentation about the ChEBI resource given at the 2015 EBI Open day. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.ebi.ac.uk/about/events/2015/embl-ebi-open-day-2015 |
Description | Invited talk at Westphalian University of Applied Sciences |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited presentation about the ChEBI resource given by Co-Investigator Christoph Steinbeck. This was a University seminar. |
Year(s) Of Engagement Activity | 2015 |
Description | MetaboMeeting 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Poster presentation about ChEBI at MetaboMeeting 2015, which took place in Cambridge, 7-9 December 2015. |
Year(s) Of Engagement Activity | 2015 |
URL | http://selectbiosciences.com/conferences/index.aspx?conf=Metabo2015 |
Description | OpenMinTed Kick-Off Meeting, Rhodes 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Co-Investigator Christoph Steinbeck about the ChEBI resource given at the kickoff meeting of the EC H2020 consortium OpenMinTed. This took place in Rhodes on 18/6/2015 |
Year(s) Of Engagement Activity | 2015 |
Description | Poster presentation at the 5th UK Ontology Network Meeting (Newcastle upon Tyne, April 2016). |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | A poster presentation at the major UK ontology meeting describing the ChEBI database and ontology |
Year(s) Of Engagement Activity | 2016 |
URL | https://conferences.ncl.ac.uk/ukon2016/programme/ |
Description | Presentation of developments in the ChEBI Ontology at the American Chemical Society Spring Meeting (March 2016) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Follow-up session to the Chemical Ontologies Workshop in Basel (October 2015) |
Year(s) Of Engagement Activity | 2016 |
URL | https://www.acs.org/content/dam/acsorg/meetings/spring-2016/attendee-services/onsite-program-noads.p... |
Description | Standardisation of stoichiometric models: how and why |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation at the workshop "Stoichiometric modelling (SM) of microbial metabolism", held at the Isaac Newton Institute, Cambridge, on 4 November 2014. Introduction for a broad audience to the importance of model annotation, especially in the context of microbial community modelling. For many, this was their first introduction to the subject. |
Year(s) Of Engagement Activity | 2014 |
Description | libChEBI poster presentation at ICSB 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Poster presentation of the software libChEBI at the International Conference on Systems Biology, Singapore, November 2015 |
Year(s) Of Engagement Activity | 2015 |
URL | http://icsb15.apbionet.org/ |