Re-engineering ChEBI for a sustainable future

Lead Research Organisation: European Bioinformatics Institute

Department Name: MSCB Macromolec, structural and chem bio

Abstract

Endogenous small molecules play important roles in regulating complex biological processes and, thus, life itself. Small molecules also serve as powerful tools, with wide-ranging applications in medicine (i.e. as drugs), the biological sciences and biotechnology. An ever-increasing number of novel compounds with a wide range of interesting and potentially useful properties are being identified from sources such as plants, fungi and microorganisms.

Small molecules are thus clearly of critical interest to the scientific community. However, many biologists lack the detailed expertise and knowledge to fully understand and appreciate the many complex and subtle aspects of small molecules, in particular the nuances associated with the accurate representation of chemical structures. A further challenge is that the same small molecule is often referenced by multiple names and synonyms in the scientific literature and in databases. To take one very simple example, the non-steroidal anti-inflammatory drug aspirin is also referred to as acetylsalicylic acid, 2-(acetyloxy)benzoic acid and o-acetylsalicylic acid, among many other synonyms. Such complexity and ambiguity are a significant obstacle and can lead to wasted effort, inaccurate results and misleading conclusions. The Chemical Entities of Biological Interest (ChEBI) database addresses many of these challenges by acting as a reliable and trusted resource that provides "definitive" information about small molecules. ChEBI provides the community with biological, chemical and semantic information about small chemical compounds relevant to biology. ChEBI also assigns each distinct molecular structure a stable and unchanging identifier, which multiple other resources use to identify that specific compound definitively, much as a grid reference unambiguously identifies a specific location on the earth's surface. In addition, ChEBI incorporates standard naming systems from global bodies such as the International Union of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). All of the information and data in ChEBI are freely available and downloadable without restriction. For these reasons, ChEBI is very widely used as a small molecule reference database by a number of leading biomedical databases.
ChEBI is also used by a very large number of users who access its information via the public web site.

The aim of our proposal is to ensure the continued availability and growth of this critical resource for the bioscience community. ChEBI was originally developed in 2004; as a consequence, its underlying computer code is now out of date and increasingly difficult to support and maintain. Indeed, there is a growing risk that it will soon become incompatible with current computer systems. We therefore propose to completely overhaul and modernise the ChEBI infrastructure, code base and associated software tools. We will develop a new, user-friendly website that will enable users to search, retrieve and download data, and advanced users will benefit from improved programmatic access to the data. We will also develop a new annotation, curation and submission tool that will improve the efficiency of our expert ChEBI curators, for example by automating a number of currently time-consuming manual processes, thereby reducing the time and effort required to create new entries. This tool will also benefit users who submit entries to ChEBI by significantly streamlining the submission process. Our project will enable ChEBI to benefit from recent advances in software development techniques and will deliver the new infrastructure platform that ChEBI needs to continue fulfilling its critical role in the global bioinformatics community.

Technical Summary

ChEBI is a database and ontology containing information about chemical entities of biological interest. It is widely used as a 'small molecule' reference database by a number of leading global resources, such as the Gene Ontology, UniProt and Rhea, providing identifiers, structures and annotations that enable chemical entities to be unambiguously identified within biological databases, ontologies, models and the literature. ChEBI is also widely used, through its public website and API, as a rich source of information about small molecules. ChEBI is curated by human experts and provides a reliable, non-redundant collection of chemical entities and related data such as detailed structure, synonyms, chemical formula, charge, molecular mass and links to external databases. ChEBI also contains an extensive ontology that defines the relationships between chemical entities on the basis of their shared structural features together with their biological properties and roles.

Since its creation in 2004, ChEBI's software infrastructure has not undergone any major enhancements and is now significantly outdated, resulting in a large and growing maintenance burden. The overall goal of our project is to completely overhaul and modernise ChEBI's software infrastructure so that ChEBI can continue to provide its critical service to the bioscience community. The work will be divided into four work packages covering (1) the core database and web services, (2) more powerful and scalable search capabilities using Elasticsearch and RDKit, (3) a new web interface and ontology visualisation tool and (4) a new suite of curator tools that will improve efficiency and enable a wider pool of curators to contribute to ChEBI. Documentation and training will be developed to help users benefit from these developments, which will impact not only ChEBI itself but also a multitude of other global bioinformatics resources.

Funded Value:

£743,589

Funded Period:

May 2022 - May 2025

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/V018566/1

Principal Investigator:

Sameer Velankar

Noel Michael O'Boyle

Andrew Leach

Research Subject:

Biomolecules & biochemistry (10%)

Info. & commun. Technol. (10%)

Tools, technologies & methods (80%)

Research Topic:

Analytical Science (20%)

Bioinformatics (40%)

Chemical Biology (10%)

Software Engineering (10%)

eScience (20%)

Organisations

People	ORCID iD
Sameer Velankar (Principal Investigator)	http://orcid.org/0000-0002-8439-5964
Noel Michael O'Boyle (Principal Investigator)	http://orcid.org/0000-0003-4879-2003
Andrew Leach (Principal Investigator)

Publications

Andrés-Hernández L (2022) Establishing a Common Nutritional Vocabulary - From Food Production to Diet. in Frontiers in nutrition

Martinez K (2024) Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy. in Database : the journal of biological databases and curation

Ni Z (2023) Guiding the choice of informatics software and tools for lipidomics research applications. in Nature methods

Witting M (2024) Challenges and perspectives for naming lipids in the context of lipidomics. in Metabolomics : Official journal of the Metabolomic Society

Research Databases and Models
Collaboration
Engagement Activities


Title	ChEBI
Description	Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds, which are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.
Type Of Material	Database/Collection of data
Provided To Others?	Yes
Impact	ChEBI is a key component of multiple global biodata resources, which draw upon various aspects of the database including chemical structures, the ontology, molecule names and stable molecule identifiers.
URL	https://www.ebi.ac.uk/chebi/


Description	EBI communications team collaboration with ChEBI
Organisation	EMBL European Bioinformatics Institute (EMBL - EBI)
Country	United Kingdom
Sector	Academic/University
PI Contribution	We asked the EBI's communications team to redesign the ChEBI website homepage and shared our design requirements and required content.
Collaborator Contribution	New logos, a hero image and SVG icons were designed by the EBI communications team for the new ChEBI website.
Impact	The new ChEBI website has now been rebranded.
Start Year	2023


Description	OBO community collaboration
Organisation	Northeastern University - Boston
Country	United States
Sector	Academic/University
PI Contribution	Integrated the Relation Ontology (RO) and ChemROF into the new ChEBI ontology.
Collaborator Contribution	Gave valuable suggestions and helped improve the ChEBI ontology (by creating pull requests on GitLab).
Impact	The relationships in the ChEBI ontology are now aligned with RO, and the annotations with ChemROF.
Start Year	2023


Description	Visual framework collaboration
Organisation	EMBL European Bioinformatics Institute (EMBL - EBI)
Country	United Kingdom
Sector	Academic/University
PI Contribution	We were one of the early users of this framework.
Collaborator Contribution	EBI developed an open-source toolkit for life science websites and is responsible for its maintenance.
Impact	The framework was used to develop ChEBI's new public interface.
Start Year	2023


Description	16th Annual International Biocuration Conference in Padua, Italy
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Participated in a pre-conference workshop (Functional impact of glycans and their curation) with the glycan community. This involved discussions of how each resource curates glycans and how we can learn from each other. Also presented a poster about ChEBI at the main conference. A conference paper was subsequently written and submitted to the Journal of Glycobiology (pending publication).
Year(s) Of Engagement Activity	2023
URL	https://wiki.glygen.org/Glycan_Function_Workshop_2023


Description	19th Annual conference of the Metabolomics Society (Metabolomics 2023 in Niagara Falls, Canada)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented a poster about ChEBI and took part in the MetaboLights workshop (panel discussion).
Year(s) Of Engagement Activity	2023
URL	https://www.metabolomics2023.org/


Description	2nd Ontologies4Chem Workshop
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Invited by NFDI4Chem in 2023 to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists, etc.
Year(s) Of Engagement Activity	2023
URL	https://www.nfdi4chem.de/event/2nd-ontologies4chem-workshop/


Description	3rd Ontologies4Chem workshop (Online)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented an update on efforts to migrate ChEBI to a more robust infrastructure.
Year(s) Of Engagement Activity	2024
URL	https://www.nfdi4chem.de/3rd-ontologies4chem-workshop-2024/


Description	ChEBI Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Organised a 1-day workshop at EMBL-EBI (hybrid event, with 24 in-person and 12 remote participants). The primary aim of the ChEBI workshop was to bring together stakeholders from major bioinformatics resources that rely on ChEBI, provide updates on ChEBI's redevelopment, receive feedback, and solicit further input.
Year(s) Of Engagement Activity	2024
URL	https://drive.google.com/drive/folders/1XMlzZFXXm7styUhzOBgpWchiwLutQj4f


Description	ChEMBL & SureChEMBL anniversary symposium (EMBL-EBI)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented a poster highlighting ChEBI's redevelopment.
Year(s) Of Engagement Activity	2024
URL	https://www.eventsforce.net/embl/frontend/reg/thome.csp?pageID=96136&eventID=151&traceRedir=2


Description	Chemistry Day (EMBL-EBI)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Professional Practitioners
Results and Impact	Presented a talk about ChEBI and future plans at the Chemistry Day.
Year(s) Of Engagement Activity	2023
URL	https://docs.google.com/document/d/1xbY9Qj8rndiIdfkKIb0Mk1FByGZILX8rur-bo_yTCVc/edit?tab=t.0#heading...


Description	GCBR of the week social media campaign
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Media (as a channel to the public)
Results and Impact	ChEBI was featured in the GCBR of the week social media campaign, giving the general public and the scientific community the chance to learn more about ChEBI. ChEBI's profile and animation were published on LinkedIn and X.
Year(s) Of Engagement Activity	2024
URL	https://www.linkedin.com/pulse/chebi-global-biodata-coalition-jv3ae


Description	IUPAC WorldFAIR Chemistry Webinar (Online)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented a flash talk about ChEBI to the general chemistry community and participated in a panel discussion. The audience was a combination of researchers, publishers, data community, etc.
Year(s) Of Engagement Activity	2023
URL	https://zenodo.org/records/7683072


Description	MetFAIR working group (Online)
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented a talk about ChEBI at the MetFAIR - Reproducible Reporting and Metabolite Annotation Task Group. The task group plans to write a white paper on metabolite annotation.
Year(s) Of Engagement Activity	2025
URL	https://metabolomicssociety.org/board-committees/scientific-task-groups/


Description	OBO Foundry Newsletter Issue 7
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Wrote a summary of the outcomes from the ChEBI workshop to inform/update the OBO (Open Biological and Biomedical Ontology) community.
Year(s) Of Engagement Activity	2025
URL	http://obofoundry.org/newsletter/2025/01/23/7th-issue-newsletter.html


Description	Ontologies4Chem workshop (Online)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Invited by NFDI4Chem to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists etc.
Year(s) Of Engagement Activity	2022
URL	https://www.nfdi4chem.de/event/ontologies4chem-workshop/


Description	Studying metabolites and small molecules with MetaboLights and ChEBI (EMBL-EBI online webinar)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented a talk about ChEBI. The webinar was part of the molecular building blocks of life series, open to anyone interested in studying small molecules.
Year(s) Of Engagement Activity	2023
URL	https://www.ebi.ac.uk/training/events/studying-metabolites-and-small-molecules-metabolights-and-cheb...


Description	Transformation workflow using dbt for ChEBI Database (EMBL's Scientific Workflow Club)
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presented a talk about the new database management strategy for the new ChEBI and the results obtained with it so far.
Year(s) Of Engagement Activity	2024
URL	https://oc.embl.de/index.php/s/LcMIsohL6XOgApR#/files_mediaviewer/2024-07-04-Transformation-workflow...
