Re-engineering ChEBI for a sustainable future
Lead Research Organisation:
European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio
Abstract
Endogenous small molecules play important roles in regulating complex biological processes and, thus, life itself. Small molecules also serve as powerful tools, with wide-ranging applications in medicine (e.g. as drugs), the biological sciences and biotechnology. An ever-increasing number of novel compounds with a wide range of interesting and potentially useful properties are being identified from sources such as plants, fungi and microorganisms.
Small molecules are thus clearly of critical interest to the scientific community. However, many biologists lack the detailed expertise and knowledge to fully understand and appreciate the many complex and subtle aspects of small molecules, and in particular the many nuances associated with the accurate representation of chemical structures. A further challenge is that the same small molecule will often be referenced by multiple names and synonyms in the scientific literature and in databases. To take one very simple example, the non-steroidal anti-inflammatory drug aspirin is also referred to as acetylsalicylic acid, 2-(acetyloxy)benzoic acid and o-acetylsalicylic acid among many other synonyms. This complexity and ambiguity is a significant obstacle and can lead to wasted effort, inaccurate results and misleading conclusions. The Chemical Entities of Biological Interest (ChEBI) database acts as a reliable and trusted resource that provides "definitive" information about small molecules, thereby delivering a solution to many of these challenges. ChEBI provides biological, chemical and semantic information for small chemical compounds relevant in biology to the community. ChEBI also creates for each distinct molecular structure a stable and unchanging identifier, which is used by multiple other resources to definitively identify that specific compound, much as a grid reference unambiguously identifies a specific location on the earth's surface. In addition, ChEBI incorporates standard naming systems from global bodies such as the International Union of Pure and Applied Chemistry (IUPAC) and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). All of the information and data in ChEBI is freely available and downloadable without restriction. For these reasons, ChEBI is very widely used as a small molecule reference database by a number of leading biomedical databases. 
ChEBI is also used by a very large number of users who access its information via the public website.
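The role of a stable identifier in resolving the aspirin synonyms mentioned above can be sketched in a few lines of Python. This is a minimal illustration only, not ChEBI's data model or API: the `SYNONYMS` table and the `resolve` helper are hypothetical names introduced here, and the identifier CHEBI:15365 is used on the assumption that it is the ChEBI entry for acetylsalicylic acid.

```python
from typing import Optional

# Hypothetical in-memory synonym table (illustrative, not ChEBI's schema).
# Assumption: CHEBI:15365 is the ChEBI identifier for acetylsalicylic acid.
SYNONYMS = {
    "aspirin": "CHEBI:15365",
    "acetylsalicylic acid": "CHEBI:15365",
    "2-(acetyloxy)benzoic acid": "CHEBI:15365",
    "o-acetylsalicylic acid": "CHEBI:15365",
}


def resolve(name: str) -> Optional[str]:
    """Map a name to its identifier, ignoring case and surrounding whitespace.

    Returns None when the name is not in the table.
    """
    return SYNONYMS.get(name.strip().lower())


if __name__ == "__main__":
    for name in ["Aspirin", " O-Acetylsalicylic Acid ", "paracetamol"]:
        print(repr(name), "->", resolve(name))
```

Because every synonym maps to one identifier, two resources that record different names for the compound can still be joined on the identifier rather than on free text.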
The aim of our proposal is to ensure the continued availability and growth of this critical resource for the bioscience community. ChEBI was originally developed in 2004 and as a consequence its underlying computer code is now out-of-date and increasingly difficult to support and maintain. Indeed, there is a growing risk that it will in the near future become incompatible with current computer systems. We therefore propose to completely overhaul and modernise the ChEBI infrastructure, code base and associated software tools. A new user-friendly website will be developed which will enable users to search, retrieve and download data. Advanced users will benefit from the superior programmatic access mechanisms to the data. We will develop a new annotation, curation and submission tool that will improve the overall efficiency of our expert ChEBI curators, for example by automating a number of currently time-consuming manual processes. This will reduce the time and effort required to create new entries. This tool will also benefit users who submit entries to ChEBI by significantly streamlining the submissions process. Our project will enable ChEBI to benefit from recent advances in software development techniques and deliver the new infrastructure platform, critical to enabling ChEBI to continue to fulfil the critical role it plays in the global bioinformatics community.
Technical Summary
ChEBI is a database and ontology containing information about chemical entities of biological interest. It is widely used as a 'small molecule' reference database by a number of leading global resources such as Gene Ontology, UniProt and Rhea, providing identifiers, structures and annotations to enable chemical entities to be unambiguously identified within biological databases, ontologies, models and the literature. ChEBI is also widely used through its public website and API as a rich source of information about small molecules. ChEBI is curated by human experts, and provides a reliable, non-redundant collection of chemical entities and related data such as detailed structure, synonyms, chemical formula, charge, molecular mass and links to external databases. Furthermore, ChEBI also contains an extensive ontology which enables the relationships between chemical entities to be defined on the basis of their shared chemical structure features together with their biological properties and roles.
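As a rough sketch of how an ontology lets entities be grouped by structure and by role, the following Python example walks a tiny hand-written hierarchy. The class names, the edges and the helper functions (`ancestors`, `members_of`, `with_role`) are assumptions made for illustration; they are not an extract of the ChEBI ontology or of its tooling.

```python
# Illustrative edges only: an assumption for this sketch, not real ChEBI data.
# "is_a" links an entity or class to its structural parent classes.
IS_A = {
    "acetylsalicylic acid": ["benzoic acids"],
    "benzoic acids": ["aromatic carboxylic acid"],
    "aromatic carboxylic acid": ["carboxylic acid"],
}

# "has_role" links an entity to a biological or application role.
HAS_ROLE = {
    "acetylsalicylic acid": ["non-steroidal anti-inflammatory drug"],
}


def ancestors(term):
    """Return every class reachable from term by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def members_of(cls):
    """Return, sorted, all terms that have cls among their is_a ancestors."""
    return sorted(term for term in IS_A if cls in ancestors(term))


def with_role(role):
    """Return, sorted, all terms annotated directly with the given role."""
    return sorted(term for term, roles in HAS_ROLE.items() if role in roles)


if __name__ == "__main__":
    print(members_of("carboxylic acid"))
    print(with_role("non-steroidal anti-inflammatory drug"))
```

Keeping structural classification and roles as separate relationship types means a query can ask for "all carboxylic acids" or "all anti-inflammatory drugs" independently, which is the kind of distinction the summary above describes.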
Since its creation in 2004, ChEBI's software infrastructure has not undergone any major enhancements and is now significantly outdated, resulting in a large and growing maintenance burden. The overall goal of our project is to completely overhaul and modernise ChEBI's software infrastructure to enable ChEBI to continue to provide its critical service to the bioscience community. The work will be divided into four distinct work packages covering (1) the core database and web services, (2) more powerful and scalable searching capabilities using Elastic and RDKit, (3) a new web interface and ontology visualisation tool and (4) a new suite of curator tools that will improve efficiency and enable a wider pool of curators to contribute to ChEBI. Documentation and training will be developed to enable users to benefit from these developments, which will have an impact not only on ChEBI itself but also on a multitude of other global bioinformatics resources.
Publications
Andrés-Hernández L
(2022)
Establishing a Common Nutritional Vocabulary - From Food Production to Diet.
in Frontiers in nutrition
Martinez K
(2024)
Functional implications of glycans and their curation: insights from the workshop held at the 16th Annual International Biocuration Conference in Padua, Italy.
in Database : the journal of biological databases and curation
Ni Z
(2023)
Guiding the choice of informatics software and tools for lipidomics research applications.
in Nature methods
Witting M
(2024)
Challenges and perspectives for naming lipids in the context of lipidomics.
in Metabolomics : Official journal of the Metabolomic Society
| Title | ChEBI |
| Description | Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds, which are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified. |
| Type Of Material | Database/Collection of data |
| Provided To Others? | Yes |
| Impact | ChEBI is a key component of multiple global biodata resources, which draw upon various aspects of the database including chemical structures, the ontology, molecule names and stable molecule identifiers. |
| URL | https://www.ebi.ac.uk/chebi/ |
| Description | EBI communications team collaboration with ChEBI |
| Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | We asked the EBI's communications team to redesign the ChEBI website homepage and shared our design requirements and required content. |
| Collaborator Contribution | New logos, a hero image and SVG icons were designed by the EBI communications team for the new ChEBI website. |
| Impact | The new ChEBI website has now been rebranded. |
| Start Year | 2023 |
| Description | OBO community collaboration |
| Organisation | Northeastern University - Boston |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Integrated RO and ChemROF into the new ChEBI ontology. |
| Collaborator Contribution | Gave valuable suggestions and helped improve the ChEBI ontology (created pull requests on GitLab). |
| Impact | The relationships in the ChEBI ontology are now aligned with RO, and the annotations with ChemROF. |
| Start Year | 2023 |
| Description | Visual framework collaboration |
| Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | We were one of the early users of this framework. |
| Collaborator Contribution | EBI developed an open-source toolkit for life science websites and is responsible for its maintenance. |
| Impact | The framework was used to develop ChEBI's new public interface. |
| Start Year | 2023 |
| Description | 16th Annual International Biocuration Conference in Padua, Italy |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Participated in a pre-conference workshop (Functional impact of glycans and their curation) with the glycan community. This involved discussions of how each resource curates glycans and how we can learn from each other. Also presented a poster about ChEBI at the main conference. A conference paper was subsequently written and submitted to the Journal of Glycobiology (pending publication). |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://wiki.glygen.org/Glycan_Function_Workshop_2023 |
| Description | 19th Annual conference of the Metabolomics Society (Metabolomics 2023 in Niagara Falls, Canada) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a poster about ChEBI and took part in the MetaboLights workshop (panel discussion). |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.metabolomics2023.org/ |
| Description | 2nd Ontologies4Chem Workshop |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Invited by NFDI4Chem in 2023 to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists etc. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.nfdi4chem.de/event/2nd-ontologies4chem-workshop/ |
| Description | 3rd Ontologies4Chem workshop (Online) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented an update on efforts to migrate ChEBI to a more robust infrastructure. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.nfdi4chem.de/3rd-ontologies4chem-workshop-2024/ |
| Description | ChEBI Workshop |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Organised a 1-day workshop at EMBL-EBI (hybrid event, with 24 in-person and 12 remote participants). The primary aim of the ChEBI workshop was to bring together stakeholders from major bioinformatics resources that rely on ChEBI, provide updates on ChEBI's redevelopment, receive feedback, and solicit further input. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://drive.google.com/drive/folders/1XMlzZFXXm7styUhzOBgpWchiwLutQj4f |
| Description | ChEMBL & SureChEMBL anniversary symposium (EMBL-EBI) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a poster highlighting ChEBI's redevelopment. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.eventsforce.net/embl/frontend/reg/thome.csp?pageID=96136&eventID=151&traceRedir=2 |
| Description | Chemistry Day (EMBL-EBI) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a talk about ChEBI and future plans at the Chemistry Day. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://docs.google.com/document/d/1xbY9Qj8rndiIdfkKIb0Mk1FByGZILX8rur-bo_yTCVc/edit?tab=t.0#heading... |
| Description | GCBR of the week social media campaign |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Media (as a channel to the public) |
| Results and Impact | ChEBI was featured in the GCBR of the week social media campaign, giving the general public and the scientific community the chance to learn more about ChEBI. ChEBI's profile and animation were published on LinkedIn and X. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.linkedin.com/pulse/chebi-global-biodata-coalition-jv3ae |
| Description | IUPAC WorldFAIR Chemistry Webinar (Online) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a flash talk about ChEBI to the general chemistry community and participated in a panel discussion. The audience was a combination of researchers, publishers, data community, etc. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://zenodo.org/records/7683072 |
| Description | MetFAIR working group (Online) |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a talk about ChEBI at the MetFAIR - Reproducible Reporting and Metabolite Annotation Task Group. The task group plans to write a white paper on metabolite annotation. |
| Year(s) Of Engagement Activity | 2025 |
| URL | https://metabolomicssociety.org/board-committees/scientific-task-groups/ |
| Description | OBO Foundry Newsletter Issue 7 |
| Form Of Engagement Activity | A magazine, newsletter or online publication |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Wrote a summary of the outcomes from the ChEBI workshop to inform/update the OBO (Open Biological and Biomedical Ontology) community. |
| Year(s) Of Engagement Activity | 2025 |
| URL | http://obofoundry.org/newsletter/2025/01/23/7th-issue-newsletter.html |
| Description | Ontologies4Chem workshop (Online) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Invited by NFDI4Chem to give a talk about the ChEBI database. The audience was a combination of university researchers, chemists, ontologists etc. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://www.nfdi4chem.de/event/ontologies4chem-workshop/ |
| Description | Studying metabolites and small molecules with MetaboLights and ChEBI (EMBL-EBI online webinar) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a talk about ChEBI. The webinar was part of the molecular building blocks of life series open to anyone who was interested in studying small molecules. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.ebi.ac.uk/training/events/studying-metabolites-and-small-molecules-metabolights-and-cheb... |
| Description | Transformation workflow using dbt for ChEBI Database (EMBL's Scientific Workflow Club) |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a talk about the database management strategy adopted for the new ChEBI and the strong results we are getting from it. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://oc.embl.de/index.php/s/LcMIsohL6XOgApR#/files_mediaviewer/2024-07-04-Transformation-workflow... |
