HermiT: Reasoning with Large Ontologies

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Ontologies are formal vocabularies of terms, often shared by a community of users. One of the most prominent application areas of ontologies is medicine and the life sciences. For example, the Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) is a clinical ontology which is being used in the UK Health Service's National Programme for Information Technology (NPfIT). Other examples include GALEN, the Foundational Model of Anatomy (FMA), the National Cancer Institute (NCI) Thesaurus, and the OBO Foundry -- a repository containing about 80 biomedical ontologies.

These ontologies are gradually superseding existing medical classifications and will provide the future platforms for gathering and sharing medical knowledge. Capturing medical records using ontologies will reduce the possibility of data misinterpretation and will enable information exchange between different applications and institutions.

Medical ontologies are strongly related to description logics (DLs), which provide the formal basis for many ontology languages, most notably the W3C-standardised Web Ontology Language (OWL). All the above-mentioned ontologies are nowadays available in OWL and, therefore, in a description logic. The developers of medical ontologies have recognised the numerous benefits of using DLs, such as the clear and unambiguous semantics of different modelling constructs, the well-understood trade-offs between expressivity and computational complexity, and the availability of provably correct reasoners and tools.

The development and application of ontologies crucially depend on reasoning. Ontology classification, i.e., organising classes into a specialisation/generalisation hierarchy, is a reasoning task that plays a major role during ontology development: it enables the detection of potential modelling errors such as inconsistent class descriptions and missing sub-class relationships. For example, about 180 missing sub-class relationships were detected when the version of SNOMED CT used by the NHS was classified using the DL reasoner FaCT++. Query answering is another reasoning task that is mainly used during ontology-based information retrieval; e.g., in clinical applications query answering might be used to retrieve all patients that suffer from nut allergies.

Despite the impressive state of the art, modern medical ontologies pose significant challenges to both the theory and practice of DL-based languages. Existing reasoners can efficiently deal with some large ontologies, such as NCI, but many important ontologies are still beyond the reach of available tools. For example, none of the existing reasoners can successfully classify either GALEN or FMA. Applications currently need to work around these limitations, e.g., by using subsets of ontologies that can be successfully processed. For example, the version of GALEN typically used in practice contains only about 20% of the axioms of the full version; this reduces the interaction between concepts and thus makes the ontology processable. This is, however, highly undesirable in practice, because it reduces coverage, weakens the conceptualisation of the domain and may prevent the detection of modelling errors.

Furthermore, the amount of data used with ontologies can be orders of magnitude larger than the ontology itself. For example, the annotation of patients' medical records in a single hospital can easily produce data consisting of hundreds of millions of facts, and aggregation at a national level might produce billions of facts. Existing reasoners cannot cope with such data volumes, especially when ontologies such as GALEN and FMA are used as schemata.

The goal of this project is to develop scalable reasoning algorithms and a prototypical implementation that can efficiently deal with large and complex ontologies and large data sets. Developing such a reasoner will be critical to the success of many ontology-based applications.

Publications

Motik B (2009) Hypertableau Reasoning for Description Logics in Journal of Artificial Intelligence Research

Glimm B (2014) HermiT: An OWL 2 Reasoner in Journal of Automated Reasoning

Glimm B (2012) A novel approach to ontology classification in Journal of Web Semantics

Bate A (2018) Consequence-Based Reasoning for Description Logics with Disjunctions and Number Restrictions in Journal of Artificial Intelligence Research

 
Description Ontologies are formal vocabularies of terms, often shared by a community of users. One of the most prominent application areas of ontologies is medicine and the life sciences. For example, the Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) is a clinical ontology which is being used in the UK Health Service's National Programme for Information Technology (NPfIT). Other examples include GALEN, the Foundational Model of Anatomy (FMA), the National Cancer Institute (NCI) Thesaurus, and the OBO Foundry -- a repository containing about 80 biomedical ontologies.

These ontologies are gradually superseding existing medical classifications and will provide the future platforms for gathering and sharing medical knowledge. Capturing medical records using ontologies will reduce the possibility of data misinterpretation and will enable information exchange between different applications and institutions.

Medical ontologies are strongly related to description logics (DLs), which provide the formal basis for many ontology languages, most notably the W3C-standardised Web Ontology Language (OWL). All the above-mentioned ontologies are nowadays available in OWL and, therefore, in a description logic. The developers of medical ontologies have recognised the numerous benefits of using DLs, such as the clear and unambiguous semantics of different modelling constructs, the well-understood trade-offs between expressivity and computational complexity, and the availability of provably correct reasoners and tools.

The development and application of ontologies crucially depend on reasoning. Ontology classification, i.e., organising classes into a specialisation/generalisation hierarchy, is a reasoning task that plays a major role during ontology development: it enables the detection of potential modelling errors such as inconsistent class descriptions and missing sub-class relationships. For example, about 180 missing sub-class relationships were detected when the version of SNOMED CT used by the NHS was classified using the DL reasoner FaCT++. Query answering is another reasoning task that is mainly used during ontology-based information retrieval; e.g., in clinical applications query answering might be used to retrieve "all patients that suffer from nut allergies".
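The classification task can be illustrated with a minimal sketch. It assumes the lightweight EL description logic (the DL family in which SNOMED CT is expressed), axioms already in a normal form, and the standard completion rules of consequence-based EL reasoning; this is a different technique from HermiT's hypertableau calculus, and every class, role and axiom name is hypothetical. Entailed subsumptions that were never asserted correspond to the kind of "missing sub-class relationships" that classification can reveal.

```python
from collections import defaultdict

# Normalised EL axioms (all class and role names are hypothetical):
#   ("sub", A, B)          A ⊑ B
#   ("conj", A1, A2, B)    A1 ⊓ A2 ⊑ B
#   ("exists_r", A, r, B)  A ⊑ ∃r.B
#   ("exists_l", r, A, B)  ∃r.A ⊑ B
CLASSES = ["Almond", "TreeNut", "Nut", "Allergy", "AlmondAllergy",
           "CausedByNut", "NutAllergy"]
AXIOMS = [
    ("sub", "Almond", "TreeNut"),
    ("sub", "TreeNut", "Nut"),
    ("sub", "AlmondAllergy", "Allergy"),
    ("exists_r", "AlmondAllergy", "causedBy", "Almond"),
    ("exists_l", "causedBy", "Nut", "CausedByNut"),
    ("conj", "Allergy", "CausedByNut", "NutAllergy"),
]


def classify(classes, axioms):
    """Apply EL completion rules to a fixpoint; S[A] = all atomic subsumers of A."""
    S = {a: {a} for a in classes}   # every class subsumes itself
    R = defaultdict(set)            # R[r] holds pairs (A, B) with A ⊑ ∃r.B entailed
    changed = True
    while changed:
        changed = False
        for ax in axioms:
            for a in classes:
                if ax[0] == "sub":
                    _, x, y = ax
                    if x in S[a] and y not in S[a]:
                        S[a].add(y)
                        changed = True
                elif ax[0] == "conj":
                    _, x1, x2, y = ax
                    if x1 in S[a] and x2 in S[a] and y not in S[a]:
                        S[a].add(y)
                        changed = True
                elif ax[0] == "exists_r":
                    _, x, r, y = ax
                    if x in S[a] and (a, y) not in R[r]:
                        R[r].add((a, y))
                        changed = True
                elif ax[0] == "exists_l":
                    _, r, x, y = ax
                    for c, d in list(R[r]):
                        if c == a and x in S[d] and y not in S[a]:
                            S[a].add(y)
                            changed = True
    return S


def missing_subsumptions(classes, axioms):
    """Entailed A ⊑ B (A != B) that were not asserted as ("sub", A, B)."""
    told = {(ax[1], ax[2]) for ax in axioms if ax[0] == "sub"}
    S = classify(classes, axioms)
    return sorted((a, b) for a in classes for b in S[a]
                  if a != b and (a, b) not in told)


if __name__ == "__main__":
    for sub, sup in missing_subsumptions(CLASSES, AXIOMS):
        print(f"{sub} ⊑ {sup}")
```

Running it reports three entailed but unstated subsumptions, including AlmondAllergy ⊑ NutAllergy, which follows only through the existential and conjunction axioms. Real reasoners apply such rules with indexing rather than by rescanning every axiom, so this loop is for illustration only.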

Modern medical ontologies pose significant challenges to both the theory and practice of DL-based languages. The increasing influence of OWL and the impressive performance of state-of-the-art reasoners tend to exacerbate this problem. Moreover, the amount of data used with ontologies can be orders of magnitude larger than the ontology itself. For example, the annotation of patients' medical records in a single hospital can easily produce data consisting of hundreds of millions of facts, and aggregation at a national level might produce billions of facts. Existing reasoners cannot cope with such data volumes, especially when ontologies such as GALEN and FMA are used as schemata.
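Answering a query such as "all patients that suffer from nut allergies" over data requires taking implicit facts into account: a patient recorded as having a peanut allergy must be returned even though no record mentions NutAllergy directly. The minimal sketch below assumes a schema consisting solely of atomic sub-class axioms, so that materialising all implied class assertions and then evaluating the query as a join is complete; expressive schemata such as GALEN or FMA need a full DL reasoner. All class, role and individual names are hypothetical.

```python
from collections import defaultdict

# Hypothetical schema: told (sub-class, super-class) pairs.
SUBCLASS_OF = [
    ("PeanutAllergy", "NutAllergy"),
    ("AlmondAllergy", "NutAllergy"),
    ("NutAllergy", "FoodAllergy"),
    ("InPatient", "Patient"),
]

# Hypothetical data: class assertions and suffersFrom role assertions.
TYPES = [("alice", "InPatient"), ("bob", "Patient"), ("carol", "Visitor"),
         ("dave", "Patient"), ("a1", "PeanutAllergy"), ("a2", "AlmondAllergy"),
         ("a3", "PollenAllergy")]
SUFFERS_FROM = [("alice", "a1"), ("bob", "a3"), ("carol", "a2"), ("dave", "a2")]


def superclasses(hierarchy):
    """Transitive closure of the told sub-class pairs: class -> all superclasses."""
    sup = defaultdict(set)
    for sub, parent in hierarchy:
        sup[sub].add(parent)
    changed = True
    while changed:
        changed = False
        for c in list(sup):
            for p in list(sup[c]):
                new = sup.get(p, set()) - sup[c]
                if new:
                    sup[c] |= new
                    changed = True
    return sup


def materialise_types(types, hierarchy):
    """Forward-chain class assertions: each individual gets all implied classes."""
    sup = superclasses(hierarchy)
    inferred = defaultdict(set)
    for ind, cls in types:
        inferred[ind].add(cls)
        inferred[ind] |= sup.get(cls, set())
    return inferred


def patients_with(allergy_class, types, suffers_from, hierarchy):
    """Answer the query Patient(x) ∧ suffersFrom(x, y) ∧ allergy_class(y)."""
    t = materialise_types(types, hierarchy)
    return sorted({x for x, y in suffers_from
                   if "Patient" in t[x] and allergy_class in t[y]})


if __name__ == "__main__":
    print(patients_with("NutAllergy", TYPES, SUFFERS_FROM, SUBCLASS_OF))
```

Here carol is excluded because she is not a patient, and bob because a pollen allergy is not a nut allergy. Materialisation keeps query evaluation simple, but it can multiply the amount of stored data, which matters at the data volumes described above.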

The HermiT reasoner, which has been developed in this project, addresses all these issues. It is based on a new hypertableau calculus that avoids many of the problems exhibited by existing reasoners, including excessive nondeterminism and excessive model size. Moreover, the implementation of the calculus includes a range of novel optimisations. Finally, HermiT uses a completely new and much more efficient classification algorithm developed in the project. HermiT is the only reasoner that fully supports the OWL 2 standard, and it is more robustly scalable than other OWL reasoners. As well as supporting OWL 2, HermiT also supports the W3C SPARQL 1.1 query language, and includes a range of optimisations that significantly improve the performance of query answering.
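The central idea of hypertableau reasoning, as described in the Motik (2009) publication listed above, is to translate axioms into DL-clauses of the form U1 ∧ ... ∧ Um → V1 ∨ ... ∨ Vn and to fire a clause only once all of its body atoms are matched, branching only over its head. A standard tableau instead treats an axiom such as InPatient ⊑ Patient as the disjunction ¬InPatient ⊔ Patient at every individual, which is one source of the nondeterminism mentioned above. The following is a drastically simplified sketch of the hypertableau rule under stated assumptions: only unary atoms over ground individuals, no existential restrictions, blocking or equality, and hand-written clauses with hypothetical names. It is not HermiT's implementation.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# DL-clause restricted to unary atoms over one variable x:
#   A1(x) ∧ ... ∧ Am(x) -> B1(x) ∨ ... ∨ Bn(x); an empty head means ⊥.
Clause = Tuple[FrozenSet[str], Tuple[str, ...]]
Fact = Tuple[str, str]  # (class name, individual)

# Hypothetical clauses.
CLAUSES: List[Clause] = [
    (frozenset({"InPatient"}), ("Patient",)),
    (frozenset({"NutAllergic"}), ("PeanutAllergic", "TreeNutAllergic")),
    (frozenset({"PeanutAllergic", "ToleratesPeanuts"}), ()),
    (frozenset({"TreeNutAllergic", "ToleratesTreeNuts"}), ()),
]


def hypertableau(facts: Set[Fact], clauses: List[Clause]) -> Optional[Set[Fact]]:
    """Return a clash-free saturated fact set (a model), or None if unsatisfiable."""
    facts = set(facts)
    individuals = sorted({ind for _, ind in facts})
    while True:
        # Hypertableau rule: find a clause whose body is fully matched
        # for some individual but whose head is not yet satisfied.
        pending = None
        for body, head in clauses:
            for ind in individuals:
                if all((a, ind) in facts for a in body) and not any(
                        (b, ind) in facts for b in head):
                    pending = (head, ind)
                    break
            if pending is not None:
                break
        if pending is None:
            return facts               # saturated and clash-free
        head, ind = pending
        if not head:
            return None                # clash: the body derives ⊥
        if len(head) == 1:
            facts.add((head[0], ind))  # deterministic step, no branching
            continue
        # Nondeterministic step: branch only over the head disjuncts.
        for b in head:
            model = hypertableau(facts | {(b, ind)}, clauses)
            if model is not None:
                return model
        return None                    # every branch clashed


if __name__ == "__main__":
    print(hypertableau({("NutAllergic", "p1"), ("ToleratesPeanuts", "p1")}, CLAUSES))
    print(hypertableau({("NutAllergic", "p1"), ("ToleratesPeanuts", "p1"),
                        ("ToleratesTreeNuts", "p1")}, CLAUSES))
```

The first call finds a model in which p1 has a tree-nut allergy after the peanut branch clashes; the second call is unsatisfiable. For an individual that is not an InPatient, the first clause never fires, so no branching takes place for it at all.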
Exploitation Route HermiT is now one of the most widely used OWL reasoners, and its popularity has extended far beyond the life-sciences domain. For example, HermiT is used to power a system that provides energy-saving advice to EDF Energy customers.
Sectors Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Energy, Healthcare, Pharmaceuticals and Medical Biotechnology

URL http://www.cs.ox.ac.uk/isg/projects/HermiT/
 
Description The HermiT reasoner, which has been developed in this project, is the only reasoner that fully supports the OWL 2 standard, and it is more robustly scalable than other OWL reasoners. As well as supporting OWL 2, HermiT also supports the W3C SPARQL 1.1 query language, and includes a range of optimisations that significantly improve the performance of query answering. As a result, HermiT is now one of the most widely used OWL reasoners, and its popularity has extended far beyond the life-sciences domain. For example, HermiT is used to power a system that provides energy-saving advice to EDF Energy customers.
First Year Of Impact 2011
Sector Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Energy, Healthcare, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description EPSRC ExODA 2
Amount £704,521 (GBP)
Funding ID EP/H051511/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2011 
End 09/2014
 
Title HermiT 
Description A reasoning system for OWL ontologies 
Type Of Technology Software 
Year Produced 2008 
Open Source License? Yes  
Impact HermiT is one of the most widely used OWL reasoners, and the only one to fully support the OWL 2 standard. It is used in both research and industry, for example in EDF's energy management adviser, which is used by hundreds of thousands of EDF customers in France. 
URL http://hermit-reasoner.com/