ExODA: Integrating Description Logics and Database Technologies for Expressive Ontology-Based Data Access

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

Sources of semi-structured, overlapping, and semantically-related data on the Web are currently proliferating at a phenomenal rate, which has created a demand for more powerful and flexible information systems (ISs). This new generation of ISs will need to integrate incomplete and semi-structured information from heterogeneous sources, employ rich and flexible schemas, and answer queries by taking into account both knowledge and data.

Ontology-based data access has recently been proposed as an architectural principle for such systems. The main idea is to develop a unified view of the data by describing the relevant domain in an ontology, which then provides the vocabulary used to ask queries. The IS can use ontological statements, such as the concept hierarchy, to derive new facts and thus enrich query answers with implicit knowledge. This idea has been incorporated into systems such as QuOnto, Owlgres, ROWLKit, and REQUIEM, and ontology reasoners such as RACER, FaCT++, Pellet, and HermiT.

Such systems suffer from two main problems. First, the modelling capabilities of ontology languages are often insufficient for practical use cases. In order to achieve favourable computational properties, ontology languages are usually capable of describing only tree-shaped relationships; furthermore, with some notable exceptions, they usually support only unary and binary predicates. Finally, ontology languages typically employ the open world assumption; however, when answering queries over large amounts of data, the closed world assumption (CWA) is often more appropriate.

Second, query answering facilities in existing ontology-based ISs typically do not scale to data sets commonly encountered in practice. Up to now, approaches to addressing this problem have focused on reducing the expressivity of the ontology language even further in order to obtain formal tractability guarantees. This obviously exacerbates the first problem (restricted modelling capabilities), while not necessarily delivering robust scalability in practice.

Database theory and practice can provide partial solutions to these problems. In databases, complex domains can be described using dependencies. Dependencies are used in a number of different ways: they are often used as integrity constraints, that is, checks that verify whether a database instance includes all data specified in the domain description; however, dependencies can also be used similarly to ontologies to derive implicit knowledge. Treating dependencies as integrity constraints and answering queries under CWA has allowed practical relational database management systems (RDBMSs) to scale to very large data sets.

Database techniques alone do not, however, satisfy all the requirements for an ontology-based IS. In particular, dependencies often cannot model arbitrarily large structures and thus do not cover all practical modelling use cases. Furthermore, generalising the query answering techniques used in practical RDBMSs to the case where information-deriving dependencies must be taken into account is still an open problem.

We therefore believe that the next generation of ontology-based ISs should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current RDBMSs, but with schemas that are rich, flexible, and tightly integrated with the data. In order to achieve this ambitious goal, however, a number of challenging fundamental problems must be solved. First, ontology and dependency languages need to be unified in a coherent theoretical framework. Second, it will be necessary to identify fragments of the framework that are likely to exhibit robust scalability but can still support realistic use cases. Third, it will be necessary to devise effective algorithmic techniques that can form the basis of practical ISs.
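
The two readings of a domain statement described in the abstract (deriving implicit facts versus checking an integrity constraint under CWA) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concept names, the individuals, and the helper functions `saturate`, `answer` and `violations` are hypothetical and are not taken from any of the systems named above.

```python
# Hypothetical toy data: unary facts stored as (concept, individual) pairs.
FACTS = {("Professor", "alice"), ("Teacher", "bob")}

# Hypothetical concept hierarchy: each concept maps to its direct super-concept.
HIERARCHY = {"Professor": "Teacher", "Teacher": "Person"}


def saturate(facts, hierarchy):
    """Ontology reading: derive implicit facts by walking up the hierarchy."""
    derived = set(facts)
    frontier = list(facts)
    while frontier:
        concept, individual = frontier.pop()
        parent = hierarchy.get(concept)
        if parent is not None and (parent, individual) not in derived:
            derived.add((parent, individual))
            frontier.append((parent, individual))
    return derived


def answer(concept, facts):
    """Return the individuals explicitly recorded as instances of `concept`."""
    return sorted(ind for c, ind in facts if c == concept)


def violations(facts, hierarchy):
    """Constraint reading: list facts the data should contain but does not."""
    return sorted(
        (hierarchy[c], ind)
        for c, ind in facts
        if c in hierarchy and (hierarchy[c], ind) not in facts
    )
```

Querying the raw facts for `Teacher` returns only `bob`, whereas querying the saturated facts also returns `alice`, whose membership is implicit. Under the constraint reading nothing is derived; the same statements instead flag the stored data as incomplete.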

Planned Impact

We are proposing to lay the foundations for a new generation of information systems that will revolutionise how we deal with incomplete, semi-structured, overlapping, and semantically-related data. Sources of such data are proliferating at a phenomenal rate, most notably in the context of the World Wide Web. Thus the beneficiaries could, in the long term, include anyone who uses or depends on the Web or any other information system. In the western world at least, this effectively includes every business/organisation and every individual. The marriage of techniques from DL and DB research has already been recognised as a great commercial opportunity, and DB technology vendors have started to augment their existing software with ontological reasoning. For example, Oracle Inc. has recently enhanced its well-known database management system with modules that use ontologies to support semantic data management. Their product brochure lists numerous application areas that can benefit from this technology, including Enterprise Information Integration, Knowledge mining, Finance, Compliance Management and Life Science Research. Although Oracle sees a big market in the integration of DL and DB technologies, their current system suffers from several limitations. These include very long load times for large data sets (e.g., 90 hours for one data set), and potential incompleteness of query answers. Other systems based on the same loose integration of DL and DB technologies are known to suffer from similar problems. As a result, the applicability of such systems is currently limited to applications where the data changes relatively infrequently, and where (possibly) incomplete query answers are acceptable. We believe that the ontology-based information system developed in this project will exert a major influence on the theory and practice of ontology-based data access. 
Our contacts with industry (see, e.g., letters of support) will help us to ensure that our work will have an immediate impact on and benefits for companies and organisations that develop and use information systems. For example, both Oracle and Clark & Parsia are developing and selling information systems whose utility would be greatly enhanced by the algorithmic techniques that we expect to result from our work, and Alcatel-Lucent, BAE Systems, ExperienceOn, Kaiser Permanente, Lixto and Siemens are developing applications that would directly benefit from the availability of advanced information systems of the kind we will develop. In order to maximise this impact we will disseminate our results through distribution of software via the Web, presentations at relevant national and international meetings, and publications in leading conferences and journals. In addition, we will explore the possible commercialisation of our results; for example, both ExperienceOn and Clark & Parsia have expressed an active interest in exploiting the results of this project (see attached letters of support). Thus, our results will reach a broad cross section of computer science researchers and IT practitioners in both academia and industry. The proposers have an established track record of successful research and of impact on both the theory and practice of information technology: they have participated in numerous research projects, many of which have had significant impact on both research and industry, leading in several instances to exploitation of IP and/or commercial spin-offs; they have taken the lead in ensuring that ontology language standards are firmly based on foundational research; and they have been the recipients of several prestigious prizes and awards in recognition of their contributions to research.

Funded Value: £704,520

Funded Period: Apr 11 - Sep 14

Funder: EPSRC

Project Status: Closed

Project Category: Research Grant

Project Reference: EP/H051511/1

Principal Investigator: Ian Horrocks

Research Subject: Info. & commun. Technol. (100%)

Research Topic: Information & Knowledge Mgmt (100%)

Organisations

People	ORCID iD
Ian Horrocks (Principal Investigator)
Georg Gottlob (Co-Investigator)
Boris Motik (Co-Investigator)	http://orcid.org/0000-0003-2506-4118
Thomas Lukasiewicz (Co-Investigator)
Michael Benedikt (Co-Investigator)

Publications

Jiménez-Ruiz E. (2012) Exploiting the UMLS metathesaurus in the ontology alignment evaluation initiative in CEUR Workshop Proceedings

Jiménez-Ruiz E. (2012) Large-scale interactive ontology matching: Algorithms and implementation in Frontiers in Artificial Intelligence and Applications

Krötzsch M (2014) Description Logics in IEEE Intelligent Systems

Magka D. (2013) Computing stable models for nonmonotonic existential rules in IJCAI International Joint Conference on Artificial Intelligence

Grau B.C. (2013) Acyclicity notions for existential rules and their application to query answering in ontologies in Journal of Artificial Intelligence Research

Cuenca Grau B (2012) Completeness Guarantees for Incomplete Ontology Reasoners: Theory and Practice in Journal of Artificial Intelligence Research

Glimm B (2014) HermiT: An OWL 2 Reasoner in Journal of Automated Reasoning

Jiménez-Ruiz E (2011) Logic-based assessment of the compatibility of UMLS ontology sources. in Journal of biomedical semantics

Magka D (2014) A rule-based ontological framework for the classification of molecules. in Journal of biomedical semantics

Key Findings
Impact Summary
Further Funding
Collaboration


Description	Sources of semi-structured, overlapping, and semantically-related data on the Web are currently proliferating at a phenomenal rate, which has created a demand for more powerful and flexible information systems (ISs). This new generation of ISs will need to integrate incomplete and semi-structured information from heterogeneous sources, employ rich and flexible schemas, and answer queries by taking into account both knowledge and data. Ontology-based data access has recently been proposed as an architectural principle for such systems. The main idea is to develop a unified view of the data by describing the relevant domain in an ontology, which then provides the vocabulary used to ask queries. The IS can use ontological statements, such as the concept hierarchy, to derive new facts and thus enrich query answers with implicit knowledge. This idea has been incorporated into systems such as QuOnto, Owlgres, ROWLKit, and REQUIEM, and ontology reasoners such as RACER, FaCT++, Pellet, and HermiT. Such systems suffer from two main problems. First, the modelling capabilities of ontology languages are often insufficient for practical use cases. In order to achieve favourable computational properties, ontology languages are usually capable of describing only "tree-shaped" relationships; furthermore, (with some notable exceptions) they usually support only unary and binary predicates. Finally, ontology languages typically employ the open world assumption; however, when answering queries over large amounts of data, the closed world assumption (CWA) is often more appropriate. Second, query answering facilities in existing ontology-based ISs typically do not scale to data sets commonly encountered in practice. Up to now, approaches to addressing this problem have focused on reducing the expressivity of the ontology language even further in order to obtain formal tractability guarantees. 
This obviously exacerbates the first problem (restricted modelling capabilities), while not necessarily delivering robust scalability in practice. Database theory and practice can provide partial solutions to these problems. In databases, complex domains can be described using dependencies. Dependencies are used in a number of different ways: they are often used as integrity constraints, that is, checks that verify whether a database instance includes all data specified in the domain description; however, dependencies can also be used similarly to ontologies to derive implicit knowledge. Treating dependencies as integrity constraints and answering queries under CWA has allowed practical relational database management systems (RDBMSs) to scale to very large data sets. Database techniques alone do not, however, satisfy all the requirements for an ontology-based IS. In particular, dependencies often cannot model arbitrarily large structures and thus do not cover all practical modelling use cases. Furthermore, generalising the query answering techniques used in practical RDBMSs to the case where information-deriving dependencies must be taken into account is still an open problem. We therefore believe that the next generation of ontology-based ISs should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current RDBMSs, but with schemas that are rich, flexible, and tightly integrated with the data. In the ExODA project we have made several important steps in this direction, both theoretical and practical. Theory: We have studied the integration of description logics with tuple generating dependencies (TGDs) and equality generating dependencies (EGDs); the resulting languages are undecidable in general, but we have developed novel acyclicity tests that allow us to identify ontologies for which reasoning procedures are guaranteed to terminate. 
We have also tackled the problem of defining a well-founded semantics (WFS) for such languages extended with negations in their bodies. In particular, we have provided a WFS for the recent Datalog+/- family of ontology languages, which covers several important description logics (DLs). We have developed a classification of OWL 2 QL ontology-mediated queries according to the succinctness (exponential vs polynomial) of first-order and nonrecursive datalog query rewritings. We have developed a tree-witness query rewriting for both conjunctive queries and OWL 2 QL ontologies, and SPARQL 1.1 queries under the OWL 2 QL Entailment Regime. We have also developed an alternative approach to ontology-based data access with OWL 2 QL, called the combined approach, which uses both query rewriting and data materialisation. We have carried out fundamental studies into causes of computational hardness, and developed a new fixed-parameter tractability framework that allows for a much finer-grained analysis of computational complexity. Practice: We have implemented the tree-witness query rewriting together with a variety of optimisations in the ontology-based data access platform Ontop (http://ontop.inf.unibz.it). The combined approach has been implemented by our partners at Bremen in the system Combo (https://code.google.com/p/combo-obda/). We have implemented a prototype reasoner that performs automatic classification of chemical compounds from the ChEBI ontology using a formalism that combines features from both description logic and rule-based ontology languages, and that allows for the representation of a wide range of chemical classes that are not expressible with description-logic-based formalisms such as OWL. Impact: The ontology-based data access platform Ontop is an open access system that is available at http://ontop.inf.unibz.it. 
Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS.
Exploitation Route	The ontology-based data access platform Ontop is an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS.
Sectors	Digital/Communication/Information Technologies (including Software),Energy,Healthcare
URL	http://www.cs.ox.ac.uk/isg/projects/ExODA/
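
To give a flavour of what an acyclicity test for TGDs checks, the sketch below implements weak acyclicity, a classical sufficient condition for chase termination from the database literature. It is not one of the novel tests developed in the project, which the text above does not specify. The rule encoding, the example rules and the function names are all assumptions made for illustration, and every term in a rule is assumed to be a variable.

```python
# A TGD is encoded (hypothetically) as (body_atoms, head_atoms, existential_vars),
# where an atom is (predicate, tuple_of_variables).
PARENT_RULE = (
    [("Person", ("x",))],
    [("hasParent", ("x", "y")), ("Person", ("y",))],
    {"y"},
)  # Person(x) -> exists y. hasParent(x, y) and Person(y)

EMPLOYEE_RULE = (
    [("Employee", ("x",))],
    [("worksFor", ("x", "y")), ("Department", ("y",))],
    {"y"},
)  # Employee(x) -> exists y. worksFor(x, y) and Department(y)


def position_graph(tgds):
    """Build regular and special edges between (predicate, index) positions."""
    regular, special = set(), set()
    for body, head, existential in tgds:
        head_vars = {term for _, args in head for term in args}
        for pred, args in body:
            for i, var in enumerate(args):
                if var not in head_vars:
                    continue  # only variables propagated to the head create edges
                source = (pred, i)
                for head_pred, head_args in head:
                    for j, term in enumerate(head_args):
                        if term == var:
                            regular.add((source, (head_pred, j)))
                        elif term in existential:
                            special.add((source, (head_pred, j)))
    return regular, special


def is_weakly_acyclic(tgds):
    """True if no cycle in the position graph passes through a special edge."""
    regular, special = position_graph(tgds)
    successors = {}
    for source, target in regular | special:
        successors.setdefault(source, set()).add(target)

    def reachable_from(start):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # A special edge lies on a cycle exactly when its source is reachable
    # from its target.
    return all(src not in reachable_from(tgt) for src, tgt in special)
```

The employee rule passes the test because the values it invents never feed back into its own body. The parent rule fails because each invented parent is again a `Person` and triggers the rule once more, so a naive chase would not terminate.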


Description	Our results heavily influenced the ontology-based data access platform Ontop, an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS.
First Year Of Impact	2013
Sector	Digital/Communication/Information Technologies (including Software),Energy
Impact Types	Economic


Description	EPSRC DBOnto
Amount	£1,263,746 (GBP)
Funding ID	EP/L012138/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2014
End	01/2019


Description	FP7 Optique
Amount	€ 1,136,604 (EUR)
Funding ID	318338
Organisation	European Commission
Department	Seventh Framework Programme (FP7)
Sector	Public
Country	European Union (EU)
Start	11/2012
End	11/2016


Description	VADA
Amount	£4,557,635 (GBP)
Funding ID	EP/M025268/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	04/2015
End	03/2020


Description	EDF Exoda
Organisation	EDF Energy
Department	EDF Innovation and Research
Country	France
Sector	Private
PI Contribution	Helped to develop EDF energy management advisor using HermiT reasoner.
Collaborator Contribution	Developed EDF energy management advisor using HermiT reasoner and evaluated it via deployment at EDF.
Impact	EDF energy management advisor used to generate bespoke energy saving advice for EDF customers.
Start Year	2011
