ExODA: Integrating Description Logics and Database Technologies for Expressive Ontology-Based Data Access
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
Sources of semi-structured, overlapping, and semantically-related data on the Web are currently proliferating at a phenomenal rate, which has created a demand for more powerful and flexible information systems (ISs). This new generation of ISs will need to integrate incomplete and semi-structured information from heterogeneous sources, employ rich and flexible schemas, and answer queries by taking into account both knowledge and data.
Ontology-based data access has recently been proposed as an architectural principle for such systems. The main idea is to develop a unified view of the data by describing the relevant domain in an ontology, which then provides the vocabulary used to ask queries. The IS can use ontological statements, such as the concept hierarchy, to derive new facts and thus enrich query answers with implicit knowledge. This idea has been incorporated into systems such as QuOnto, Owlgres, ROWLKit, and REQUIEM, and ontology reasoners such as RACER, FaCT++, Pellet, and HermiT.
Such systems suffer from two main problems. First, the modelling capabilities of ontology languages are often insufficient for practical use cases. In order to achieve favourable computational properties, ontology languages are usually capable of describing only tree-shaped relationships; furthermore, with some notable exceptions, they usually support only unary and binary predicates. Finally, ontology languages typically employ the open world assumption; however, when answering queries over large amounts of data, the closed world assumption (CWA) is often more appropriate.
Second, query answering facilities in existing ontology-based ISs typically do not scale to data sets commonly encountered in practice. Up to now, approaches to addressing this problem have focused on reducing the expressivity of the ontology language even further in order to obtain formal tractability guarantees. This exacerbates the first problem (restricted modelling capabilities), while not necessarily delivering robust scalability in practice.
Database theory and practice can provide partial solutions to these problems. In databases, complex domains can be described using dependencies. Dependencies are used in a number of different ways: they are often used as integrity constraints, that is, checks that verify whether a database instance includes all data specified in the domain description; however, dependencies can also be used similarly to ontologies to derive implicit knowledge. Treating dependencies as integrity constraints and answering queries under CWA has allowed practical relational database management systems (RDBMSs) to scale to very large data sets.
Database techniques alone do not, however, satisfy all the requirements for an ontology-based IS. In particular, dependencies often cannot model arbitrarily large structures and thus do not cover all practical modelling use cases. Furthermore, generalising the query answering techniques used in practical RDBMSs to the case where information-deriving dependencies must be taken into account is still an open problem.
We therefore believe that the next generation of ontology-based ISs should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current RDBMSs, but with schemas that are rich, flexible, and tightly integrated with the data. In order to achieve this ambitious goal, however, a number of challenging fundamental problems must be solved. First, ontology and dependency languages need to be unified in a coherent theoretical framework. Second, it will be necessary to identify fragments of the framework that are likely to exhibit robust scalability but can still support realistic use cases. Third, it will be necessary to devise effective algorithmic techniques that can form the basis of practical ISs.
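The two uses of domain statements described in the abstract, deriving implicit facts to enrich query answers versus checking them as integrity constraints, can be illustrated with a toy sketch. Everything below is hypothetical: the concept names, individuals, and function names are invented for illustration and are not taken from any of the systems named above, which handle far richer ontology languages than a plain concept hierarchy.

```python
# Hypothetical toy data; none of these names come from a real system.
# Each concept has at most one direct superconcept, and the hierarchy has no cycles.
SUBCLASS_OF = {"Professor": "Staff", "Lecturer": "Staff", "Staff": "Person"}
FACTS = {("Professor", "alice"), ("Lecturer", "bob"), ("Person", "carol")}


def superclasses(concept):
    """Yield the concept itself followed by all of its ancestors."""
    while concept is not None:
        yield concept
        concept = SUBCLASS_OF.get(concept)


def answer_data_only(query_concept):
    """Answer using only explicitly stored facts (no ontology)."""
    return {ind for concept, ind in FACTS if concept == query_concept}


def answer_with_ontology(query_concept):
    """Answer using stored facts plus facts implied by the hierarchy."""
    return {ind for concept, ind in FACTS if query_concept in superclasses(concept)}


def missing_under_constraints():
    """Read each subclass statement as an integrity constraint instead:
    report the superconcept facts that the data should contain but does not."""
    return {
        (SUBCLASS_OF[concept], ind)
        for concept, ind in FACTS
        if concept in SUBCLASS_OF and (SUBCLASS_OF[concept], ind) not in FACTS
    }
```

With this data, `answer_data_only("Staff")` finds nothing because no Staff fact is stored, whereas `answer_with_ontology("Staff")` returns both alice and bob. Read as constraints, the same statements instead flag the two Staff facts as missing from the database.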
Planned Impact
We are proposing to lay the foundations for a new generation of information systems that will revolutionise how we deal with incomplete, semi-structured, overlapping, and semantically-related data. Sources of such data are proliferating at a phenomenal rate, most notably in the context of the World Wide Web. Thus the beneficiaries could, in the long term, include anyone who uses or depends on the Web or any other information system. In the western world at least, this effectively includes every business/organisation and every individual.
The marriage of techniques from DL and DB research has already been recognised as a great commercial opportunity, and DB technology vendors have started to augment their existing software with ontological reasoning. For example, Oracle Inc. has recently enhanced its well-known database management system with modules that use ontologies to support semantic data management. Their product brochure lists numerous application areas that can benefit from this technology, including Enterprise Information Integration, Knowledge Mining, Finance, Compliance Management and Life Science Research. Although Oracle sees a big market in the integration of DL and DB technologies, their current system suffers from several limitations. These include very long load times for large data sets (e.g., 90 hours for one data set), and potential incompleteness of query answers. Other systems based on the same loose integration of DL and DB technologies are known to suffer from similar problems. As a result, the applicability of such systems is currently limited to applications where the data changes relatively infrequently, and where (possibly) incomplete query answers are acceptable. We believe that the ontology-based information system developed in this project will exert a major influence on the theory and practice of ontology-based data access.
Our contacts with industry (see, e.g., letters of support) will help us to ensure that our work has an immediate impact on, and benefits for, companies and organisations that develop and use information systems. For example, both Oracle and Clark & Parsia are developing and selling information systems whose utility would be greatly enhanced by the algorithmic techniques that we expect to result from our work, and Alcatel-Lucent, BAE Systems, ExperienceOn, Kaiser Permanente, Lixto and Siemens are developing applications that would directly benefit from the availability of advanced information systems of the kind we will develop. In order to maximise this impact we will disseminate our results through distribution of software via the Web, presentations at relevant national and international meetings, and publications in leading conferences and journals. In addition, we will explore the possible commercialisation of our results; for example, both ExperienceOn and Clark & Parsia have expressed an active interest in exploiting the results of this project (see attached letters of support). Thus, our results will reach a broad cross section of computer science researchers and IT practitioners in both academia and industry.
The proposers have an established track record of successful research and of impact on both the theory and practice of information technology: they have participated in numerous research projects, many of which have had significant impact on both research and industry, leading in several instances to exploitation of IP and/or commercial spin-offs; they have taken the lead in ensuring that ontology language standards are firmly based on foundational research; and they have been the recipients of several prestigious prizes and awards in recognition of their contributions to research.
Publications
Carral D
(2014)
Automated Reasoning
Soylu A
(2013)
OptiqueVQS
Soylu A
(2013)
Metadata and Semantics Research
Cuenca Grau B
(2012)
Acyclicity Conditions and their Application to Query Answering in Description Logics
Stefanoni G
(2013)
Introducing Nominals to the Combined Query Answering Approaches for EL
Cuenca Grau B
(2013)
Computing Datalog Rewritings beyond Horn Ontologies
Krötzsch M
(2013)
Logic Programming and Nonmonotonic Reasoning
Jimenez-Ruiz E
(2013)
Evaluating Mapping Repair Systems with Large Biomedical Ontologies
Zhou Y.
(2013)
Making the most of your triple store: Query answering in OWL 2 using an RL reasoner
in WWW 2013 - Proceedings of the 22nd International Conference on World Wide Web
Della Valle E
(2013)
Order matters! Harnessing a world of orderings for reasoning over massive data
in Semantic Web
Kupke C
(2012)
Completeness for the coalgebraic cover modality
in Logical Methods in Computer Science
Georg Gottlob
(2014)
Stable Model Semantics for Guarded Existential Rules and Description Logics
in KR
Horrocks I
(2012)
Semantics ⊓ scalability ⊨ ⊥?
in Journal of Zhejiang University SCIENCE C
Jiménez-Ruiz E
(2011)
Logic-based assessment of the compatibility of UMLS ontology sources.
in Journal of biomedical semantics
Magka D
(2014)
A rule-based ontological framework for the classification of molecules.
in Journal of biomedical semantics
Glimm B
(2014)
HermiT: An OWL 2 Reasoner
in Journal of Automated Reasoning
Grau B.C.
(2013)
Acyclicity notions for existential rules and their application to query answering in ontologies
in Journal of Artificial Intelligence Research
Cuenca Grau B
(2012)
Completeness Guarantees for Incomplete Ontology Reasoners: Theory and Practice
in Journal of Artificial Intelligence Research
Magka D.
(2013)
Computing stable models for nonmonotonic existential rules
in IJCAI International Joint Conference on Artificial Intelligence
Krötzsch M
(2014)
Description Logics
in IEEE Intelligent Systems
Jiménez-Ruiz E.
(2012)
Large-scale interactive ontology matching: Algorithms and implementation
in Frontiers in Artificial Intelligence and Applications
Jiménez-Ruiz E.
(2012)
On the feasibility of using OWL 2 DL reasoners for ontology matching problems
in CEUR Workshop Proceedings
Magka D.
(2013)
Nonmonotonic existential rules for non-tree-shaped ontological modelling
in CEUR Workshop Proceedings
Stefanoni G
(2012)
Small datalog query rewritings for EL
in CEUR Workshop Proceedings
Horrocks I.
(2012)
The HermiT OWL reasoner
in CEUR Workshop Proceedings
Jiménez-Ruiz E.
(2012)
Exploiting the UMLS metathesaurus in the ontology alignment evaluation initiative
in CEUR Workshop Proceedings
Jiménez-Ruiz E.
(2012)
LogMap and LogMapLt results for OAEI 2012
in CEUR Workshop Proceedings
Krötzsch M
(2012)
A Description Logic Primer
in arXiv e-prints
Stefanoni Giorgio
(2013)
Introducing Nominals to the Combined Query Answering Approaches for EL
in arXiv e-prints
Simancík F
(2014)
Consequence-based and fixed-parameter tractable reasoning in description logics
in Artificial Intelligence
Gottlob G
(2014)
The price of query rewriting in ontology-based data access
in Artificial Intelligence
Kaminski M
(2016)
Datalog rewritability of Disjunctive Datalog programs and non-Horn ontologies
in Artificial Intelligence
Description | We believe that the next generation of ontology-based information systems (ISs) should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current relational database management systems (RDBMSs), but with schemas that are rich, flexible, and tightly integrated with the data. In the ExODA project we have made several important steps in this direction, both theoretical and practical. Theory: We have studied the integration of description logics with tuple generating dependencies (TGDs) and equality generating dependencies (EGDs); the resulting languages are undecidable in general, but we have developed novel acyclicity tests that allow us to identify ontologies for which reasoning procedures are guaranteed to terminate. 
We have also tackled the problem of defining a well-founded semantics (WFS) for such languages extended with negation in rule bodies. In particular, we have provided a WFS for the recent Datalog+/- family of ontology languages, which covers several important description logics (DLs). We have developed a classification of OWL 2 QL ontology-mediated queries according to the succinctness (exponential vs polynomial) of first-order and nonrecursive datalog query rewritings. We have developed a tree-witness query rewriting for conjunctive queries and OWL 2 QL ontologies, as well as for SPARQL 1.1 queries under the OWL 2 QL Entailment Regime. We have also developed an alternative approach to ontology-based data access with OWL 2 QL, called the combined approach, which uses both query rewriting and data materialisation. We have carried out fundamental studies into causes of computational hardness, and developed a new fixed-parameter tractability framework that allows for a much finer-grained analysis of computational complexity. Practice: We have implemented the tree-witness query rewriting together with a variety of optimisations in the ontology-based data access platform Ontop (http://ontop.inf.unibz.it). The combined approach has been implemented by our partners at Bremen in the system Combo (https://code.google.com/p/combo-obda/). We have implemented a prototype reasoner that performs automatic classification of chemical compounds from the ChEBI ontology using a formalism that combines features from both description logic and rule-based ontology languages, and that allows for the representation of a wide range of chemical classes that are not expressible with description-logic-based formalisms such as OWL. Impact: The ontology-based data access platform Ontop is an open access system that is available at http://ontop.inf.unibz.it. 
Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS. |
Exploitation Route | The ontology-based data access platform Ontop is an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS. |
Sectors | Digital/Communication/Information Technologies (including Software), Energy, Healthcare |
URL | http://www.cs.ox.ac.uk/isg/projects/ExODA/ |
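To give a rough sense of what an acyclicity test for TGDs looks like, the sketch below checks weak acyclicity, a classic condition from the database literature which guarantees that the chase terminates. It is not one of the novel tests developed in the project. The rule encoding (lists of atoms whose arguments are all variables, with no constants) and the predicate names in the two examples are assumptions made purely for illustration.

```python
# A rule is a pair (body, head); each is a list of atoms (predicate, (var, ...)).
# Assumption: every argument is a variable. Head variables that do not occur
# in the body are read as existentially quantified.

def positions(atoms, var):
    """Return all (predicate, argument index) positions where var occurs."""
    return {(pred, i) for pred, args in atoms for i, a in enumerate(args) if a == var}


def is_weakly_acyclic(rules):
    """Build the position dependency graph and reject any cycle that
    passes through a special (existential) edge."""
    edges, special = set(), set()
    for body, head in rules:
        body_vars = {a for _, args in body for a in args}
        head_vars = {a for _, args in head for a in args}
        existential = head_vars - body_vars
        for x in body_vars & head_vars:  # variables propagated to the head
            for p in positions(body, x):
                for q in positions(head, x):
                    edges.add((p, q))  # regular edge: value is copied
                for y in existential:
                    for q in positions(head, y):
                        edges.add((p, q))  # special edge: fresh value created
                        special.add((p, q))

    def reachable(src, dst):
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(t for s, t in edges if s == node)
        return False

    # A special edge p -> q lies on a cycle exactly when p is reachable from q.
    return not any(reachable(q, p) for p, q in special)


# Hypothetical example: every person has a parent, and every parent is a person.
CYCLIC = [
    ([("Person", ("x",))], [("hasParent", ("x", "y"))]),
    ([("hasParent", ("x", "y"))], [("Person", ("y",))]),
]
# Hypothetical example: every employee works in a department.
ACYCLIC = [
    ([("Employee", ("x",))], [("worksIn", ("x", "d"))]),
    ([("worksIn", ("x", "d"))], [("Department", ("d",))]),
]
```

The first rule set is rejected because each invented parent is again a person and so triggers the invention of a further parent. The second is accepted because the invented department never feeds back into a rule that creates new values.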
Description | Our results heavily influenced the ontology-based data access platform Ontop, an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS. |
First Year Of Impact | 2013 |
Sector | Digital/Communication/Information Technologies (including Software),Energy |
Impact Types | Economic |
Description | EPSRC DBOnto |
Amount | £1,263,746 (GBP) |
Funding ID | EP/L012138/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2014 |
End | 01/2019 |
Description | FP7 Optique |
Amount | € 1,136,604 (EUR) |
Funding ID | 318338 |
Organisation | European Commission |
Department | Seventh Framework Programme (FP7) |
Sector | Public |
Country | European Union (EU) |
Start | 11/2012 |
End | 11/2016 |
Description | VADA |
Amount | £4,557,635 (GBP) |
Funding ID | EP/M025268/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2015 |
End | 03/2020 |
Description | EDF Exoda |
Organisation | EDF Energy |
Department | EDF Innovation and Research |
Country | France |
Sector | Private |
PI Contribution | Helped to develop EDF energy management advisor using HermiT reasoner. |
Collaborator Contribution | Developed EDF energy management advisor using HermiT reasoner and evaluated it via deployment at EDF. |
Impact | EDF energy management advisor used to generate bespoke energy saving advice for EDF customers. |
Start Year | 2011 |