ExODA: Integrating Description Logics and Database Technologies for Expressive Ontology-Based Data Access

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Sources of semi-structured, overlapping, and semantically-related data on the Web are currently proliferating at a phenomenal rate, which has created a demand for more powerful and flexible information systems (ISs). This new generation of ISs will need to integrate incomplete and semi-structured information from heterogeneous sources, employ rich and flexible schemas, and answer queries by taking into account both knowledge and data.

Ontology-based data access has recently been proposed as an architectural principle for such systems. The main idea is to develop a unified view of the data by describing the relevant domain in an ontology, which then provides the vocabulary used to ask queries. The IS can use ontological statements, such as the concept hierarchy, to derive new facts and thus enrich query answers with implicit knowledge. This idea has been incorporated into systems such as QuOnto, Owlgres, ROWLKit, and REQUIEM, and ontology reasoners such as RACER, FaCT++, Pellet, and HermiT.

Such systems suffer from two main problems. First, the modelling capabilities of ontology languages are often insufficient for practical use cases. In order to achieve favourable computational properties, ontology languages are usually capable of describing only tree-shaped relationships; furthermore, (with some notable exceptions) they usually support only unary and binary predicates. Finally, ontology languages typically employ the open world assumption; however, when answering queries over large amounts of data, the closed world assumption (CWA) is often more appropriate.

Second, query answering facilities in existing ontology-based ISs typically do not scale to data sets commonly encountered in practice. Up to now, approaches to addressing this problem have focused on reducing the expressivity of the ontology language even further in order to obtain formal tractability guarantees. This obviously exacerbates the first problem (restricted modelling capabilities), while not necessarily delivering robust scalability in practice.

Database theory and practice can provide partial solutions to these problems. In databases, complex domains can be described using dependencies. Dependencies are used in a number of different ways: they are often used as integrity constraints--checks that verify whether a database instance includes all data specified in the domain description; however, dependencies can also be used similarly to ontologies to derive implicit knowledge. Treating dependencies as integrity constraints and answering queries under CWA has allowed practical relational database management systems (RDBMSs) to scale to very large data sets.

Database techniques alone do not, however, satisfy all the requirements for an ontology-based IS. In particular, dependencies often cannot model arbitrarily large structures and thus do not cover all practical modelling use cases. Furthermore, generalising the query answering techniques used in practical RDBMSs to the case where information deriving dependencies must be taken into account is still an open problem.

We therefore believe that the next generation of ontology-based ISs should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current RDBMSs, but with schemas that are rich, flexible, and tightly integrated with the data. In order to achieve this ambitious goal, however, a number of challenging fundamental problems must be solved. First, ontology and dependency languages need to be unified in a coherent theoretical framework. Second, it will be necessary to identify fragments of the framework that are likely to exhibit robust scalability but can still support realistic use cases. Third, it will be necessary to devise effective algorithmic techniques that can form the basis of practical ISs.

Planned Impact

We are proposing to lay the foundations for a new generation of information systems that will revolutionise how we deal with incomplete, semi-structured, overlapping, and semantically-related data. Sources of such data are proliferating at a phenomenal rate, most notably in the context of the World Wide Web. Thus the beneficiaries could, in the long term, include anyone who uses or depends on the Web or any other information system. In the western world at least, this effectively includes every business/organisation and every individual.

The marriage of techniques from DL and DB research has already been recognised as a great commercial opportunity, and DB technology vendors have started to augment their existing software with ontological reasoning. For example, Oracle Inc. has recently enhanced its well-known database management system with modules that use ontologies to support semantic data management. Their product brochure lists numerous application areas that can benefit from this technology, including Enterprise Information Integration, Knowledge Mining, Finance, Compliance Management and Life Science Research. Although Oracle sees a big market in the integration of DL and DB technologies, their current system suffers from several limitations. These include very long load times for large data sets (e.g., 90 hours for one data set), and potential incompleteness of query answers. Other systems based on the same loose integration of DL and DB technologies are known to suffer from similar problems. As a result, the applicability of such systems is currently limited to applications where the data changes relatively infrequently, and where (possibly) incomplete query answers are acceptable. We believe that the ontology-based information system developed in this project will exert a major influence on the theory and practice of ontology-based data access.

Our contacts with industry (see, e.g., letters of support) will help us to ensure that our work will have an immediate impact on and benefits for companies and organisations that develop and use information systems. For example, both Oracle and Clark & Parsia are developing and selling information systems whose utility would be greatly enhanced by the algorithmic techniques that we expect to result from our work, and Alcatel-Lucent, BAE Systems, ExperienceOn, Kaiser Permanente, Lixto and Siemens are developing applications that would directly benefit from the availability of advanced information systems of the kind we will develop. In order to maximise this impact we will disseminate our results through distribution of software via the Web, presentations at relevant national and international meetings, and publications in leading conferences and journals. In addition, we will explore the possible commercialisation of our results; for example, both ExperienceOn and Clark & Parsia have expressed an active interest in exploiting the results of this project (see attached letters of support). Thus, our results will reach a broad cross section of computer science researchers and IT practitioners in both academia and industry.

The proposers have an established track record of successful research and of impact on both the theory and practice of information technology: they have participated in numerous research projects, many of which have had significant impact on both research and industry, leading in several instances to exploitation of IP and/or commercial spin-offs; they have taken the lead in ensuring that ontology language standards are firmly based on foundational research; and they have been the recipients of several prestigious prizes and awards in recognition of their contributions to research.
 
Description Sources of semi-structured, overlapping, and semantically-related data on the Web are currently proliferating at a phenomenal rate, which has created a demand for more powerful and flexible information systems (ISs). This new generation of ISs will need to integrate incomplete and semi-structured information from heterogeneous sources, employ rich and flexible schemas, and answer queries by taking into account both knowledge and data.

Ontology-based data access has recently been proposed as an architectural principle for such systems. The main idea is to develop a unified view of the data by describing the relevant domain in an ontology, which then provides the vocabulary used to ask queries. The IS can use ontological statements, such as the concept hierarchy, to derive new facts and thus enrich query answers with implicit knowledge. This idea has been incorporated into systems such as QuOnto, Owlgres, ROWLKit, and REQUIEM, and ontology reasoners such as RACER, FaCT++, Pellet, and HermiT.
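
As a rough illustration of how a concept hierarchy can enrich query answers, the following minimal Python sketch rewrites a query for one concept into a query over that concept and all of its subconcepts. The concept names, the data, and the function names are hypothetical and chosen only for this example; the sketch does not reproduce the algorithm of any of the systems named above.

```python
# Hypothetical toy ontology: each concept maps to its direct superconcepts.
SUBCLASS_OF = {
    "Professor": {"Staff"},
    "Lecturer": {"Staff"},
    "Staff": {"Person"},
}

# Hypothetical data: explicit (individual, concept) assertions.
FACTS = {("ann", "Professor"), ("bob", "Lecturer"), ("carl", "Person")}


def subconcepts(concept):
    """Return the concept together with all its direct and indirect subconcepts."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for sub, supers in SUBCLASS_OF.items():
            # A concept qualifies if one of its superconcepts is already collected.
            if sub not in result and supers & result:
                result.add(sub)
                changed = True
    return result


def answer(concept):
    """Answer 'all instances of concept' by querying the data for every subconcept."""
    wanted = subconcepts(concept)
    return sorted(ind for ind, c in FACTS if c in wanted)


print(answer("Staff"))
print(answer("Person"))
```

Although no individual is explicitly asserted to be a member of Staff, the query for Staff returns ann and bob, because the hierarchy states that every Professor and every Lecturer is a member of Staff.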

Such systems suffer from two main problems. First, the modelling capabilities of ontology languages are often insufficient for practical use cases. In order to achieve favourable computational properties, ontology languages are usually capable of describing only "tree-shaped" relationships; furthermore, (with some notable exceptions) they usually support only unary and binary predicates. Finally, ontology languages typically employ the open world assumption; however, when answering queries over large amounts of data, the closed world assumption (CWA) is often more appropriate.

Second, query answering facilities in existing ontology-based ISs typically do not scale to data sets commonly encountered in practice. Up to now, approaches to addressing this problem have focused on reducing the expressivity of the ontology language even further in order to obtain formal tractability guarantees. This obviously exacerbates the first problem (restricted modelling capabilities), while not necessarily delivering robust scalability in practice.

Database theory and practice can provide partial solutions to these problems. In databases, complex domains can be described using dependencies. Dependencies are used in a number of different ways: they are often used as integrity constraints--checks that verify whether a database instance includes all data specified in the domain description; however, dependencies can also be used similarly to ontologies to derive implicit knowledge. Treating dependencies as integrity constraints and answering queries under CWA has allowed practical relational database management systems (RDBMSs) to scale to very large data sets.
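
The two readings of a dependency can be contrasted with a small Python sketch. It assumes a single hypothetical dependency, "every employee works in some department", and hypothetical relation contents; the placeholder naming scheme `_:n0`, `_:n1`, ... is likewise an assumption made for illustration.

```python
import itertools

# Hypothetical database instance.
employees = {"ann", "bob"}
works_in = {("ann", "sales")}


def check_constraint(employees, works_in):
    """Integrity-constraint reading: list the employees that violate the dependency."""
    with_dept = {e for e, _ in works_in}
    return sorted(employees - with_dept)


def derive(employees, works_in):
    """Derivation reading: add a WorksIn fact with a fresh placeholder value
    for every employee that has no recorded department."""
    fresh = (f"_:n{i}" for i in itertools.count())
    with_dept = {e for e, _ in works_in}
    derived = set(works_in)
    for e in sorted(employees - with_dept):
        derived.add((e, next(fresh)))
    return derived


print(check_constraint(employees, works_in))
print(sorted(derive(employees, works_in)))
```

Under the first reading the instance is rejected because bob has no department; under the second, the instance is accepted and extended with the implicit fact that bob works in some unknown department.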

Database techniques alone do not, however, satisfy all the requirements for an ontology-based IS. In particular, dependencies often cannot model arbitrarily large structures and thus do not cover all practical modelling use cases. Furthermore, generalising the query answering techniques used in practical RDBMSs to the case where information deriving dependencies must be taken into account is still an open problem.

We therefore believe that the next generation of ontology-based ISs should be based on a synthesis and an extension of ontology and database systems and techniques, providing data handling capabilities similar to current RDBMSs, but with schemas that are rich, flexible, and tightly integrated with the data. In the ExODA project we have made several important steps in this direction, both theoretical and practical.

Theory:

We have studied the integration of description logics with tuple generating dependencies (TGDs) and equality generating dependencies (EGDs); the resulting languages are undecidable in general, but we have developed novel acyclicity tests that allow us to identify ontologies for which reasoning procedures are guaranteed to terminate. We also tackled the problem of defining a well-founded semantics (WFS) for such languages extended with negation in rule bodies. In particular, we provide a WFS for the recent Datalog+/- family of ontology languages, which covers several important description logics (DLs).
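
To give a flavour of what an acyclicity test checks, the following Python sketch implements weak acyclicity, a classic condition from the data exchange literature. It is not one of the novel tests developed in the project, which are not described in this record. The encoding of a TGD as a pair of atom lists, the assumption that all arguments are variables, and the example rules are choices made only for this illustration.

```python
def position_graph(tgds):
    """Build edges (source, target, is_special) between predicate positions.

    A TGD is a pair (body, head); an atom is (predicate, tuple_of_variables).
    """
    edges = set()
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        head_vars = {v for _, args in head for v in args}
        existential = head_vars - body_vars
        for x in body_vars & head_vars:
            sources = [(p, i) for p, args in body
                       for i, v in enumerate(args) if v == x]
            for p, args in head:
                for i, v in enumerate(args):
                    if v == x:
                        # x is copied to this head position: regular edge.
                        edges.update((s, (p, i), False) for s in sources)
                    elif v in existential:
                        # A fresh value is created here: special edge.
                        edges.update((s, (p, i), True) for s in sources)
    return edges


def reachable(start, edges):
    """Return all positions reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for s, t, _ in edges:
            if s == node and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def is_weakly_acyclic(tgds):
    """True if no cycle in the position graph passes through a special edge."""
    edges = position_graph(tgds)
    return not any(s in reachable(t, edges)
                   for s, t, special in edges if special)


# Person(x) -> exists y. HasParent(x, y);  HasParent(x, y) -> Person(y)
cyclic = [
    ([("Person", ("x",))], [("HasParent", ("x", "y"))]),
    ([("HasParent", ("x", "y"))], [("Person", ("y",))]),
]
# Employee(x) -> exists y. WorksIn(x, y);  WorksIn(x, y) -> Department(y)
acyclic = [
    ([("Employee", ("x",))], [("WorksIn", ("x", "y"))]),
    ([("WorksIn", ("x", "y"))], [("Department", ("y",))]),
]
print(is_weakly_acyclic(cyclic), is_weakly_acyclic(acyclic))
```

In the first rule set, each newly created parent is itself a person and therefore triggers the creation of a further parent, so materialisation need not terminate and the test rejects the rules. In the second, fresh department values never feed back into a rule that creates new values, so the test accepts them.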

We have developed a classification of OWL 2 QL ontology-mediated queries according to the succinctness (exponential vs polynomial) of first-order and nonrecursive datalog query rewritings. We have developed a tree-witness query rewriting for conjunctive queries over OWL 2 QL ontologies, and for SPARQL 1.1 queries under the OWL 2 QL entailment regime. We have also developed an alternative approach to ontology-based data access with OWL 2 QL, called the combined approach, which uses both query rewriting and data materialisation.

We have carried out fundamental studies into causes of computational hardness, and developed a new fixed-parameter tractability framework that allows for a much finer-grained analysis of computational complexity.

Practice:

We have implemented the tree-witness query rewriting together with a variety of optimisations in the ontology-based data access platform Ontop (http://ontop.inf.unibz.it). The combined approach has been implemented by our partners at Bremen in the system Combo (https://code.google.com/p/combo-obda/).

We have implemented a prototype reasoner that performs automatic classification of chemical compounds from the ChEBI ontology using a formalism that combines features from both description logic and rule-based ontology languages, and that allows for the representation of a wide range of chemical classes that are not expressible with description-logic-based formalisms such as OWL.

Impact:

The ontology-based data access platform Ontop is an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/).

We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS.
Exploitation Route The ontology-based data access platform Ontop is an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/).

We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS.
Sectors Digital/Communication/Information Technologies (including Software), Energy, Healthcare

URL http://www.cs.ox.ac.uk/isg/projects/ExODA/
 
Description Our results heavily influenced the ontology-based data access platform Ontop, an open access system that is available at http://ontop.inf.unibz.it. Ontop is widely used in practice; in the EU FP7 project Optique (http://www.optique-project.eu), for example, it is being used for querying oil and gas data in the Norwegian oil company Statoil (http://sws.ifi.uio.no/project/npd-v2/). We have published our results in leading international conferences and journals, including AIJ, JAIR, JAR, IEEE-IS, AAAI, KR, ISWC, LICS and PODS.
First Year Of Impact 2013
Sector Digital/Communication/Information Technologies (including Software), Energy
Impact Types Economic

 
Description EPSRC DBOnto
Amount £1,263,746 (GBP)
Funding ID EP/L012138/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2014 
End 01/2019
 
Description FP7 Optique
Amount € 1,136,604 (EUR)
Funding ID 318338 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 11/2012 
End 11/2016
 
Description VADA
Amount £4,557,635 (GBP)
Funding ID EP/M025268/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2015 
End 03/2020
 
Description EDF Exoda 
Organisation EDF Energy
Department EDF Innovation and Research
Country France 
Sector Private 
PI Contribution Helped to develop EDF energy management advisor using HermiT reasoner.
Collaborator Contribution Developed EDF energy management advisor using HermiT reasoner and evaluated it via deployment at EDF.
Impact EDF energy management advisor used to generate bespoke energy saving advice for EDF customers.
Start Year 2011