Score!: Scalable and Complete Reasoning with Incomplete Ontology Reasoners

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Decisions in industry, government and health care increasingly depend on improved access to and processing of digital information. This has led to a pressing demand for more powerful and flexible information systems. New generation information systems will need to efficiently process large data sets, exploit machine-readable domain knowledge, and answer queries by taking into account both knowledge and data.

Ontology-based information systems (OISs) constitute a rapidly maturing technology with the potential to meet these requirements. An ontology provides a vocabulary of terms that are familiar to the user, together with axioms describing the meaning of those terms. OISs can exploit the rich domain knowledge in an ontology to provide a unified view of the data and enrich query answers with implicit information using an automated reasoner.

Several standards for ontology and query languages have been developed, including RDF, OWL, OWL 2, and SPARQL. OWL and its revision OWL 2 provide a powerful and flexible ontology modelling language that can capture features such as class hierarchies, incomplete information, negative information, and so on. OWL ontologies are being used in an increasing range of applications, and are becoming a core technology for accessing, gathering, and sharing knowledge and data.

Applications involving large amounts of data, however, still pose serious challenges to the applicability of OISs. These problems typically originate from conflicting application requirements:

- Modelling complex application domains requires rich ontology languages.
- Fine-grained access to information requires powerful query languages.
- Answering queries over large data sets requires scalable reasoners.
- Critical decisions that depend on access to information require query answers that are either complete, or where the incompleteness is well-understood.

Due to high worst-case complexity of the relevant reasoning problems, scalability is usually in conflict with the use of powerful ontology and query languages, and many applications give up completeness to achieve the desired scalability. As a result, existing OISs fail to meet one or more of these requirements: they support only weak ontology or query languages, they do not scale to the required volumes of data, or they do not provide guarantees as to the completeness of query answers.

Our goal in this project is to lay the foundations for a new generation of OISs that meet all the aforementioned requirements, thus providing the ideal combination of expressive power, scalability and completeness.

To accomplish such an ambitious goal, we observe that the limitations imposed by the trade-offs between expressivity, scalability and completeness apply at the language level: that is, they involve worst-case complexity bounds for every ontology, query, and data set expressed in given ontology, query and data modelling languages. The class of ontologies, queries and data sets that are relevant to a particular application is, however, much more restricted. For example, although application data is often unknown or frequently changing, the ontology itself is fixed at design time, or changes infrequently. As a result, a reasoner known to be incomplete in general for given query and ontology languages might yield the same results as a complete reasoner for the application at hand. Identifying such cases is challenging, but it would have tremendous added value: applications could exploit scalable incomplete reasoners while still enjoying completeness guarantees, thus achieving 'the best of both worlds'.

We believe that our main goal can be accomplished by designing OISs that are optimised for the ontologies, queries and data sets relevant to the application at hand. Such OISs would maximise scalability while ensuring completeness of query answers, even for rich ontologies, large-scale data sets, and complex user queries.

Planned Impact

Our work is likely to have a major impact on the development of Ontology-based Information Systems (OISs): systems that use domain knowledge to facilitate the exploitation of large and complex data sources.

The use of highly scalable OISs has already been recognised as a great commercial opportunity, and technology vendors have started to augment their existing systems to exploit ontologies. For example, Oracle Inc. has recently enhanced its well-known database management system with modules that support semantic data management and rely on ontological reasoning with large amounts of data. We are also aware of, and in many cases are working directly with, a large number of companies and other organisations that are investigating, developing or using OISs; these include Ordnance Survey, Siemens, IBM, Software AG, Kaiser Permanente, EDF Energy, Alcatel-Lucent and Samsung. Other examples of (smaller) IT providers that are incorporating ontological reasoning components in their software include OpenLink, OntoText, Clark&Parsia, and ExperienceOn. These technology vendors will probably be the most direct beneficiaries of our research, with users of OISs benefiting from the resulting improvements in reliability and scalability.

In the long term, beneficiaries could include anyone who uses or depends on an ontology-based information system, from the medical doctor who uses the NHS patient record service to the average citizen who checks the BBC's website for the latest results in the World Cup. Note that both of these activities already exploit OISs, with the SNOMED ontology playing a central role in NHS information systems, and the BBC's 2010 World Cup web site creating pages dynamically from content stored in an OIS.

Concerning dissemination and engagement, we will disseminate the results of this research through the following channels:

- publications in leading conferences and journals in the fields of artificial intelligence, Web technologies, and databases;

- contacts with industry: we have well-established collaborations with numerous developers and users of OISs including Oracle, Samsung, Alcatel-Lucent, EDF, BAE Systems, Siemens, Kaiser Permanente, Clark&Parsia, and ExperienceOn;

- participation in relevant coordination and standardisation efforts within groups and organisations such as the World Wide Web Consortium (W3C) and the OWL Experiences and Directions Group (OWLED);

- distribution of software via the Web; and

- industry liaison activities via the Host Institution, which include talks, industry showcases, periodic newsletters, and an industry liaison website.

Concerning exploitation, the commercialisation of IP resulting from the project will be managed by Isis Innovation, a subsidiary of the University of Oxford, founded to exploit the outcomes of Oxford's research activities. The Host Institution is very proactive in communicating funding opportunities to its academics for exploiting research output via Isis and therefore extensive departmental support and assistance during a potential exploitation process is to be expected.

Possible ways of commercially exploiting our expected results through Isis include licensing innovative software and creating a new start-up company.

Finally, we have already participated in several research projects, many of which involved close interaction and collaboration with users and industrial partners. We also have extensive experience in standardisation efforts as well as in the organisation of workshops and other events which have gathered a significant number of users and practitioners from both academia and industry.

Publications

Cuenca Grau B (2013) Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies in Journal of Artificial Intelligence Research

Kaminski M (2013) Sufficient Conditions for First-Order and Datalog Rewritability in ELU in Proceedings of the 26th International Workshop on Description Logics (DL)

Kaminski M (2014) Datalog Rewritability of Disjunctive Datalog Programs and its Applications to Ontology Reasoning in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Kaminski M (2013) Automated Deduction - CADE-24

Kaminski M (2017) Complexity and Expressive Power of Weakly Well-Designed SPARQL in Theory of Computing Systems

Kaminski M (2014) Datalog Rewriting Techniques for Non-Horn Ontologies in Proceedings of the 27th International Workshop on Description Logics (DL)

 
Description Ontology-based information systems (OISs) constitute a rapidly maturing technology that is increasingly being used in a wide range of application domains. An ontology provides a vocabulary of terms that are familiar to the user, together with axioms describing the meaning of those terms. OISs can exploit the rich domain knowledge in an ontology to provide a unified view of the data and enrich query answers with implicit information using an automated reasoner.

Applications involving large amounts of data, however, still pose serious challenges to the applicability of OISs.

The key finding of the project has been the development of hybrid approaches to reasoning and query answering which combine a scalable reasoner designed for a "lightweight" ontology language with a fully-fledged ontology reasoner.
The key feature of our approach is its 'pay-as-you-go' behaviour: the bulk of the computational workload is delegated to the scalable reasoner, and the extent to which the fully-fledged reasoner is needed depends not on the ontology alone but on interactions between the ontology, the dataset and the query.
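The pay-as-you-go idea can be illustrated with a minimal sketch. It assumes, beyond what this record states, that the scalable reasoner can supply both a sound set of answers (a lower bound) and a superset of all answers (an upper bound), so that the fully-fledged reasoner only has to examine the gap between the two. The function and parameter names (hybrid_answers, lower_bound, upper_bound, full_check) are hypothetical and do not correspond to the PAGOdA API.

```python
from typing import Callable, Set, Tuple

Answer = Tuple[str, ...]


def hybrid_answers(
    lower_bound: Callable[[], Set[Answer]],
    upper_bound: Callable[[], Set[Answer]],
    full_check: Callable[[Answer], bool],
) -> Tuple[Set[Answer], int]:
    """Combine a scalable reasoner with a fully-fledged one (illustrative only).

    lower_bound: answers the scalable reasoner can prove (assumed sound).
    upper_bound: a superset of all true answers (assumed complete).
    full_check:  expensive entailment test done by the fully-fledged reasoner.
    Returns the final answers and the number of expensive checks performed.
    """
    sound = lower_bound()
    candidates = upper_bound()
    # Only answers in the gap between the two bounds are undecided.
    gap = candidates - sound
    confirmed = {answer for answer in gap if full_check(answer)}
    return sound | confirmed, len(gap)


# Toy data standing in for real reasoner output (hypothetical).
TRUE_ANSWERS = {("alice",), ("bob",)}
answers, expensive_calls = hybrid_answers(
    lower_bound=lambda: {("alice",)},
    upper_bound=lambda: {("alice",), ("bob",), ("carol",)},
    full_check=lambda answer: answer in TRUE_ANSWERS,
)
print(sorted(answers), expensive_calls)
```

In this toy run the expensive check is invoked only for the two undecided candidates. When the two bounds coincide for a given ontology, dataset and query, the fully-fledged reasoner is not needed at all.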

These techniques have proved very effective in practice and have been implemented in the reasoners PAGOdA (https://www.cs.ox.ac.uk/isg/tools/PAGOdA/) and MORe (https://www.cs.ox.ac.uk/isg/tools/MORe/).

As part of the project we have also developed novel techniques for equivalently rewriting ontologies expressed in a rich language into ontologies in a simpler language, which are amenable to efficient reasoning.
Exploitation Route Our findings can be exploited in any application of Semantic Technologies where rich ontologies are used and scalability is a concern. There are many such applications in domains such as the Clinical Sciences and the Oil and Energy industries.
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy; Healthcare; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology

URL https://www.cs.ox.ac.uk/isg/projects/Score/
 
Description A team of research scientists at ORACLE experimented with techniques developed in the project using ORACLE's Semantic Graph tool (http://www.oracle.com/technetwork/database-options/spatialandgraph/overview/rdfsemantic-graph-1902016.html). The results of our collaboration with ORACLE were documented in a joint paper published at the WWW 2013 conference, and stimulated further research in the area. SemFacet (https://www.cs.ox.ac.uk/isg/tools/SemFacet/) is a faceted search system supporting reasoning that was developed as part of the project. The commercialisation rights of SemFacet were acquired by the start-up company Oxford Semantic Technologies (https://www.oxfordsemantic.tech/). SemFacet is currently being re-implemented within the company with the aim of incorporating the system into their core product.
First Year Of Impact 2015
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Collaboration with ORACLE 
Organisation Oracle Corporation
Department Oracle Corporation UK Ltd
Country United Kingdom 
Sector Private 
PI Contribution We integrated an earlier version of PAGOdA with ORACLE technology and tested the system on their servers.
Collaborator Contribution ORACLE provided free licensing for their product and access to a supercomputer.
Impact We coauthored a paper with ORACLE research scientists. Yujiao Zhou, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Jay Banerjee: Making the most of your triple store: query answering in OWL 2 using an RL reasoner. WWW 2013: 1569-1580
Start Year 2013
 
Title MORe 
Description MORe is a prototypical reasoner for classification of ontologies written in the ontology language OWL 2. Given an OWL file, MORe computes the classification hierarchy entailed by the terminological part of the ontology. MORe integrates HermiT (a fully-fledged OWL 2 reasoner) with ELK (a reasoner for the OWL 2 EL profile) and RDFox (a datalog reasoner) in a modular way. In particular, MORe exploits module extraction techniques to identify subsets of the ontology that can be completely classified using ELK or RDFox. MORe is designed in such a way that the fully-fledged (and slower) reasoner HermiT performs as few computations as possible, and the bulk of the computation is delegated to the more efficient, profile-specific reasoners, ELK and RDFox. MORe is open-source and released under an academic license.
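As a rough illustration of this kind of modular delegation, the sketch below splits the classes of a toy ontology into those that could be handed to a profile-specific reasoner and those left for the fully-fledged one. It uses a crude connectivity-based partition as a stand-in: it is not the module extraction technique used in MORe, which this record does not detail. The Axiom type, its in_profile flag and the function split_signature are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Axiom:
    """Toy axiom: the classes it mentions and whether it fits the light profile."""
    classes: FrozenSet[str]
    in_profile: bool


def split_signature(axioms: Iterable[Axiom]) -> Tuple[Set[str], Set[str]]:
    """Return (easy, hard) classes under a conservative connectivity criterion.

    A class is 'hard' if it occurs in an out-of-profile axiom, or shares an
    axiom with a class that is already hard. All other classes are 'easy'
    and would be delegated to the profile-specific reasoner.
    """
    axiom_list = list(axioms)
    all_classes: Set[str] = set()
    hard: Set[str] = set()
    for axiom in axiom_list:
        all_classes |= axiom.classes
        if not axiom.in_profile:
            hard |= axiom.classes
    # Propagate hardness until no axiom adds new classes (fixpoint).
    changed = True
    while changed:
        changed = False
        for axiom in axiom_list:
            if axiom.classes & hard and not axiom.classes <= hard:
                hard |= axiom.classes
                changed = True
    return all_classes - hard, hard


toy_ontology = [
    Axiom(frozenset({"A", "B"}), in_profile=True),
    Axiom(frozenset({"C", "D"}), in_profile=False),
    Axiom(frozenset({"D", "E"}), in_profile=True),
]
easy, hard = split_signature(toy_ontology)
print(sorted(easy), sorted(hard))
```

The smaller the hard set, the less work remains for the slower reasoner. A real module extraction technique would aim for a much tighter split than this conservative approximation.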
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Not applicable 
URL https://www.cs.ox.ac.uk/isg/tools/MORe/
 
Title PAGOdA 
Description PAGOdA is a fully-fledged query answering system for RDF data enhanced with expressive OWL ontologies. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The system has been released only recently, so no notable impact can be reported at this point. 
URL http://www.cs.ox.ac.uk/isg/tools/PAGOdA/
 
Title SemFacet 
Description SemFacet is a query formulation tool for RDF databases and OWL 2 ontologies based on the faceted search paradigm. An important feature of SemFacet is that it exploits state-of-the-art reasoning technology to update faceted query interfaces in response to user actions, as well as for computing search results. By exploiting the implicit structure of the ontology and data, SemFacet is capable of assisting end users in the formulation of meaningful queries that closely match their expectations. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact SemFacet will be used in 2016 in an exploratory project funded with an EPSRC IAA account. The industrial partner in the project is EDF (Electricite de France) in Paris. 
URL http://www.cs.ox.ac.uk/isg/tools/SemFacet/
 
Company Name OXFORD SEMANTIC TECHNOLOGIES LIMITED 
Description Oxford Semantic Technologies combines expert know-how and the patented Oxford key technology to provide businesses with a tailored solution to access, process and analyse data. 
Year Established 2016 
Impact The company has a number of major customers and employs 6 full-time engineers and research scientists.
Website https://www.oxfordsemantic.tech/