Score!: Scalable and Complete Reasoning with Incomplete Ontology Reasoners

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

Decisions in industry, government and health care increasingly depend on improved access to and processing of digital information. This has led to a pressing demand for more powerful and flexible information systems. New generation information systems will need to efficiently process large data sets, exploit machine-readable domain knowledge, and answer queries by taking into account both knowledge and data.

Ontology-based information systems (OISs) constitute a rapidly maturing technology with the potential to meet these requirements. An ontology provides a vocabulary of terms that are familiar to the user, together with axioms describing the meaning of those terms. OISs can exploit the rich domain knowledge in an ontology to provide a unified view of the data and enrich query answers with implicit information using an automated reasoner.

Several standards for ontology and query languages have been developed, including RDF, OWL, OWL 2, and SPARQL. OWL and its revision OWL 2 provide a powerful and flexible ontology modelling language that can capture features such as class hierarchies, incomplete information, negative information, and so on. OWL ontologies are being used in an increasing range of applications, and are becoming a core technology for accessing, gathering, and sharing knowledge and data.

Applications involving large amounts of data, however, still pose serious challenges to the applicability of OISs. These challenges typically originate from conflicting application requirements:

- Modelling complex application domains requires rich ontology languages.
- Fine-grained access to information requires powerful query languages.
- Answering queries over large data sets requires scalable reasoners.
- Critical decisions that depend on access to information require query answers that are either complete, or where the incompleteness is well-understood.

Due to high worst-case complexity of the relevant reasoning problems, scalability is usually in conflict with the use of powerful ontology and query languages, and many applications give up completeness to achieve the desired scalability. As a result, existing OISs fail to meet one or more of these requirements: they support only weak ontology or query languages, they do not scale to the required volumes of data, or they do not provide guarantees as to the completeness of query answers.

Our goal in this project is to lay the foundations for a new generation of OISs that meet all the aforementioned requirements, thus providing the ideal combination of expressive power, scalability and completeness.

To accomplish such an ambitious goal, we observe that the limitations imposed by the trade-offs between expressivity, scalability and completeness apply at the language level: that is, they involve worst-case complexity bounds for every ontology, query, and data set expressed in given ontology, query and data modelling languages. The class of ontologies, queries and data sets that are relevant to a particular application is, however, much more restricted. For example, although application data is often unknown or frequently changing, the ontology itself is fixed at design time, or changes infrequently. As a result, a reasoner known to be incomplete in general for given query and ontology languages might yield the same results as a complete reasoner for the application at hand. Identifying such cases is challenging, but it would have tremendous added value: applications could exploit scalable incomplete reasoners while still enjoying completeness guarantees, thus achieving 'the best of both worlds'.

We believe that our main goal can be accomplished by designing OISs that are optimised for the ontologies, queries and data sets relevant to the application at hand. Such OISs would maximise scalability while ensuring completeness of query answers, even for rich ontologies, large-scale data sets, and complex user queries.

Planned Impact

Our work is likely to have a major impact on the development of Ontology-based Information Systems (OISs): systems that use domain knowledge to facilitate the exploitation of large and complex data sources.

The use of highly scalable OISs has already been recognised as a great commercial opportunity, and technology vendors have started to augment their existing systems to exploit ontologies. For example, Oracle Inc. has recently enhanced its well-known database management system with modules that support semantic data management and which rely on ontological reasoning with large amounts of data. We are also aware of, and in many cases are working directly with, a large number of companies and other organisations who are investigating, developing or using OISs; these include Ordnance Survey, Siemens, IBM, Software AG, Kaiser Permanente, EDF Energy, Alcatel-Lucent and Samsung. Other examples of (smaller) IT providers that are incorporating ontological reasoning components in their software include OpenLink, OntoText, Clark&Parsia, and ExperienceOn. These technology vendors will probably be the most direct beneficiaries of our research, with users of OISs benefiting from the resulting improvements in reliability and scalability.

In the long term, beneficiaries could include anyone who uses or depends on an ontology-based information system, from the medical doctor who uses the NHS patient record service to the average citizen who checks the BBC's website for the latest results in the World Cup. Note that both of these activities already exploit OISs, with the SNOMED ontology playing a central role in NHS information systems, and the BBC's 2010 World Cup web site creating pages dynamically from content stored in an OIS.

Concerning dissemination and engagement, we will disseminate the results of this research through the following channels:

- publications in leading conferences and journals in the fields of artificial intelligence, Web technologies, and databases;

- contacts with industry: we have well-established collaborations with numerous developers and users of OISs including Oracle, Samsung, Alcatel-Lucent, EDF, BAE Systems, Siemens, Kaiser Permanente, Clark&Parsia, and ExperienceOn;

- participation in relevant coordination and standardisation efforts within groups and organisations such as the World Wide Web Consortium (W3C) and the OWL Experiences and Directions Group (OWLED);

- distribution of software via the Web; and

- industry liaison activities via the Host Institution, which include talks, industry showcases, periodic newsletters, and an industry liaison website.

Concerning exploitation, the commercialisation of IP resulting from the project will be managed by Isis Innovation, a subsidiary of the University of Oxford, founded to exploit the outcomes of Oxford's research activities. The Host Institution is very proactive in communicating funding opportunities to its academics for exploiting research output via Isis, and therefore extensive departmental support and assistance during a potential exploitation process is to be expected.

Possible ways of commercially exploiting our expected results through Isis include licensing innovative software and creating a new start-up company.

Finally, we have already participated in several research projects, many of which involved close interaction and collaboration with users and industrial partners. We also have extensive experience in standardisation efforts as well as in the organisation of workshops and other events which have gathered a significant number of users and practitioners from both academia and industry.

Funded Value:

£555,707

Funded Period:

Jan 13 - Jan 16

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/J020214/1

Principal Investigator:

Bernardo Cuenca Grau

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Information & Knowledge Mgmt (100%)

Organisations

People	ORCID iD
Bernardo Cuenca Grau (Principal Investigator)
Boris Motik (Co-Investigator)	http://orcid.org/0000-0003-2506-4118
Ian Horrocks (Co-Investigator)

Publications

Author Name

Title Publication Date Published

Cuenca Grau B (2013) Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies in Journal of Artificial Intelligence Research

Cuenca Grau B (2016) Logical Foundations of Privacy-Preserving Publishing of Linked Data

Cuenca Grau B (2015) Controlled Query Evaluation for Datalog and OWL 2 Profile Ontologies

Feier C (2015) The Combined Approach to Query Answering Beyond the OWL 2 Profiles

Kaminski M (2013) Sufficient Conditions for First-Order and Datalog Rewritability in ELU in Proceedings of the 26th International Workshop on Description Logics (DL)

Kaminski M (2014) Datalog Rewritability of Disjunctive Datalog Programs and its Applications to Ontology Reasoning in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Kaminski M (2013) Automated Deduction - CADE-24

Kaminski M (2017) Complexity and Expressive Power of Weakly Well-Designed SPARQL in Theory of Computing Systems

Kaminski M (2014) Datalog Rewriting Techniques for Non-Horn Ontologies in Proceedings of the 27th International Workshop on Description Logics (DL)

Kaminski M (2016) Semantics and Expressive Power of Subqueries and Aggregates in SPARQL 1.1

Key Findings
Impact Summary
Collaboration
Software and Technical Products
Spin Outs


Description	Ontology-based information systems (OISs) constitute a rapidly maturing technology that is increasingly being used in a wide range of application domains. An ontology provides a vocabulary of terms that are familiar to the user, together with axioms describing the meaning of those terms. OISs can exploit the rich domain knowledge in an ontology to provide a unified view of the data and enrich query answers with implicit information using an automated reasoner. Applications involving large amounts of data, however, still pose serious challenges to the applicability of OISs. The key finding of the project has been the development of hybrid approaches to reasoning and query answering which combine a scalable reasoner designed for a "lightweight" ontology language with a fully-fledged ontology reasoner. The key feature of our approach is its 'pay-as-you-go' behaviour: the bulk of the computational workload is delegated to the scalable reasoner, and the extent to which the fully-fledged reasoner is needed does not depend solely on the ontology, but on interactions between the ontology, the dataset and the query. These techniques have proved very effective in practice and have been implemented in the reasoners PAGOdA (https://www.cs.ox.ac.uk/isg/tools/PAGOdA/) and MORe (https://www.cs.ox.ac.uk/isg/tools/MORe/). As part of the project we have also developed novel techniques for equivalently rewriting ontologies expressed in a rich language into ontologies in a simpler language, which are amenable to efficient reasoning.
Exploitation Route	Our findings can be exploited in any application of Semantic Technologies where rich ontologies are used and scalability is a concern. There are many such applications in domains such as the Clinical Sciences and the Oil and Energy industries.
Sectors	Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy; Healthcare; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology
URL	https://www.cs.ox.ac.uk/isg/projects/Score/
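
The 'pay-as-you-go' behaviour described above can be illustrated with a minimal Python sketch. It is not taken from PAGOdA or MORe: it assumes, purely for illustration, that the scalable reasoner supplies a set of answers that are certainly correct (`lower_bound`) and a superset of all possible answers (`upper_bound`), and that the fully-fledged reasoner is exposed as a hypothetical `full_check` callback. The function name `hybrid_answers` and the toy data are likewise assumptions.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

Answer = Hashable


def hybrid_answers(
    lower_bound: Iterable[Answer],
    upper_bound: Iterable[Answer],
    full_check: Callable[[Answer], bool],
) -> Tuple[Set[Answer], int]:
    """Return (answers, number of calls made to the expensive reasoner).

    lower_bound: answers the scalable reasoner guarantees to be correct.
    upper_bound: assumed superset of all correct answers.
    full_check:  hypothetical stand-in for the fully-fledged reasoner.
    """
    certain = set(lower_bound)
    # Only candidates in the gap between the bounds need the expensive check.
    gap = set(upper_bound) - certain
    calls = 0
    for candidate in sorted(gap, key=repr):
        calls += 1
        if full_check(candidate):
            certain.add(candidate)
    return certain, calls


# Toy data (hypothetical): the true answers are alice, bob and carol.
truth = {"alice", "bob", "carol"}
lower = {"alice", "bob"}
upper = {"alice", "bob", "carol", "dave"}

answers, calls = hybrid_answers(lower, upper, lambda a: a in truth)
print(sorted(answers), calls)  # ['alice', 'bob', 'carol'] 2
```

In this sketch the cost of the expensive reasoner grows with the size of the gap between the two bounds rather than with the size of the data set; when the bounds coincide, `full_check` is never called.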


Description	A team of research scientists at ORACLE experimented with techniques developed in the project using ORACLE's Semantic Graph tool (http://www.oracle.com/technetwork/database-options/spatialandgraph/overview/rdfsemantic-graph-1902016.html). The results of our collaboration with ORACLE were documented in a joint paper published at the WWW 2013 conference, and stimulated further research in the area. SemFacet (https://www.cs.ox.ac.uk/isg/tools/SemFacet/) is a faceted search system supporting reasoning that was developed as part of the project. The commercialisation rights of SemFacet were acquired by the start-up company Oxford Semantic Technologies (https://www.oxfordsemantic.tech/). SemFacet is currently being re-implemented within the company with the aim of incorporating the system into their core product.
First Year Of Impact	2015
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	Collaboration with ORACLE
Organisation	Oracle Corporation
Department	Oracle Corporation UK Ltd
Country	United Kingdom
Sector	Private
PI Contribution	We integrated an earlier version of PAGOdA with ORACLE technology and tested the system on their servers.
Collaborator Contribution	ORACLE provided free licensing for their product and access to a supercomputer.
Impact	We coauthored a paper with ORACLE research scientists: Yujiao Zhou, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Jay Banerjee. Making the most of your triple store: query answering in OWL 2 using an RL reasoner. WWW 2013: 1569-1580.
Start Year	2013


Title	MORe
Description	MORe is a prototypical reasoner for classification of ontologies written in the ontology language OWL 2. Given an OWL file, MORe computes the classification hierarchy entailed by the terminological part of the ontology. MORe integrates HermiT (a fully-fledged OWL 2 reasoner) with ELK (a reasoner for the OWL 2 EL profile) and RDFox (a datalog reasoner) in a modular way. In particular, MORe exploits module extraction techniques to identify subsets of the ontology that can be completely classified using ELK or RDFox. MORe is designed in such a way that the fully-fledged (and slower) reasoner HermiT performs as few computations as possible, and the bulk of the computation is delegated to the more efficient, profile-specific reasoners, ELK and RDFox. MORe is open-source and released under an academic license.
Type Of Technology	Software
Year Produced	2014
Open Source License?	Yes
Impact	Not applicable
URL	https://www.cs.ox.ac.uk/isg/tools/MORe/
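
The modular delegation idea can be sketched in Python as follows. This is a deliberately simplified stand-in and not MORe's actual module extraction technique: it assumes each axiom is reduced to the set of class names it mentions plus a flag saying whether it fits the lightweight language, and it conservatively marks a class as residual if it is connected, through shared axioms, to any axiom outside that language. The names `Axiom` and `split_classes` and the toy ontology are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Axiom(NamedTuple):
    signature: FrozenSet[str]  # class names mentioned by the axiom
    lightweight: bool          # True if the axiom fits the lightweight language


def split_classes(axioms: List[Axiom]) -> Tuple[Set[str], Set[str]]:
    """Partition class names into (delegable, residual).

    delegable: classes touched only by lightweight axioms, directly or
               transitively, so an efficient reasoner could handle them.
    residual:  classes left for the fully-fledged (slower) reasoner.
    """
    all_classes: Set[str] = set().union(*(a.signature for a in axioms))
    # Seed with every class mentioned by an axiom outside the lightweight language.
    residual: Set[str] = set()
    for axiom in axioms:
        if not axiom.lightweight:
            residual |= axiom.signature
    # Propagate: sharing an axiom with a residual class makes a class residual.
    changed = True
    while changed:
        changed = False
        for axiom in axioms:
            if axiom.signature & residual and not axiom.signature <= residual:
                residual |= axiom.signature
                changed = True
    return all_classes - residual, residual


# Toy ontology (hypothetical): only the Parent/Person axiom is non-lightweight.
ontology = [
    Axiom(frozenset({"Dog", "Animal"}), True),
    Axiom(frozenset({"Cat", "Animal"}), True),
    Axiom(frozenset({"Parent", "Person"}), False),
    Axiom(frozenset({"Person", "Agent"}), True),
]

delegable, residual = split_classes(ontology)
print(sorted(delegable))  # ['Animal', 'Cat', 'Dog']
print(sorted(residual))   # ['Agent', 'Parent', 'Person']
```

The split here is coarser than a real module extraction would be, but it shows the intended effect: the slower reasoner only sees the part of the vocabulary that cannot be handled safely by the efficient one.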


Title	PAGOdA
Description	PAGOdA is a fully-fledged query answering system for RDF data enhanced with expressive OWL ontologies.
Type Of Technology	Software
Year Produced	2014
Open Source License?	Yes
Impact	The system has been released only recently, so no notable impact can be reported at this point.
URL	http://www.cs.ox.ac.uk/isg/tools/PAGOdA/


Title	SemFacet
Description	SemFacet is a query formulation tool for RDF databases and OWL 2 ontologies based on the faceted search paradigm. An important feature of SemFacet is that it exploits state-of-the-art reasoning technology to update faceted query interfaces in response to user actions, as well as for computing search results. By exploiting the implicit structure of the ontology and data, SemFacet is capable of assisting end users in the formulation of meaningful queries that closely match their expectations.
Type Of Technology	Software
Year Produced	2015
Open Source License?	Yes
Impact	SemFacet will be used in 2016 in an exploratory project funded with an EPSRC IAA account. The industrial partner in the project is EDF (Electricite de France) in Paris.
URL	http://www.cs.ox.ac.uk/isg/tools/SemFacet/


Company Name	OXFORD SEMANTIC TECHNOLOGIES LIMITED
Description	Oxford Semantic Technologies combines expert know-how and the patented Oxford key technology to provide businesses with a tailored solution to access, process and analyse data.
Year Established	2016
Impact	The company has a number of major customers and employs six full-time engineers and research scientists.
Website	https://www.oxfordsemantic.tech/
