iTract: Islands of Tractability in Ontology-Based Data Access

Lead Research Organisation: Birkbeck College
Department Name: Computer Science and Information Systems

Abstract

15 years ago most data was structured, complete, and neatly organised in databases. This is no longer the case. Unstructured, incomplete, and heterogeneous data sets are proliferating at an enormous rate. This is most evident in the context of the World Wide Web, but also applies to scientific data, data in business and industry, data in healthcare and in many other areas. To make use of such data, traditional information systems based on standard database technologies are no longer sufficient.

Ontology-based data access and management is a novel approach to this challenge: it introduces a semantic layer (an ontology) that provides the user with a high-level unified view of the data as well as a vocabulary to access and query the data. Ontologies model application domains by providing machine-readable definitions of terms and relationships between them. They are already used in numerous applications, for example, by the NHS: to enable communication between health professionals within the United Kingdom and worldwide, it is crucial that they use the same terminology; such a terminology is provided by the ontology SNOMED CT.

Using ontologies to access data and thereby directly combining data and knowledge is a novel idea of the 21st century. First applications have demonstrated that ontology-based data access and management is indeed feasible and has the potential to revolutionise modern information systems. However, scalability of query answering with expressive ontology languages remains a major challenge, and it is the aim of this project to develop a new "island of tractability" approach to tackle it. Our approach links ontology-based data access with two well-established and successful areas of Computer Science: constraint satisfaction and Boolean circuit complexity. We aim to transfer proof methods, techniques, and methodologies from these two areas to ontology-based data access. This includes a non-uniform complexity analysis, where we aim to classify the complexity of answering ontology-mediated queries, which consist of an ontology and a standard database query. Based on this complexity analysis, we will develop uniformly efficient query answering algorithms for the identified islands of tractable ontology-mediated queries, and implement them in the ontology-based data access systems Ontop and Combo. We will apply our novel technology to case studies from the oil and gas industry and from healthcare.

Publications

Artale A (2015) First-order rewritability of temporal ontology-mediated queries in IJCAI International Joint Conference on Artificial Intelligence

Artale A (2015) Tractable interval temporal propositional and description logics in Proceedings of the National Conference on Artificial Intelligence

Botoeva E (2016) Games for query inseparability of description logic knowledge bases in Artificial Intelligence

Botoeva E (2019) Query inseparability for ALC ontologies in Artificial Intelligence

Botoeva E (2016) Query-based entailment and inseparability for ALC ontologies in IJCAI International Joint Conference on Artificial Intelligence

Brandt S (2018) Querying Log Data with Metric Temporal Logic in Journal of Artificial Intelligence Research

Bresolin D (2017) Horn Fragments of the Halpern-Shoham Interval Temporal Logic in ACM Transactions on Computational Logic

 
Description: We have discovered and investigated many new and important islands of tractability in ontology-mediated query answering, including the following:

(1) We have considered the case of ontology-mediated querying with expressive data types such as the integers, the rational numbers, or related spatial and temporal data types. Using recent results on P/NP dichotomies for temporal constraint satisfaction problems, we obtained P/coNP dichotomies for ontology-mediated querying with datatypes. Moreover, in many cases, membership in the tractable class is decidable, sometimes even by a straightforward syntactic check. This work was published, for example, at AAAI 2017.

(2) We considered ontologies over the guarded fragment of first-order logic and determined very expressive fragments for which there exists a P/NP dichotomy for ontology-mediated query answering. In many practically relevant cases we obtained NExpTime or ExpTime decision procedures for deciding whether an ontology-mediated query is tractable, thus identifying important classes of queries for which PTime querying is possible. We also proved dichotomies between datalog-rewritability and coNP-hardness. This work received the Best Paper Award at PODS 2017.

We investigated the relationship between ontology-mediated query answering using unions of conjunctive queries and ontology-mediated query answering using SPARQL queries. We developed criteria and decision procedures for determining when the former can be reduced to the latter type of queries. This research is practically relevant as many implemented systems are based on SPARQL queries. Work on this received the Distinguished Paper Award at IJCAI 2018.

We investigated the question of whether all tractable ontology-mediated queries can be rewritten into queries based on Horn ontologies, presenting both positive and negative results. We also gave decision procedures for containment and first-order rewritability of ontology-mediated queries over Horn ontologies. This work was presented at IJCAI 2016 and IJCAI 2018.
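
As an illustration of what a first-order rewriting of an ontology-mediated query looks like in the simplest Horn setting, the following minimal Python sketch may help. It assumes a toy ontology consisting only of atomic class inclusions and an atomic query; the class names, the data and the helper functions `rewrite` and `answer` are hypothetical and are not taken from the project or from Ontop. The rewriting is a union of atomic queries that can be evaluated directly over the data, with no further reference to the ontology.

```python
# Hypothetical toy ontology: each pair (sub, sup) states "every sub is a sup".
ONTOLOGY = {
    ("Cardiologist", "Doctor"),
    ("Surgeon", "Doctor"),
    ("Doctor", "HealthProfessional"),
}

# Hypothetical incomplete data: unary facts given as (class_name, individual).
DATA = {
    ("Cardiologist", "ann"),
    ("Surgeon", "bob"),
    ("Nurse", "eve"),
}


def rewrite(ontology, query_class):
    """Return every class whose instances are entailed to belong to query_class.

    The result represents a union of atomic queries, i.e. a (very simple)
    first-order rewriting of the ontology-mediated query (ontology, query_class).
    """
    result = {query_class}
    frontier = [query_class]
    while frontier:
        current = frontier.pop()
        for sub, sup in ontology:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def answer(ontology, data, query_class):
    """Evaluate the rewritten query over the data alone."""
    classes = rewrite(ontology, query_class)
    return {individual for cls, individual in data if cls in classes}
```

With this toy input, `answer(ONTOLOGY, DATA, "Doctor")` returns `{"ann", "bob"}` even though the data contains no explicit `Doctor` facts. Ontology languages with existential quantification or non-Horn features need far more involved rewritings, and in some cases no first-order rewriting exists at all.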

(3) We gave solutions to two fundamental computational problems in ontology-based data access with the W3C standard ontology language OWL 2 QL: the succinctness problem for first-order rewritings of ontology-mediated queries, and the complexity problem for ontology-mediated query answering. We classified ontology-mediated queries according to the shape of their conjunctive queries (treewidth, the number of leaves) and the existential depth of their ontologies. For each of these classes, we determined the combined complexity of ontology-mediated query answering, and whether all ontology-mediated queries in the class have polynomial-size first-order, positive existential, and nonrecursive datalog rewritings. We obtained the succinctness results using hypergraph programs, a new computational model for Boolean functions, which makes it possible to connect the size of ontology-mediated query rewritings and circuit complexity. This work was published in LICS 2014 and 2015 and in the Journal of the ACM 2018.

We extended this analysis to ontology-mediated queries with sets of linear tgds and conjunctive queries of bounded hypertree width. We also investigated the parameterised complexity of answering tree-shaped ontology-mediated queries in OWL 2 QL under various restrictions on their ontologies and conjunctive queries. In particular, we constructed an ontology T such that answering ontology-mediated queries (T,q) with a tree-shaped query q is W[1]-hard if the number of leaves in q is regarded as the parameter. The number of leaves had previously been identified as an important characteristic of conjunctive queries, as bounding it leads to tractable ontology-mediated query answering. This work was presented at PODS 2017.
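
To make the "number of leaves" parameter concrete, here is a small Python sketch under stated assumptions: a conjunctive query is represented by its binary atoms as hypothetical triples `(predicate, variable, variable)`, the query is assumed to be tree-shaped (its variables connected by the binary atoms form a tree), and a leaf is a variable with exactly one neighbour. The function name `count_leaves` and the example predicates are illustrative only.

```python
from collections import defaultdict


def count_leaves(binary_atoms):
    """Count the leaves of a tree-shaped conjunctive query.

    binary_atoms: iterable of (predicate, var1, var2) triples.
    Assumes the query is tree-shaped; this is not checked here.
    """
    neighbours = defaultdict(set)
    for _predicate, x, y in binary_atoms:
        if x != y:  # ignore self-loops, which do not add neighbours
            neighbours[x].add(y)
            neighbours[y].add(x)
    # A leaf is a variable adjacent to exactly one other variable.
    return sum(1 for v in neighbours if len(neighbours[v]) == 1)


# Hypothetical query: worksIn(x, y), locatedIn(y, z), treats(x, w)
example_query = [
    ("worksIn", "x", "y"),
    ("locatedIn", "y", "z"),
    ("treats", "x", "w"),
]
```

For `example_query` the variables form the path z, y, x, w, so `count_leaves(example_query)` is 2. A star-shaped query with one central variable and k branches has k leaves.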

(4) We have investigated islands of tractability in temporal ontology-based data access with the linear temporal logic LTL, the Halpern-Shoham interval temporal logic HS and the metric temporal logic MTL. This work was published in JAIR 2018 and ACM TOCL 2017, and presented at IJCAI 2016, AAAI 2017 and TIME 2017.
Exploitation Route: We expect that our findings will be used in ontology-based query answering settings in both academia and industry.
Sectors: Digital/Communication/Information Technologies (including Software),Education,Energy,Healthcare,Culture, Heritage, Museums and Collections,Other

 
Description: Modern organisations accumulate vast amounts of data, stored in multiple and complex databases. Extracting data is a time-consuming and onerous process, especially for non-IT specialists. Virtual Knowledge Graphs (VKGs) provide users with a search vocabulary that facilitates information extraction without relying on IT specialists, leading to cost/efficiency savings and opening up data repositories to data analytics. Our research underpins the reasoning algorithms in the VKG system Ontop, which has applications across a wide range of sectors including energy, healthcare, education and innovation. Ontop is available open-source on GitHub (github.com/ontop/ontop) and has become 'one of the leading Virtual Knowledge Graph systems worldwide'. It has been bundled with downloads of Stanford University's Protégé, an ontology development platform with over 366,000 users (Dec 2020). In April 2019, UniBZ spun out the Ontop work into a start-up company, Ontopic s.r.l., which now employs three full-time staff who work alongside UniBZ academics to develop tailored commercial solutions based on the Ontop framework. In addition to the economic benefit to those employed, the development of spin-out companies such as Ontopic serves UniBZ's goals as an institution: 'technology and knowledge transfer is the third pillar of the university... joint projects ensure the practical relevance of research and education' (www.unibz.it/en/home/companies-and-partnerships/knowledge-technology-transfer). Together with the Norwegian SIRIUS Centre for Scalable Data Access in the Oil and Gas Domain, we worked with the multinational energy firm Equinor (formerly Statoil) and with the German industrial manufacturing conglomerate Siemens to develop exemplar VKG tools, in which Ontop was a core component. In Brazil, Ontop forms the basis for a VKG system named Recruit, implemented at the AC Camargo Cancer Centre in São Paulo.
VKGs have proven useful in providing access to open data repositories which facilitate the smoother running of regional infrastructure. Ontopic's ongoing projects in this sector include an €80,000 collaboration with the Italian province of South Tyrol to extend their tourism open data portal. In Spain, SIRIS Academic (a consultancy and think-tank based in Barcelona which employs over 30 staff) has drawn on Ontop's VKG technology to provide information solutions for its clients, describing Ontop as 'indispensable' to its work. SIRIS's initial work with Ontop came in the context of EPNet, a €2,400,000 ERC-funded project that integrated three Roman archaeological databases into a user-friendly interface allowing scholars to easily run searches across them. Notably, Ontop underpins SIRIS's UNiCS (unics.cloud), 'an Open Data platform based on semantic technologies that integrates an ever-growing number of repositories and datasets about the higher education, research and innovation sector in Europe'. 'Approximately 10,000 users each year from institutions including universities, local governments, and regional agencies responsible for research and development' use UNiCS (and the customised portals built from it) to better understand their operating context, allowing them to make informed strategic decisions for the future. SIRIS also uses UNiCS as the basis for customised data mining applications and strategic solutions for its clients. The Ontop system is also at the core of the BT Hypercat Data Hub and of the DALI project at IBM Ireland.
First Year Of Impact: 2016
Sector: Digital/Communication/Information Technologies (including Software),Education,Energy,Culture, Heritage, Museums and Collections,Other
Impact Types: Cultural,Economic,Policy & public services

 
Description: University of Oslo
Organisation: University of Oslo
Country: Norway
Sector: Academic/University
PI Contribution: Developing the ontology-based data access system Ontop
Collaborator Contribution: Developing the ontology-based data access system Ontop
Impact: Developing the ontology-based data access system Ontop
Start Year: 2015