quantMD: Ontology-Based Management of Many-Dimensional Quantitative Data

Lead Research Organisation: University of Liverpool

Department Name: Computer Science

Abstract

Ontology-based data management (OBDM) is a technology that has been developed over the past decade with the aim of facilitating access to various types of data sources. In general, ontologies provide a formal model and vocabulary for a domain of interest. In OBDM, the role of ontologies is threefold: to integrate distributed and heterogeneous data sources, enrich incomplete data with background knowledge, and provide a user-friendly language for querying.
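
The second role above, enriching incomplete data with background knowledge, can be illustrated with a minimal sketch. It assumes an ontology that contains only hypothetical subclass axioms and data given as (individual, class) facts. It shows query rewriting in the style used by OBDM systems: the user's query is expanded using the ontology and then evaluated directly over the data, which is never modified. All class and individual names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical ontology: (Sub, Super) means every Sub is also a Super.
SUBCLASS_AXIOMS = [
    ("University", "ResearchInstitution"),
    ("ResearchInstitute", "ResearchInstitution"),
    ("ResearchInstitution", "Organisation"),
]

# Hypothetical, incomplete source data: (individual, class) facts.
DATA = {
    ("uni_a", "University"),
    ("uni_b", "University"),
    ("company_d", "Company"),
    ("institute_c", "ResearchInstitute"),
}


def rewrite(query_class, axioms):
    """Return the query class together with all of its transitive subclasses.

    Evaluating the union of these classes over the raw data gives the same
    answers as querying the data completed by the ontology.
    """
    subs = defaultdict(set)
    for sub, sup in axioms:
        subs[sup].add(sub)
    result, stack = {query_class}, [query_class]
    while stack:
        cls = stack.pop()
        for sub in subs[cls]:
            if sub not in result:
                result.add(sub)
                stack.append(sub)
    return result


def certain_answers(query_class, axioms, data):
    """Evaluate the rewritten query over the data without materialising anything."""
    classes = rewrite(query_class, axioms)
    return {ind for ind, cls in data if cls in classes}
```

A plain lookup of `ResearchInstitution` in `DATA` finds nothing, whereas `certain_answers("ResearchInstitution", SUBCLASS_AXIOMS, DATA)` also returns `uni_a`, `uni_b` and `institute_c`, because the ontology states that universities and research institutes are research institutions.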

To illustrate, in an energy company the traditional workflow for geologists seeking answers to their information needs is either to execute pre-defined queries over their databases, each covering part of the need, and then integrate the results manually, which is onerous and error-prone, or to ask the IT department to construct custom SQL queries, which may take days or even weeks. OBDM reduces the time for finding answers to minutes by allowing geologists to formulate their queries using the vocabulary of their own domain and then run these queries over their databases via OBDM tools.

Thus, by bringing together knowledge representation and database technologies, OBDM has the potential to transform information systems by allowing domain experts to query complex and distributed data efficiently without the help of database professionals.

This project addresses the main obstacle to realising this potential. So far, OBDM has been developed primarily for access to purely qualitative, one-dimensional data, whereas nowadays data is mostly numerical, many-dimensional and often temporal, and users' information needs usually involve quantitative analysis. As a result, quantitative queries such as "find all UK-sponsored research institutions in Europe whose total triennial financial contributions from UK-based private companies exceed €10M" are not supported at all by existing OBDM tools. Moreover, because OBDM makes the so-called open world assumption (a fact missing from the data is not assumed to be false), developing the theory and practical tools for dealing with such queries is extremely challenging.
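
The example query can be made concrete with a minimal sketch. It assumes hypothetical contribution facts of the form (institution, year, amount in euros) and reads "triennial" as a three-year window starting at a given year. It also shows one simple reason why the open world assumption complicates aggregation. If amounts are non-negative and each listed fact is a distinct contribution, the sum of the known facts is only a lower bound on the true total. A threshold query can therefore be answered with certainty only in the "exceeds" direction.

```python
from collections import defaultdict

# Hypothetical known facts: (institution, year, amount_eur) for contributions
# from UK-based private companies. Further, unrecorded contributions may exist.
CONTRIBUTIONS = [
    ("inst_a", 2019, 4_000_000),
    ("inst_a", 2020, 3_500_000),
    ("inst_a", 2021, 3_000_000),
    ("inst_b", 2019, 2_000_000),
    ("inst_b", 2021, 1_000_000),
    ("inst_a", 2022, 9_000_000),
]


def triennial_totals(facts, start_year):
    """Sum the known contributions per institution over [start_year, start_year + 2]."""
    totals = defaultdict(int)
    for inst, year, amount in facts:
        if start_year <= year <= start_year + 2:
            totals[inst] += amount
    return dict(totals)


def certainly_exceeds(facts, start_year, threshold):
    """Institutions whose triennial total exceeds the threshold in every possible world.

    Assumes non-negative amounts and distinct facts, so the known sum is a
    lower bound on the true total under the open world assumption.
    """
    totals = triennial_totals(facts, start_year)
    return {inst for inst, total in totals.items() if total > threshold}
```

For the window starting in 2019, `certainly_exceeds(CONTRIBUTIONS, 2019, 10_000_000)` returns `{"inst_a"}` (known total €10.5M). The converse question, which institutions certainly stay below €10M, has no certain answers under these assumptions, since unrecorded contributions could always raise the total.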

The aim of this project is to develop a novel OBDM framework for querying and analysing many-dimensional numerical data. To address the challenges, we bring together techniques from databases, knowledge representation, and formal methods, in particular temporal and modal logics, and develop these further. We will develop a theoretical framework for querying such data, develop tools for using this framework in practice, and test our tools with partners from industry and the public sector.

Planned Impact

The practical aim of this project is to build a novel ontology-based data management (OBDM) tool that will facilitate querying and analysing multi-dimensional numerical data from various types of data sources. The tool will be based on the existing state-of-the-art OBDM platform Ontop (developed at the Free University of Bozen-Bolzano in collaboration with Birkbeck), which currently provides access to qualitative data stored in relational databases but supports neither querying and aggregating numerical data (such as seismic data, timestamped sensor measurements, production data, or medical measurements and tests) nor integrating other types of data sources, in particular APIs. The tool will be designed and implemented in close collaboration with a range of project partners from industry (Equinor, Lundin and Siemens), a small/medium-sized enterprise (SIRIS Academic) and the public sector (Josef Pilsudski Institute of America). We can therefore realistically expect our novel technology to be taken up quickly and implemented by the partners, and subsequently by other companies, organisations and institutions where decision making depends on querying and analysing data. In particular, we expect applications of our tool in healthcare, where semantic technologies are already having an impact and where many-dimensional numerical OBDM could help medical experts identify diseases and suitable treatments based on high-temporal-resolution data, including lab results, electronic documentation and bedside monitor trends.

The impact of our technology and tool will be to allow end users who are not IT experts to retrieve complete data efficiently from multiple data sources without any assistance from database specialists, which will significantly simplify and speed up the process of data gathering. Beyond the monetary savings and efficiency gains, this technology can generate significantly higher value by freeing domain experts' time to focus on their core tasks of data evaluation and analysis rather than on searching for and gathering data and information.

Another impact of this project is the promotion of new semantic technologies and standards among companies and users. The seminars and tutorials that we will deliver at our partners' organisations and at conferences will contribute towards this goal.

Funded Value:

£402,689

Funded Period:

Oct 2019 - Sep 2022

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/S032207/1

Principal Investigator:

Frank Wolter

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Information & Knowledge Mgmt (100%)

People
Frank Wolter (Principal Investigator)
Boris Konev (Co-Investigator)
Martin Zimmermann (Co-Investigator)

Publications

Kontchakov R (2020) Boolean Role Inclusions in DL-Lite With and Without Time

Hernich A (2020) Dichotomies in Ontology-Mediated Querying with the Guarded Fragment in ACM Transactions on Computational Logic

Artale A (2022) First-Order Rewritability and Complexity of Two-Dimensional Temporal Ontology-Mediated Queries in Journal of Artificial Intelligence Research

Artale A (2021) First-order rewritability of ontology-mediated queries in linear temporal logic in Artificial Intelligence

Mascle C (2020) From LTL to rLTL monitoring

Mascle C (2022) From LTL to rLTL monitoring: improved monitorability through robust semantics in Formal Methods in System Design

Haga A (2021) How to Approximate Ontology-Mediated Queries

Fortin M (2022) Interpolants and Explicit Definitions in Extensions of the Description Logic EL

Artale A (2023) Living without Beth and Craig: Definitions and Interpolants in Description and Modal Logics with Nominals and Role Inclusions in ACM Transactions on Computational Logic

Jung J (2021) Living without Beth and Craig: Definitions and Interpolants in the Guarded and Two-Variable Fragments

Key Findings
Impact Summary


Description	We have made significant progress towards our goal of extending standard static ontology-based data access to data with a temporal dimension. We began by developing a framework for querying one-dimensional temporal data, which can represent the temporal evolution of a single object, taking into account background knowledge formulated in an ontology. We proposed to use ontologies given in linear temporal logic (LTL), which originated in philosophy and has been applied successfully in computer science to program verification; queries are likewise given in the positive fragment of LTL. Within this framework, we investigated the complexity of ontology-mediated queries and their rewritability into standard relational queries. By taking into account the expressivity of the temporal operators used in the ontology and the shape of the queries, we identified a hierarchy of increasingly powerful ontology-mediated queries and proved that, depending on their position in this hierarchy, they are rewritable into standard database queries, into such queries extended with standard arithmetic predicates, or into further extensions with primitive recursion. We have thus laid the foundation for practical ontology-based access to one-dimensional temporal data. In a second step, we extended the framework to two-dimensional data, in which each timestamp comes with a database of facts true at that timestamp. We proposed to model the second dimension using DL-Lite, the description logic underpinning OWL 2 QL, the OWL profile designed for static one-dimensional data access. Within this framework, we proved powerful transfer results that lift our complexity and rewritability results from the one-dimensional to the two-dimensional case. Using these transfer results, we again obtained a hierarchy of increasingly powerful two-dimensional ontology-mediated queries that combine fragments of LTL with description logic.
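
To give a feel for one-dimensional temporal data under an LTL-style ontology, here is a minimal sketch. It is not the project's method. It assumes a simplified, hypothetical rule format: if all body predicates hold at time t, the head predicate holds at t + shift, where shift 1 plays the role of the "next" operator and -1 the role of "previous". It then computes the consequences by forward chaining inside a caller-supplied finite window of timestamps. Bounding the window is a simplification. A rule such as Shutdown → next Shutdown generates an infinite chain of facts, which is one reason the results above focus instead on rewriting such queries into database queries, for example by looking for an earlier Leak, Alarm or Shutdown fact using the order on timestamps.

```python
from typing import FrozenSet, List, Set, Tuple

Fact = Tuple[str, int]  # (predicate, timestamp)
# Hypothetical Horn rule: if every body predicate holds at t,
# then the head predicate holds at t + shift (1 ~ "next", -1 ~ "previous").
Rule = Tuple[FrozenSet[str], str, int]


def chase(data: Set[Fact], rules: List[Rule], lo: int, hi: int) -> Set[Fact]:
    """Apply the rules until no new facts appear, restricted to timestamps in [lo, hi]."""
    facts = {f for f in data if lo <= f[1] <= hi}
    changed = True
    while changed:
        changed = False
        for body, head, shift in rules:
            for t in range(lo, hi + 1):
                target = t + shift
                if not lo <= target <= hi or (head, target) in facts:
                    continue
                if all((p, t) in facts for p in body):
                    facts.add((head, target))
                    changed = True
    return facts


def answer(pred: str, data: Set[Fact], rules: List[Rule], lo: int, hi: int) -> List[int]:
    """Timestamps in [lo, hi] at which pred is derived within the window."""
    return sorted(t for p, t in chase(data, rules, lo, hi) if p == pred)


# Hypothetical ontology and data for illustration.
RULES: List[Rule] = [
    (frozenset({"Leak"}), "Alarm", 0),         # Leak -> Alarm (same time)
    (frozenset({"Alarm"}), "Shutdown", 1),     # Alarm -> next Shutdown
    (frozenset({"Shutdown"}), "Shutdown", 1),  # Shutdown -> next Shutdown (persistence)
]
DATA: Set[Fact] = {("Leak", 2)}
```

With the window [0, 5], `answer("Shutdown", DATA, RULES, 0, 5)` yields `[3, 4, 5]` and `answer("Alarm", DATA, RULES, 0, 5)` yields `[2]`. Only the single `Leak` fact is stored; everything else follows from the ontology.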
We have also obtained new results on approximate query answering under ontologies (which won the Ray Reiter Runner-Up Best Paper Award at KR 2021) and on learning queries from examples under ontologies (which won the Ray Reiter Runner-Up Best Paper Award at KR 2020). Both lines of work contribute directly to ontology-based access to many-dimensional data.
Exploitation Route	Our results lay the foundations for querying many-dimensional data modulo ontologies. They have already been used in part to support aggregate queries and temporal datatypes in the commercial system Ontop. In future, our new techniques might be applied to analyse even more expressive ontology and query languages, and to develop query-answering algorithms for such languages and implement them in systems.
Sectors	Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Energy


Description	Our research has contributed to the development of the commercial system Ontop. Ontop is a Virtual Knowledge Graph system that exposes the content of arbitrary relational databases as ontologies or knowledge graphs. These ontologies/graphs are virtual, which means that the data remains in the data sources instead of being moved to another database. Since 2019, Ontop has been developed and commercialised by the start-up company Ontopic, based in Bolzano, Italy. Our research in this project contributed, for instance, to adding aggregate queries and datatypes for modelling temporal data to Ontop.
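
The idea of a virtual knowledge graph with aggregate queries can be sketched in a few lines. This minimal sketch uses Python's built-in `sqlite3` and a hypothetical mapping format: each ontology property is defined by an SQL query over the source that returns (subject, value) pairs. Ontop's real mappings are written in R2RML or its own mapping syntax, and its query translation is far more general. The point is only that a query over the graph is unfolded into SQL and executed by the source database, so no triples are ever materialised. The table, property and institution names are invented.

```python
import sqlite3

# Hypothetical source table; the data stays here and is never copied into a triple store.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE funding (org_id TEXT, amount_eur INTEGER);
    INSERT INTO funding VALUES
        ('inst_a', 4000000), ('inst_a', 7000000), ('inst_b', 2000000);
    """
)

# Hypothetical mapping: ontology property -> SQL returning (subject s, value o).
MAPPINGS = {
    "receivesFundingAmount": "SELECT org_id AS s, amount_eur AS o FROM funding",
}


def unfold_sum_query(prop: str) -> str:
    """Unfold 'subjects whose total prop value exceeds a threshold' into one SQL query."""
    return (
        f"SELECT s, SUM(o) AS total FROM ({MAPPINGS[prop]}) AS m "
        "GROUP BY s HAVING SUM(o) > ? ORDER BY s"
    )


def run(prop: str, threshold: int):
    """Execute the unfolded query directly against the source database."""
    return conn.execute(unfold_sum_query(prop), (threshold,)).fetchall()
```

For example, `run("receivesFundingAmount", 10_000_000)` returns `[('inst_a', 11000000)]`. The aggregation is pushed down to the database as a `GROUP BY ... HAVING` clause instead of being computed over an exported copy of the data.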
First Year Of Impact	2021
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic
