quantMD: Ontology-Based Management of Many-Dimensional Quantitative Data

Lead Research Organisation: University of Liverpool
Department Name: Computer Science

Abstract

Ontology-based data management (OBDM) is a technology that has been developed over the past decade with the aim of facilitating access to various types of data sources. In general, ontologies provide a formal model and vocabulary for a domain of interest. In OBDM, the role of ontologies is threefold: to integrate distributed and heterogeneous data sources, enrich incomplete data with background knowledge, and provide a user-friendly language for querying.

To illustrate, in an energy company the traditional workflow for geologists to find answers to their information needs is either to execute pre-defined queries covering parts of the needs over their databases and then integrate the results manually, which is onerous and error-prone, or to ask the IT department to construct custom SQL queries, which may take days or even weeks. OBDM reduces the time for finding answers to minutes by allowing the geologists to formulate their queries in natural-language terms and then run these queries via the OBDM tools over their databases.

Thus, by bringing together knowledge representation and database technologies, OBDM has the potential to transform information systems by allowing domain experts to query complex and distributed data efficiently without the help of database professionals.

This project addresses the main bottleneck on the way to realising this potential: so far, OBDM has been developed primarily for access to purely qualitative and one-dimensional data, but nowadays data is mostly numerical, many-dimensional and often temporal, and user information needs usually involve quantitative analysis. Thus, quantitative queries such as "find all UK-sponsored research institutions in Europe whose total triennial financial contributions from UK-based private companies exceed euro 10M" are not supported at all by existing OBDM tools. Moreover, because of the so-called open world assumption made in OBDM, developing the theory and practical tools for dealing with such queries is extremely challenging.

The aim of this project is to develop a novel OBDM framework for querying and analysing many-dimensional numerical data. To address the challenges, we bring together techniques from databases, knowledge representation, and formal methods, in particular temporal and modal logics, and develop these further. We will develop a theoretical framework for querying such data, develop tools for using this framework in practice, and test our tools with partners from industry and the public sector.

Planned Impact

The practical aim of this project is to build a novel ontology-based data management (OBDM) tool that will facilitate querying and analysing multi-dimensional numerical data from various types of data sources. The tool will be based on the existing state-of-the-art OBDM platform Ontop (developed at the Free University of Bozen-Bolzano in collaboration with Birkbeck), which at the moment provides access to qualitative data stored in relational databases, but does not support querying and aggregating numerical data (such as, for example, seismic data, timestamped sensor measurements, production data or medical measurements and tests) and integrating different types of data sources, in particular, APIs. The tool will be designed and implemented in close collaboration with a range of project partners from industry (Equinor, Lundin and Siemens), small/medium business (SIRIS Academic) and public sector (Josef Pilsudski Institute of America). Thus, we can realistically expect that our novel technology will be taken up quickly and implemented by the partners, and further by other companies, organisations and institutions where decision making depends on querying and analysing data. In particular, we expect applications of our tool in healthcare, where semantic technologies are already having an impact and where many-dimensional and numerical OBDM could help medical experts in identifying diseases and suitable treatment based on high temporal resolution data including lab results, electronic documentation, and bedside monitor trends.

The impact of our technology and tool will be to allow non-IT-expert end-users to efficiently retrieve complete data from multiple data sources without any assistance from database specialists, which will significantly simplify and speed up the process of data gathering. Along with the monetary savings and improvement in efficiency, this technology can generate significantly higher value by freeing domain experts' time to focus on their core tasks of data evaluation and analysis rather than spending time on searching and gathering data and information.

Another impact of this project is the promotion of new semantic technologies and standards among various companies and users. The seminars and tutorials that we will deliver at our partners' sites and at conferences will contribute towards this goal.
 
Description We have made significant progress towards our goal of extending standard static ontology-based data access to data with a temporal dimension. We started by developing a framework for querying one-dimensional temporal data that can represent the temporal evolution of a single object. We take into account background knowledge formulated in an ontology. We proposed to use ontologies given in linear temporal logic (LTL), which was invented in philosophy and has been successfully applied in computer science in the area of program verification. Queries are also given in the positive fragment of LTL. Within this framework, we investigated the complexity of ontology-mediated queries and their rewritability to standard relational queries. By taking account of the expressivity of the temporal operators used in the ontology and the shape of the queries, we identified a hierarchy of increasingly powerful ontology-mediated queries and proved rewritability into standard database queries, into such queries extended by standard arithmetic predicates, or into further extensions with primitive recursion. We have thus laid the foundation for practical ontology-based access to one-dimensional temporal data.
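
To make the notion of rewritability concrete, the following is a minimal sketch under stated assumptions, not the project's implementation. It assumes a hypothetical one-axiom temporal ontology saying that if A held at some strictly earlier timestamp, then B holds now, and a hypothetical atomic query asking at which timestamps B holds. Under these assumptions the ontology-mediated query can be rewritten into a query that uses only the data and the order on timestamps: B holds at t if the data asserts B at t, or asserts A at some s < t. The predicate names, the function name `certain_answers_b` and the sample data are all illustrative.

```python
# Illustrative sketch only: evaluating a rewritten ontology-mediated query.
# Assumed (hypothetical) ontology: "A at some strictly earlier timestamp implies B now".
# Assumed (hypothetical) query: "at which timestamps does B hold?"


def certain_answers_b(data, timestamps):
    """Return the timestamps t at which B is entailed by data plus ontology.

    Implements the rewriting  B(t) OR EXISTS s < t . A(s)  directly over the
    data, so the ontology itself is never consulted at query time.
    `data` is a set of (predicate, timestamp) facts.
    """
    a_times = [s for (pred, s) in data if pred == "A"]
    # Only the earliest A matters: it witnesses "A at some s < t" for all later t.
    first_a = min(a_times) if a_times else None
    answers = []
    for t in timestamps:
        stated = ("B", t) in data
        derived = first_a is not None and first_a < t
        if stated or derived:
            answers.append(t)
    return answers


# Hypothetical timestamped facts about a single object.
data = {("B", 0), ("A", 2), ("C", 4)}
answers = certain_answers_b(data, range(6))
print(answers)
```

In this toy case, B is returned at timestamp 0 because the data states it, and at every timestamp after 2 because the assumed axiom derives it from the A fact. More expressive ontologies and queries are the ones for which, as described above, arithmetic predicates or recursion become necessary.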

In a second step, we extended our framework for querying one-dimensional temporal data to querying two-dimensional data in which each timestamp comes with a database of facts true at that timestamp. We proposed to model the second dimension using the description logic underpinning the OWL profile for static one-dimensional data access. Within this framework, we proved powerful transfer results that lift our complexity and rewritability results from the one-dimensional to the two-dimensional case. Using these transfer results, we again obtained a hierarchy of increasingly powerful two-dimensional ontology-mediated queries which combine fragments of LTL with description logic.

We have also obtained new results on approximate query answering under ontologies (which won the Ray Reiter Runner-Up Best Paper Award at KR 2021) and learning queries using examples under ontologies (which won the Ray Reiter Runner-Up Best Paper Award at KR 2020). Both results directly contribute to ontology-based data access to many-dimensional data.
Exploitation Route Our results lay the foundations for querying many-dimensional data modulo ontologies. They have already been used in part to support aggregate queries and temporal datatypes in the commercial system Ontop. Our new techniques might be applied in the future to analyse even more expressive ontology and query languages. They might also be used to develop query answering algorithms for such languages and to implement them in systems.
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy

 
Description Our research has contributed to the development of the commercial system Ontop. Ontop is a Virtual Knowledge Graph system which exposes the content of arbitrary relational databases as ontologies or knowledge graphs. These ontologies/graphs are virtual, which means that data remains in the data sources instead of being moved to another database. Since 2019, Ontop has been developed and commercialised by the start-up company Ontopic, which is based in Bolzano, Italy. Our research in this project contributed, for instance, to the addition of aggregate queries and datatypes for modelling temporal data to Ontop.
First Year Of Impact 2021
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic
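
The virtual approach described above, in which data stays in the relational source and is only exposed through ontology terms, can be illustrated with a small sketch. This is not Ontop's API or mapping language: the table `well`, the ontology terms `:Well` and `:DeepWell`, the `MAPPINGS` dictionary and the function `instances_of` are all hypothetical, and Python's standard `sqlite3` module stands in for an arbitrary relational database.

```python
import sqlite3

# Hypothetical relational source; an in-memory SQLite database stands in
# for an existing production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE well (id INTEGER PRIMARY KEY, name TEXT, depth_m REAL)")
conn.executemany(
    "INSERT INTO well VALUES (?, ?, ?)",
    [(1, "W-1", 2500.0), (2, "W-2", 3100.5)],
)

# Hypothetical mappings from ontology terms to SQL over the source.
# Each query builds an identifier for every instance of the term.
MAPPINGS = {
    ":Well": "SELECT 'well/' || id FROM well ORDER BY id",
    ":DeepWell": "SELECT 'well/' || id FROM well WHERE depth_m > 3000 ORDER BY id",
}


def instances_of(term):
    """Answer "which objects are instances of `term`?" by running the mapped
    SQL against the source at query time; nothing is copied or materialised."""
    return [row[0] for row in conn.execute(MAPPINGS[term])]


wells = instances_of(":Well")
deep_wells = instances_of(":DeepWell")
print(wells, deep_wells)
```

The user asks in terms of the vocabulary (`:DeepWell`) and never sees the table layout; the 3000 m threshold is an arbitrary value chosen for the illustration.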