quantMD: Ontology-Based Management of Many-Dimensional Quantitative Data

Lead Research Organisation: University of Liverpool
Department Name: Computer Science

Abstract

Ontology-based data management (OBDM) is a technology that has been developed over the past decade with the aim of facilitating access to various types of data sources. In general, ontologies provide a formal model and vocabulary for a domain of interest. In OBDM, the role of ontologies is threefold: to integrate distributed and heterogeneous data sources, enrich incomplete data with background knowledge, and provide a user-friendly language for querying.

To illustrate, in an energy company the traditional workflow for geologists seeking answers to their information needs is either to execute pre-defined queries that cover parts of those needs over their databases and then integrate the results manually, which is onerous and error-prone, or to ask the IT department to construct custom SQL queries, which may take days or even weeks. OBDM reduces the time needed to find answers to minutes by allowing geologists to formulate their queries in natural-language terms and then run these queries over their databases via OBDM tools.

Thus, by bringing together knowledge representation and database technologies, OBDM has the potential to transform information systems by allowing domain experts to query complex and distributed data efficiently without the help of database professionals.

This project addresses the main obstacle to realising this potential: so far, OBDM has been developed primarily for access to purely qualitative, one-dimensional data, whereas nowadays data is mostly numerical, many-dimensional and often temporal, and users' information needs usually involve quantitative analysis. Consequently, quantitative queries such as "find all UK-sponsored research institutions in Europe whose total triennial financial contributions from UK-based private companies exceed €10M" are not supported at all by existing OBDM tools. Moreover, because of the so-called open-world assumption made in OBDM (under which facts absent from the data are treated as unknown rather than false), developing the theory and practical tools for dealing with such queries is extremely challenging.

The aim of this project is to develop a novel OBDM framework for querying and analysing many-dimensional numerical data. To address these challenges, we bring together techniques from databases, knowledge representation and formal methods, in particular temporal and modal logics, and develop them further. We will develop a theoretical framework for querying such data, build tools for using this framework in practice, and test our tools with partners from industry and the public sector.

Planned Impact

The practical aim of this project is to build a novel ontology-based data management (OBDM) tool that will facilitate querying and analysing multi-dimensional numerical data from various types of data sources. The tool will be based on the existing state-of-the-art OBDM platform Ontop (developed at the Free University of Bozen-Bolzano in collaboration with Birkbeck). At the moment, Ontop provides access to qualitative data stored in relational databases, but it supports neither querying and aggregating numerical data (for example, seismic data, timestamped sensor measurements, production data, or medical measurements and tests) nor integrating different types of data sources, in particular APIs. The tool will be designed and implemented in close collaboration with a range of project partners from industry (Equinor, Lundin and Siemens), a small/medium enterprise (SIRIS Academic) and the public sector (Josef Pilsudski Institute of America). We can therefore realistically expect our technology to be taken up quickly by the partners, and subsequently by other companies, organisations and institutions whose decision making depends on querying and analysing data. In particular, we expect applications of our tool in healthcare, where semantic technologies are already having an impact and where many-dimensional numerical OBDM could help medical experts identify diseases and suitable treatments based on high-temporal-resolution data, including lab results, electronic documentation and bedside monitor trends.

The impact of our technology and tool will be to allow end-users who are not IT experts to efficiently retrieve complete data from multiple data sources without any assistance from database specialists, which will significantly simplify and speed up the process of data gathering. Beyond the monetary savings and efficiency gains, this technology can generate significantly higher value by freeing domain experts' time to focus on their core tasks of data evaluation and analysis rather than on searching for and gathering data and information.

Another impact of this project is the promotion of new semantic technologies and standards among various companies and users. The seminars and tutorials that we will deliver at our partners' sites and at conferences will contribute towards this goal.
