OASIS: Ontology Reasoning over Frequently-changing and Streaming Data

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Advanced applications relying on intelligent management of loosely structured and large-scale datasets play a key role in domains such as healthcare, business and government.
Ontology-based technologies lie at the core of many such applications. In a nutshell, an ontology-based data management system (ODMS) enables intelligent information processing by providing means for representing background knowledge about the application in an ontology, and exploiting automated reasoning techniques to infer information that is implicit in the data and the ontology.

State-of-the-art ODMSs are, however, not well-suited for applications which require real-time analysis of rapidly changing data. For instance, oil and gas companies continuously monitor sensor readings to detect equipment malfunction and predict maintenance needs; network providers analyse flow data to identify traffic anomalies and Denial of Service attacks; knowledge graphs are continuously updated; and Internet of Things (IoT) applications such as Smart Cities require real-time analysis of data stemming from multiple types of device.

ODMSs often borrow implementation techniques from the database literature, where real-time analysis of rapidly changing data has been tackled using two main approaches.

(1) In a stream processing system, the input data is conceptually seen as an unbounded sequence of time-stamped tuples that flow through the system; data is only available for processing in a single pass and information stored by the system is inherently incomplete. Streaming jobs are long-running: queries are deployed once and continue to produce results until removed. State-of-the-art systems, such as Apache Storm, Apache Spark Streaming, Google's MillWheel, LinkedIn's Samza, and Apache Flink, achieve sub-second latencies by distributing the streaming workload in a cluster, which requires sophisticated scheduling and fault-tolerance techniques.
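The single-pass, long-running nature of a streaming query can be illustrated with a minimal Python sketch. It is not taken from any of the systems named above: the tuple schema (timestamp, sensor id, value), the function name `windowed_average`, and the choice of a sliding time window are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical schema for a time-stamped tuple: (timestamp, sensor_id, value).
Reading = Tuple[float, str, float]


def windowed_average(
    stream: Iterable[Reading], window: float
) -> Iterator[Tuple[float, float]]:
    """Continuous query over a stream, evaluated in a single pass.

    After each incoming tuple, emits (timestamp, mean of the values whose
    timestamps lie in the half-open window (timestamp - window, timestamp]).
    Only the tuples inside the window are retained, so the stored
    information is deliberately incomplete.
    """
    buffer: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, _sensor_id, value in stream:
        buffer.append((ts, value))
        total += value
        # Evict tuples that have fallen out of the window.
        while buffer and buffer[0][0] <= ts - window:
            _old_ts, old_value = buffer.popleft()
            total -= old_value
        yield ts, total / len(buffer)


readings = [(0.0, "s1", 10.0), (1.0, "s1", 20.0), (2.0, "s1", 30.0), (5.0, "s1", 40.0)]
results = list(windowed_average(readings, window=3.0))
```

The generator never revisits earlier tuples and would keep producing results for as long as the input keeps arriving, which mirrors the "deployed once, runs until removed" behaviour described above. Distribution, scheduling and fault tolerance are out of scope for this sketch.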

(2) In a real-time database, the data is seen as a finite collection of records that is continuously evolving. This traditional concept of a finite and persistent collection is ubiquitous in the database world and is well-suited for applications requiring a consistent and complete view of the data. The key feature that distinguishes real-time from traditional databases is that, similarly to streaming systems, they allow clients to subscribe to long-running continuous queries that instantaneously push incremental updates.
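The subscription model can be sketched in a few lines of Python. This is an illustrative toy, not the interface of any real product: the class `LiveCollection`, its methods, and the event tuples `(op, key, record)` are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Event = Tuple[str, str, Optional[Record]]  # (op, key, record); hypothetical format
Predicate = Callable[[Record], bool]
Callback = Callable[[Event], None]


class LiveCollection:
    """A finite, evolving collection with subscribable continuous queries."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}
        self._subscriptions: List[Tuple[Predicate, Callback]] = []

    def subscribe(self, predicate: Predicate, callback: Callback) -> None:
        # Push current matches first so the client starts from a complete view.
        for key, record in self._records.items():
            if predicate(record):
                callback(("added", key, record))
        self._subscriptions.append((predicate, callback))

    def upsert(self, key: str, record: Record) -> None:
        old = self._records.get(key)
        self._records[key] = record
        self._notify(key, old, record)

    def delete(self, key: str) -> None:
        old = self._records.pop(key, None)
        if old is not None:
            self._notify(key, old, None)

    def _notify(self, key: str, old: Optional[Record], new: Optional[Record]) -> None:
        # Push only the change in each query's result, never the full result.
        for predicate, callback in self._subscriptions:
            was = old is not None and predicate(old)
            now = new is not None and predicate(new)
            if now and not was:
                callback(("added", key, new))
            elif was and not now:
                callback(("removed", key, None))
            elif was and now and old != new:
                callback(("changed", key, new))


db = LiveCollection()
events: List[Event] = []
db.upsert("p1", {"pressure": 5})
db.subscribe(lambda r: r["pressure"] > 10, events.append)
db.upsert("p1", {"pressure": 12})  # enters the query result
db.upsert("p2", {"pressure": 3})   # never matches, so nothing is pushed
db.upsert("p1", {"pressure": 15})  # still matches, with a new value
db.upsert("p1", {"pressure": 1})   # leaves the query result
```

Unlike the streaming case, the full collection is always stored, so a new subscriber can be given a consistent snapshot before incremental updates begin.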

Many theoretical and practical difficulties arise, however, when adapting these approaches to ODMSs. In the OASIS project, we will address these difficulties and lay the foundations for a new generation of ODMSs capable of ingesting and processing rapidly changing data in real time. Such systems will support the aforementioned applications by enabling fast execution of complex analytics pipelines supporting intelligent decisions. Moreover, we will exploit the resulting insights to implement a prototype and test it in real-life deployments.

Planned Impact

The main non-academic beneficiaries of this work are companies in the growing sector of the information technology industry for which ontologies and related technologies, such as knowledge graphs, represent an important component of their products and/or services. In the case for support we have described the difficulties that companies (such as our industrial partner OSIsoft) face when analysing streaming data in the presence of an ontology. Similar challenges can be found in domains ranging from government and healthcare to the aerospace and finance industries, and we believe that this project has the potential for wide impact in all these sectors of the economy. For example, we saw similar problems in our collaboration with Siemens, who attempted to apply ontology-based technologies to integrate and analyse streaming data generated by sensors fitted in wind turbines, as well as in our collaboration with Armasuisse, who deployed a social media analysis system to help detect natural disasters and terrorist activity by processing Twitter posts in real time.

The needs of industries in the aforementioned domains have created great interest within the information technology industry in developing more flexible information management layers. Oracle, one of the leading data management solution providers with whom we have an established collaboration, has taken some initial steps in this direction, enhancing its well-known database management system with modules that use ontologies to support 'semantic data management'. We anticipate similar interest from other vendors.

A wide range of start-up companies are developing ontology-based data management platforms from the ground up. These include Stardog, OntoText, and our own spinout company Oxford Semantic Technologies (https://www.oxfordsemantic.tech/). Our project is clearly a perfect fit for their business.

ENGAGEMENT AND DISSEMINATION

Engagement with non-academic beneficiaries is an integral part of our project, with industry partners making a significant contribution, including working with us to deploy and evaluate prototypical implementations. As well as ensuring the relevance of our research, this engagement will provide a direct pathway to impact via dissemination and possible exploitation.

We will also exploit our wider network of non-academic collaborators, including the partners in our DBOnto platform (http://dbonto.cs.ox.ac.uk). We will additionally undertake a range of more broadly focused dissemination activities. Firstly, we will showcase the achievements of the project to industry and research leaders via dedicated workshops. Secondly, we will continue our established pattern of publication in leading conferences and journals; in order to maximise the impact on non-academic partners, we will target in-use and industry tracks at conferences wherever possible, coauthoring papers with industry partners. Thirdly, we will continue to participate in international coordination and standardisation efforts within organisations such as the World Wide Web Consortium (W3C). Finally, we will continue to make research outputs freely available from our web site.

EXPLOITATION

We have recently founded two spinout companies that are commercialising the results of our research. The results of OASIS will be relevant to them, thus providing an ideal exploitation pathway. As a result, the UK economy benefits through the creation of skilled jobs, and the University of Oxford benefits through license fee revenue and through its stake in the companies. Exploitation of IP resulting from the project will be managed by Oxford University Innovation (OUI), a wholly-owned subsidiary of Oxford University, founded to exploit IP arising out of research activities.

TRACK RECORD

Our research has already been highly influential outside academia, and has been the basis for international standards, widely used and/or commercialised software and spinout companies.

Publications

Indrzejczak A (2023) When iota meets lambda in Synthese

Kaminski M (2021) The Complexity and Expressive Power of Limit Datalog in Journal of the ACM

Kaminski M (2020) Complexity and Expressive Power of Disjunction and Negation in Limit Datalog in Proceedings of the AAAI Conference on Artificial Intelligence

(2022) Reasoning Techniques in DatalogMTL in Proceedings of the 4th International Workshop on the Resurgence of Datalog in Academia and Industry (Datalog-2.0 2022), co-located with the 16th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2022)

Ronca A (2022) The delay and window size problems in rule-based stream reasoning in Artificial Intelligence

 
Description We have developed novel techniques for efficient reasoning on large-scale temporal datasets and knowledge graphs. Temporal reasoning in the context of knowledge graphs is the ability to reason about and understand the ordering, duration, and causal relationships of events described in the graph. Reasoning tasks in the presence of temporal rule sets and large-scale knowledge graphs are extremely challenging and, to the best of our knowledge, ours are the first temporal reasoning techniques designed to cope with the demands of modern data-intensive applications. We have also tackled the challenges surrounding temporal reasoning over streaming data and developed novel reasoning techniques that are both scalable and provably correct.

Our proposed algorithms have been published in the most prestigious journals and conferences in the areas of Artificial Intelligence, Logic Programming, and Semantic Technologies, including the Artificial Intelligence Journal (AIJ), the Journal of Artificial Intelligence Research (JAIR), the International Joint Conference on Artificial Intelligence (IJCAI), the AAAI Conference on Artificial Intelligence, the International Conference on the Principles of Knowledge Representation and Reasoning, and the Journal of Web Semantics. Furthermore, they have been implemented in a software system, called MeTeoR, that has been made available to the research community.
Exploitation Route Our findings open the door to a new generation of software tools for processing large-scale datasets containing temporal information. Although our techniques have not yet led to an impact in industry, they have already influenced academic research, attracted attention and inspired further research. For instance, we were invited to give a tutorial on our work at the International Conference on the Principles of Knowledge Representation and Reasoning as well as a plenary invited talk at the International Workshop on Stream Reasoning. Furthermore, our results have inspired follow-up research by academics at some of the most prestigious universities in Europe, including TU Vienna (Austria), Dresden (Germany), ETH Zurich (Switzerland), Amsterdam (the Netherlands), and Birkbeck College London (UK).
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Collaboration with Birkbeck College
Organisation Birkbeck, University of London
Country United Kingdom 
Sector Academic/University 
PI Contribution Collaboration on Temporal Reasoning in the context of ontologies. Results were disseminated at an international summer school with formal proceedings, where members of both groups produced a joint publication.
Collaborator Contribution See above.
Impact Vladislav Ryzhikov, Przemyslaw Andrzej Walega, Michael Zakharyaschev: Temporal Ontology-Mediated Queries and First-Order Rewritability: A Short Course. Reasoning Web 2020: 109-148
Start Year 2019
 
Description Collaboration with Shanghai Jiao Tong University
Organisation Shanghai Jiao Tong University
Country China 
Sector Academic/University 
PI Contribution We have worked together with researchers at Shanghai Jiao Tong University to develop a novel temporal rule-based reasoner. Our joint findings have been documented in a publication at AAAI-2022 (see publication list) and an extended version is under consideration at an international journal.
Collaborator Contribution Our partners contributed to the design of the key data structures in the system and played a key role in the evaluation against existing benchmarks.
Impact A conference publication at AAAI-2022; a novel temporal rule-based reasoner called MeTeoR.
Start Year 2021
 
Description University of Oslo Collaboration on Temporal Reasoning 
Organisation University of Oslo
Country Norway 
Sector Academic/University 
PI Contribution We have collaborated on a number of recent publications on temporal reasoning.
Collaborator Contribution See above.
Impact - Stratified Negation in Datalog with Metric Temporal Operators. David Tena Cucala, Przemyslaw Walega, Bernardo Cuenca Grau and Egor V. Kostylev. Proc. of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021). Held virtually, Feb. 2021.
Start Year 2020
 
Title MeTeoR. Scalable System for Reasoning in Datalog with Metric Temporal Operators 
Description MeTeoR is a highly scalable reasoning system for temporal datasets and rule sets involving metric temporal operators 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact The system is a first implementation of its kind and has been made available to the research community only very recently. Our team won the Hackathon Challenge at the International Workshop on Stream Reasoning in 2021 using MeTeoR.
URL https://github.com/wdimmy/MeTeoR
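
To give a flavour of what a metric temporal operator does, the following Python sketch evaluates two standard past-time operators over facts that hold on intervals. It does not use MeTeoR's code or API, and it is not the project's algorithm: the function names, the restriction to closed intervals with numeric endpoints, and the example data are assumptions made purely for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [start, end] on the timeline


def merge(intervals: List[Interval]) -> List[Interval]:
    """Coalesce overlapping or touching closed intervals into maximal ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def diamond_minus(facts: List[Interval], lo: float, hi: float) -> List[Interval]:
    """Times t at which the fact held at SOME point in [t - hi, t - lo].

    A fact holding on [s, e] makes this true exactly on [s + lo, e + hi].
    """
    return merge([(s + lo, e + hi) for s, e in facts])


def box_minus(facts: List[Interval], lo: float, hi: float) -> List[Interval]:
    """Times t at which the fact held at ALL points in [t - hi, t - lo].

    For a maximal interval [s, e] this is true exactly on [s + hi, e + lo],
    provided that range is non-empty.
    """
    result: List[Interval] = []
    for s, e in merge(facts):
        if s + hi <= e + lo:
            result.append((s + hi, e + lo))
    return result


# Hypothetical data: intervals during which a sensor reported a high temperature.
high_temp = [(0, 4), (3, 6), (10, 11)]
recently_high = diamond_minus(high_temp, lo=1, hi=2)
sustained_high = box_minus(high_temp, lo=0, hi=5)
```

Here `recently_high` covers the times at which a high reading occurred between 1 and 2 time units earlier, and `sustained_high` covers the times at which the reading had been high throughout the preceding 5 time units. A full reasoner would additionally handle open interval endpoints, rules combining several such operators, and recursion.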
 
Description Tutorial on temporal reasoning at the 2022 KR conference. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Tutorial accepted at the 2022 International Conference on the Principles of Knowledge Representation and Reasoning.
Year(s) Of Engagement Activity 2022