OASIS: Ontology Reasoning over Frequently-changing and Streaming Data

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Advanced applications relying on intelligent management of loosely structured and large-scale datasets play a key role in domains such as healthcare, business and government.
Ontology-based technologies lie at the core of many such applications. In a nutshell, an ontology-based data management system (ODMS) enables intelligent information processing by providing means for representing background knowledge about the application in an ontology, and exploiting automated reasoning techniques to infer information that is implicit in the data and the ontology.
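
As a rough illustration of what inferring implicit information can mean, the following minimal sketch materialises class memberships implied by a subclass hierarchy. The function name `materialise`, the pair-based encoding, and the example class names are all assumptions made for illustration; they are not taken from the project or from any particular ODMS.

```python
def materialise(facts, subclass_of):
    """Return all (individual, class) facts implied by a subclass hierarchy.

    facts:       set of (individual, class) pairs stated explicitly in the data
    subclass_of: set of (subclass, superclass) pairs from the (toy) ontology
    """
    inferred = set(facts)
    changed = True
    # Naive forward chaining: apply the subclass rule until nothing new appears.
    while changed:
        changed = False
        for individual, cls in list(inferred):
            for sub, sup in subclass_of:
                if sub == cls and (individual, sup) not in inferred:
                    inferred.add((individual, sup))
                    changed = True
    return inferred


# Hypothetical data and ontology, for illustration only.
data = {("pump7", "CentrifugalPump")}
ontology = {("CentrifugalPump", "Pump"), ("Pump", "Equipment")}
result = materialise(data, ontology)
```

Here the data never states that `pump7` is a piece of equipment, yet a query for all `Equipment` over `result` would return it. Real ontology languages support far richer axioms than this single rule.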

State-of-the-art ODMSs are, however, not well-suited for applications which require real-time analysis of rapidly changing data. For instance, oil and gas companies continuously monitor sensor readings to detect equipment malfunction and predict maintenance needs; network providers analyse flow data to identify traffic anomalies and Denial of Service attacks; knowledge graphs are continuously updated; and Internet of Things (IoT) applications such as Smart Cities require real-time analysis of data stemming from multiple types of device.

ODMSs often borrow implementation techniques from the database literature, where real-time analysis of rapidly changing data has been tackled using two main approaches.

(1) In a stream processing system, the input data is conceptually seen as an unbounded sequence of time-stamped tuples that flow through the system; data is only available for processing in a single pass and information stored by the system is inherently incomplete. Streaming jobs are long-running: queries are deployed once and continue to produce results until removed. State-of-the-art systems, such as Apache Storm, Apache Spark Streaming, Google's MillWheel, LinkedIn's Samza, and Apache Flink, achieve sub-second latencies by distributing the streaming workload in a cluster, which requires sophisticated scheduling and fault-tolerance techniques.
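
The single-pass, long-running character of such queries can be sketched in a few lines. The example below is a toy, single-process illustration under assumed names (`windowed_alerts`, the `Reading` tuple layout, the window and threshold parameters); it does not reflect the API of any of the systems named above and omits distribution and fault tolerance entirely.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, Tuple

# Assumed tuple layout: (timestamp in seconds, sensor id, measured value).
Reading = Tuple[float, str, float]


def windowed_alerts(
    stream: Iterable[Reading], window: float, threshold: float
) -> Iterator[Reading]:
    """Continuous query over a possibly unbounded stream.

    Emits (timestamp, sensor, mean) whenever the mean of a sensor's readings
    over the last `window` seconds exceeds `threshold`. Each tuple is seen
    exactly once; only the readings still inside the window are retained.
    """
    recent: Dict[str, deque] = {}
    for ts, sensor, value in stream:
        buf = recent.setdefault(sensor, deque())
        buf.append((ts, value))
        # Evict readings that have fallen out of the sliding window.
        while buf and buf[0][0] <= ts - window:
            buf.popleft()
        mean = sum(v for _, v in buf) / len(buf)
        if mean > threshold:
            yield ts, sensor, mean


# Hypothetical sensor readings, for illustration only.
readings = [(0.0, "s1", 10.0), (1.0, "s1", 30.0), (2.0, "s1", 50.0), (12.0, "s1", 5.0)]
alerts = list(windowed_alerts(readings, window=10.0, threshold=25.0))
```

Because the function is a generator, it produces results as tuples arrive and never needs the whole input, which is why the stored state is inherently a partial view of the data.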

(2) In a real-time database, the data is seen as a finite collection of records that is continuously evolving. This traditional concept of a finite and persistent collection is ubiquitous in the database world and is well-suited for applications requiring a consistent and complete view of the data. The key feature that distinguishes real-time from traditional databases is that, similarly to streaming systems, they allow clients to subscribe to long-running continuous queries that instantaneously push incremental updates.
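
The subscription model can be illustrated with a small in-memory sketch. The class `LiveTable`, its `subscribe` and `upsert` methods, and the "added"/"changed"/"removed" event kinds are hypothetical names chosen for this example; they are assumptions and not the interface of any specific real-time database.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Delta = Tuple[str, str, Record]  # (change kind, key, record)


class LiveTable:
    """A finite, persistent collection whose queries push incremental updates."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}
        self._subs: List[Tuple[Callable[[Record], bool], Callable[[Delta], None]]] = []

    def subscribe(
        self, predicate: Callable[[Record], bool], callback: Callable[[Delta], None]
    ) -> None:
        """Register a continuous query; send the current matches first."""
        self._subs.append((predicate, callback))
        for key, row in self._rows.items():
            if predicate(row):
                callback(("added", key, row))

    def upsert(self, key: str, row: Record) -> None:
        """Insert or overwrite a record and notify affected subscribers."""
        old: Optional[Record] = self._rows.get(key)
        self._rows[key] = row
        for predicate, callback in self._subs:
            was = old is not None and predicate(old)
            now = predicate(row)
            if now and not was:
                callback(("added", key, row))
            elif now and was:
                callback(("changed", key, row))
            elif was and not now:
                callback(("removed", key, old))


# Hypothetical usage: a client watches for faulty equipment.
events: List[Delta] = []
table = LiveTable()
table.upsert("p1", {"status": "ok"})
table.subscribe(lambda r: r["status"] == "fault", events.append)
table.upsert("p1", {"status": "fault"})
table.upsert("p2", {"status": "fault"})
table.upsert("p1", {"status": "ok"})
```

Unlike the stream setting, the table always holds a complete current state, and subscribers receive only the differences in their query's answer rather than re-running the query.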

Many theoretical and practical difficulties arise, however, when adapting these approaches to ODMSs. In the OASIS project, we will address these difficulties and lay the foundations for a new generation of ODMSs capable of ingesting and processing rapidly changing data in real time. Such systems will support the aforementioned applications by enabling fast execution of complex analytics pipelines supporting intelligent decisions. Moreover, we will exploit the resulting insights to implement a prototype and test it in real-life deployments.

Planned Impact

The main non-academic beneficiaries of this work are companies in the growing sector of the information technology industry for which ontologies and related technologies, such as knowledge graphs, are an important component of products and/or services. In the case for support we have described the difficulties that companies (such as our industrial partner OSIsoft) face when analysing streaming data in the presence of an ontology. Similar challenges can be found in domains ranging from government and healthcare to the aerospace and finance industries, and we believe that this project has the potential for wide impact in all these sectors of the economy. For example, we saw similar problems in our collaboration with Siemens, who attempted to apply ontology-based technologies to integrate and analyse streaming data generated by sensors fitted in wind turbines, as well as in our collaboration with Armasuisse, who deployed a social media analysis system to help detect natural disasters and terrorist activity by processing Twitter posts in real time.

The needs of industries in the aforementioned domains have created great interest within the information technology industry in developing more flexible information management layers. Oracle, one of the leading data management solution providers with whom we have an established collaboration, has taken some initial steps in this direction, enhancing its well-known database management system with modules that use ontologies to support 'semantic data management'. We anticipate similar interest from other vendors.

A wide range of start-up companies are developing ontology-based data management platforms from the ground up. These include Stardog, Ontotext, and our own spinout company Oxford Semantic Technologies (https://www.oxfordsemantic.tech/). The results of our project are directly relevant to their business.

ENGAGEMENT AND DISSEMINATION

Engagement with non-academic beneficiaries is an integral part of our project, with industry partners making a significant contribution, including working with us to deploy and evaluate prototypical implementations. As well as ensuring the relevance of our research, this engagement will provide a direct pathway to impact via dissemination and possible exploitation.

We will also exploit our wider network of non-academic collaborators, including the partners in our DBOnto platform (http://dbonto.cs.ox.ac.uk). We will additionally undertake a range of more broadly focused dissemination activities. Firstly, we will showcase the achievements of the project to industry and research leaders via dedicated workshops. Secondly, we will continue our established pattern of publication in leading conferences and journals; in order to maximise the impact on non-academic partners, we will target in-use and industry tracks at conferences and, wherever possible, coauthor papers with industry partners. Thirdly, we will continue to participate in international coordination and standardisation efforts within organisations such as the World Wide Web Consortium (W3C). Finally, we will continue to make research outputs freely available from our web site.

EXPLOITATION

We have recently founded two spinout companies that are commercialising the results of our research. The results of OASIS will be relevant to them, thus providing an ideal exploitation pathway. As a result, the UK economy benefits through creation of skilled jobs; the University of Oxford benefits through license fee revenue, and it also holds a stake in the companies. Exploitation of IP resulting from the project will be managed by OUI, a wholly-owned subsidiary of Oxford University, founded to exploit IP arising out of research activities.

TRACK RECORD

Our research has already been highly influential outside academia, and has been the basis for international standards, widely used and/or commercialised software and spinout companies.

Publications

Walega P (2020) Subject-oriented spatial logic in Information and Computation