MaSI3: A Massively Scalable Intelligent Information Infrastructure

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

Ontology-based Data Management Systems (ODMSs) are a new kind of data management system specifically designed to manage the large semi-structured data sets needed to power modern intelligent applications. Most ODMSs are based on the Resource Description Framework (RDF) data model, which was specifically designed for the representation of semi-structured data. RDF data sets consist of triples and are often viewed as graphs with labelled vertices and edges. The structure of RDF data is described using an ontology - a set of logical axioms that give semantics to the graph, and enable the derivation of new triples via reasoning. The ontology is often expressed in the Web Ontology Language (OWL), sometimes extended with the Semantic Web Rule Language (SWRL). The main task of an ODMS is to answer queries over the given ontology and data set, with the queries commonly being expressed in the SPARQL language. Reasoning plays a key role in query answering, and modern intelligent applications commonly require an integration of taxonomic, spatio-temporal, mereological, and other kinds of reasoning.
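
As a rough illustration of how an ontology axiom can derive new triples, the following Python sketch applies a single subclass rule to a small set of triples until nothing new can be added. It is a toy under stated assumptions: the `ex:` identifiers, the `materialise` function, and the choice of a single hand-written rule are all hypothetical and are not taken from any particular ODMS.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object); all identifiers are made up.
Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def materialise(triples: Set[Triple]) -> Set[Triple]:
    """Apply one rule until a fixpoint is reached:
    (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D).
    """
    result = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_edges = [(s, o) for (s, p, o) in result if p == SUBCLASS_OF]
        derived = {
            (x, RDF_TYPE, d)
            for (x, p, c) in result if p == RDF_TYPE
            for (c2, d) in subclass_edges if c2 == c
        }
        new = derived - result
        if new:
            result |= new
            changed = True
    return result


data = {
    ("ex:Cat", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:tom", RDF_TYPE, "ex:Cat"),
}
closure = materialise(data)
```

Here `closure` contains the three input triples plus the derived facts that `ex:tom` is an `ex:Mammal` and an `ex:Animal`, so a query for all animals would return `ex:tom` even though that triple was never stated explicitly.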

ODMSs can and do exploit implementation techniques described in the database literature. The computational problems that such systems need to solve, however, are very hard, so developing robustly scalable systems is extremely challenging, usually requiring a combination of heuristics and careful engineering. Although significant progress has been made and state-of-the-art ODMSs can now deal with nontrivial data sets, their performance still falls far short of what is required by modern 'data hungry' applications. This is partly due to the sheer size of the data sets that need to be processed, but also partly due to the complexity of the reasoning tasks that need to be performed.

Critical to the performance of ODMSs is the fact that the units of data that they store (i.e., triples) are very small so, to retrieve useful information, typical queries tend to be quite large. Efficiently answering such queries requires exhaustive data indexing; however, building and maintaining these indices can itself compromise scalability, particularly during update-intensive tasks such as materialisation-based reasoning. Moreover, although query evaluation is subpolynomial in data size, it is NP-hard in query size, so techniques that are effective on small queries may fail on large and complex queries. Finally, scaling ODMSs to deal with Big Data will inevitably require distributed data storage and query processing, but existing data partitioning schemes are unlikely to fully exploit the potential for parallelisation and minimise distributed processing on large queries.
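
The interplay between small triples, indexing, and large queries can be sketched in a few lines of Python. The example below is only a sketch under assumptions: the class `TinyTripleIndex`, the function `answer`, the two indexes chosen, and the `ex:` data are hypothetical and do not describe the layout of any real ODMS. It shows that every additional triple pattern in a query adds one more join, and that each index kept to speed up those joins must also be updated whenever a triple is added.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables
Binding = Dict[str, str]


class TinyTripleIndex:
    """Hypothetical toy store that keeps two indexes over the triples."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.all: Set[Triple] = set(triples)
        self.by_sp = defaultdict(set)  # (subject, predicate) -> objects
        self.by_po = defaultdict(set)  # (predicate, object) -> subjects
        for s, p, o in self.all:
            self.by_sp[(s, p)].add(o)
            self.by_po[(p, o)].add(s)

    def match(self, s: Optional[str], p: Optional[str],
              o: Optional[str]) -> Iterator[Triple]:
        """Yield matching triples; None stands for an unbound position."""
        if s is not None and p is not None:
            for obj in self.by_sp.get((s, p), ()):
                if o is None or o == obj:
                    yield (s, p, obj)
        elif p is not None and o is not None:
            for subj in self.by_po.get((p, o), ()):
                yield (subj, p, o)
        else:  # no suitable index: fall back to a full scan
            for t in self.all:
                if all(q is None or q == v for q, v in zip((s, p, o), t)):
                    yield t


def answer(index: TinyTripleIndex, patterns: Sequence[Pattern],
           binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Evaluate a conjunction of triple patterns by nested index lookups."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    # Replace already-bound variables by their values, unbound ones by None.
    terms = [binding.get(t) if t.startswith("?") else t for t in first]
    for triple in index.match(*terms):
        extended = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?") and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            yield from answer(index, rest, extended)


store = TinyTripleIndex([
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:oxford"),
])
query = [("?who", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "ex:oxford")]
results = sorted(b["?who"] for b in answer(store, query))
```

Even this two-pattern query needs a join on `?org`. The first pattern falls back to a scan because neither toy index covers a predicate-only lookup, which hints at why practical systems tend to maintain many more indexes than this sketch does.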

Due to these issues, we believe that the robust scalability required by modern ODMS applications can only be achieved through the principled application of techniques that provide provable performance and/or tractability guarantees. The use of such techniques will not only allow for better and more consistent performance, but will also help ODMS users to better understand and thus avoid performance bottlenecks. We plan to develop the relevant techniques by synthesising and extending the results from three distinct research fields: databases, knowledge representation, and mathematical network theory. Combining these techniques with insightful engineering and extensive optimisation will, we believe, allow us to implement a new ODMS with scalability surpassing that of existing systems by several orders of magnitude. Finally, we will exploit our contacts with industry (see enclosed Letters of Support) to evaluate and tune our ODMS in real-world settings. We will thus lay both the theoretical and the practical foundations for a massively scalable intelligent information infrastructure capable of powering modern data-intensive applications.

Planned Impact

* Academic Impact

We believe that the techniques developed in this project will exert a major influence on the theory and practice of data management and reasoning in several academic communities. As explained in the Case for Support and in line with EPSRC's 'Working Together' priority, addressing the technical challenges inherent in intelligent management of large volumes of data requires a collaboration with researchers within and outside ICT. Within ICT, we expect a strong cross-fertilisation of ideas between the knowledge representation and reasoning community on the one side, and the database community on the other side. Outside ICT, solving the problems related to data partitioning will require a collaboration with researchers in mathematical network theory. Through these collaborations, this project has the potential to shape the research agenda in knowledge representation and reasoning, databases, and mathematics, contributing new ideas and uncovering challenges for future work. This will contribute to expanding the UK's research base, and to a consolidation of the UK's established world leadership in the mentioned research areas.

* Commercial Impact

Various companies have already recognised ODMSs as a great commercial opportunity. For example, numerous start-ups and small companies in the UK, the EU, and the USA (such as Garlik, ExperienceOn, ontoprise, OpenLink, Clark&Parsia, OntoText, Metaweb, and fluidOps) are currently developing ODMS variants. Well-known providers of data management infrastructure have also recognised the need to support RDF and OWL; for example, Oracle has recently enhanced its well-known database management system with modules that use ontologies to support 'semantic data management'. Although companies such as Oracle see a big market in the application of ontology-based technologies, their existing systems suffer from numerous limitations. Thus, addressing the scalability problems outlined in this proposal would have a significant impact on the business of these companies.

* Dissemination and Engagement

We will undertake a range of activities in order to ensure the widest possible dissemination of our results and engagement with anticipated beneficiaries.

First, we will continue our established pattern of publishing our research in international journals and conferences. Our publications have appeared in top journals such as JACM, AI, JAIR, JWS, VLDB Journal, and Information & Computation, as well as leading conferences such as IJCAI, KR, ISWC, and IJCAR.

Second, we will continue our participation in relevant international coordination and standardisation efforts within groups and organisations such as the World Wide Web Consortium (W3C) and the OWL Experiences and Directions Group. Through these activities we can foster awareness of our work and ensure that it has the maximum possible impact on any future standards. For example, the W3C's OWL 2 ontology language standard is based on our work on description logics.

Third, we will continue our collaboration with the developers of ontology-based systems and applications in both academia and industry, including, for example, BAE Systems, ExperienceOn, and Samsung (see Letters of Support). As well as providing a channel for dissemination, our industry contacts will provide excellent opportunities for commercialising the results of this project.

Fourth, we will make all project outputs available from the project web site, including papers, presentations, tutorials, and software.

Funded Value:

£817,862

Funded Period:

Jan 13 - Dec 17

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/K00607X/1

Principal Investigator:

Boris Motik

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Information & Knowledge Mgmt (100%)

Organisations

People	ORCID iD
Boris Motik (Principal Investigator / Fellow)	http://orcid.org/0000-0003-2506-4118

Publications

Andrew B (2016) Extending Consequence-Based Reasoning to SRIQ

Bate A (2018) Consequence-Based Reasoning for Description Logics with Disjunctions and Number Restrictions in Journal of Artificial Intelligence Research

Benedikt M (2017) Benchmarking the Chase

Benedikt M (2018) Goal-Driven Query Answering for Existential Rules with Equality

Chaussecourte P (2013) The Energy Management Adviser at EDF in Proc. of the 12th Int. Semantic Web Conf. (ISWC 2013), In-Use Track

Chen J (2019) Learning Semantic Annotations for Tabular Data

Chen J (2019) ColNet: Embedding the Semantics of Web Tables for Column Type Prediction in Proceedings of the AAAI Conference on Artificial Intelligence

Cima G (2019) The Semantic Web - ISWC 2019 - 18th International Semantic Web Conference, Auckland, New Zealand, October 26-30, 2019, Proceedings, Part I

Cuenca Grau B (2013) Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies in Journal of Artificial Intelligence Research

Cuenca Grau B (2013) Computing Datalog Rewritings Beyond Horn Ontologies in Proc. of the 23rd Int. Joint Conf. on Artificial Intelligence (IJCAI 2013)

Key Findings
Impact Summary
Further Funding
Collaboration
Intellectual Property
Software and Technical Products
Spin Outs


Description	We developed several novel algorithms for the management of RDF data. These include algorithms for computing the materialisation of datalog programs with and without equality in main-memory RDF stores, and algorithms for the incremental maintenance of such materialisations (i.e., for computing how to update the materialisation when only a small fraction of the input changes). We have implemented these techniques in our RDFox data management system and evaluated them against the state of the art. We showed that our techniques considerably outperform related techniques known in the literature. In particular, on a high-end Oracle server we obtained inference rates unparalleled in the literature. In addition, we have developed a novel technique for answering SPARQL queries in distributed RDF systems. The technique is quite different from what is commonly found in federated database systems. We are still evaluating our technique, but the results of our initial performance comparison are very encouraging.
Exploitation Route	Our techniques can be used by all RDF management systems that employ materialisation as a reasoning technique. Furthermore, we are working with Oracle on exploring ways of incorporating these techniques in their systems.
Sectors	Digital/Communication/Information Technologies (including Software)
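
To give a flavour of what materialisation and its incremental maintenance mean, the Python sketch below uses textbook semi-naive evaluation of a single transitive rule and handles additions only. It is an assumption-laden illustration: it is not the algorithm implemented in RDFox, it does not cover deletions or equality, and the names `extend`, `Fact`, and the sample facts are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (x, y) stands for reachable(x, y)


def extend(closure: Set[Fact], added: Set[Fact]) -> Set[Fact]:
    """Semi-naive evaluation of the rule
         reachable(x, z) :- reachable(x, y), reachable(y, z).
    Only joins involving at least one newly derived fact are evaluated,
    so adding a few facts does not recompute the whole materialisation.
    Mutates and returns `closure`.
    """
    delta = set(added) - closure
    closure |= delta
    while delta:
        derived = set()
        for (x, y) in delta:
            for (u, v) in closure:
                if y == u:  # new fact on the left:  (x, y) joined with (y, v)
                    derived.add((x, v))
                if v == x:  # new fact on the right: (u, x) joined with (x, y)
                    derived.add((u, y))
        delta = derived - closure
        closure |= delta
    return closure


# Initial materialisation from two explicit facts.
closure = extend(set(), {("a", "b"), ("b", "c")})
initial_size = len(closure)

# Later, one more fact arrives; only its consequences are derived.
extend(closure, {("c", "d")})
```

After the first call the materialisation also contains `("a", "c")`. Adding `("c", "d")` then derives `("b", "d")` and `("a", "d")` without revisiting joins among the facts that were already present.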


Description	We have submitted a patent application describing the design of some of the key components of our system. We have also published a number of papers describing various forms of new reasoning techniques. I have been collaborating with various companies. As a prominent example of such collaboration, Anthony Potter, a PhD student of mine working on distributed querying techniques closely related to this project, visited Oracle Corporation in 2015 on a four-month internship. During his stay in California, he got Oracle interested to the extent that they implemented Anthony's algorithm in their graph database and are currently evaluating the extent to which they will include the algorithm in their product. Finally, I started two spinout companies: Covatic and Oxford Semantic Technologies. The specific aim of the latter company is to bring RDFox (the system developed in this project) to market. The company has raised considerable investment and is currently employing three people full time.
First Year Of Impact	2015
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	KE Seed Fund Grant
Amount	£3,000 (GBP)
Organisation	University of Oxford
Sector	Academic/University
Country	United Kingdom
Start	01/2016
End	01/2016


Description	Oracle External Research Office grant
Amount	$95,000 (USD)
Organisation	Oracle Corporation
Sector	Private
Country	United States
Start	03/2016
End	03/2017


Description	University of Oxford / Impact Acceleration Award
Amount	£30,269 (GBP)
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	03/2015
End	09/2015


Description	University of Oxford / Impact Acceleration Award
Amount	£53,786 (GBP)
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2016
End	12/2016


Description	Armasuisse collaboration
Organisation	Federal Office for Defence Procurement Armasuisse
Country	Switzerland
Sector	Public
PI Contribution	We collaborated with Armasuisse on applying semantic technologies to the problem of detecting events on Twitter. The collaboration resulted in a paper that will be published at ESWC 2017. The University of Fribourg also collaborated on the project; however, Armasuisse was the main project partner.
Collaborator Contribution	Armasuisse provided the use case, the data for the evaluation, and the expertise in analysing Twitter time series data. Their contribution was crucial to getting the ESWC 2017 paper into shape.
Impact	ESWC 2017 paper called "ArmaTweet: Detecting Events by Semantic Tweet Analysis". The paper is yet to be published, so the bibliographic details are not yet complete.
Start Year	2016


Description	Oracle
Organisation	Oracle Corporation
Department	Oracle Corporation UK Ltd
Country	United Kingdom
Sector	Private
PI Contribution	Anthony Potter, a PhD student in the department, is working on distributed query answering algorithms. In 2015 he visited Oracle on a four-month internship. During the internship, Oracle decided to implement Anthony's algorithm in their graph database. They also decided to support further research on semantic technologies through their External Researcher Programme.
Collaborator Contribution	Oracle are supporting the research in semantic technologies with an unrestricted grant of $95k/year.
Impact	Oracle implemented the distributed query answering algorithm in their system and is planning to use it in practice.
Start Year	2014


Title	Parallel materialisation of a set of logical rules on a logical database
Description	This invention concerns the materialisation of a set of logical rules on a logical database, such as a Resource Description Framework (RDF) database. More particularly, but not exclusively, the invention concerns computer-implemented methods of providing the materialisation of a set of logical rules on a logical database that are particularly amenable to parallel execution. The invention also concerns methods of storing data in computer memory when executing such methods.
IP Reference	GB1319252.1
Protection	Patent application published
Year Protection Granted	2014
Licensed	No
Impact	The technology described in this patent provides the foundation for RDFox -- a software system (listed as output of the MaSI3 grant) for scalable management of RDF data. The University and the PI recently started two spinout companies -- Covatic and Oxford Semantic Technologies -- whose goal is to further develop RDFox and use it in a commercial setting. Both companies are listed as outputs of the MaSI3 fellowship.


Title	RDFox
Description	Triple store / graph DB
Type Of Technology	Software
Year Produced	2016
Impact	Basis for Covatic and OST spin-outs
URL	https://www.cs.ox.ac.uk/isg/tools/RDFox/


Company Name	Covatic
Description	Covatic develops software that analyses a user's online engagement to deliver personalised advertising.
Year Established	2016
Impact	The company is just starting in February 2017, so there are no major impacts yet. However, the company has a partnership with ITN that will guide the development of the products.
Website	http://covatic.com


Company Name	Oxford Semantic Technologies
Description	Oxford Semantic Technologies develops software that uses machine learning to analyse semantic data and its ontologies, which can be used when combining or ordering multiple datasets, and in simulating predictive relationships between data.
Year Established	2016
Impact	The company has just started so it does not have major impacts yet.
Website	http://www.oxfordsemantic.tech
