PrOQAW: Probabilistic Ontological Query Answering on the Web

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

The next revolution in Web search, one of the key technologies of the Web, has just started with the incorporation of ideas from the Semantic Web. These ideas aim at transforming current Web search into some form of semantic search and query answering on the Web by adding meaning to Web contents and queries in the form of an underlying ontology. This also allows for more complex queries, and for evaluating queries by combining knowledge that is distributed over many Web pages, i.e., by reasoning over the Web.

Realizing such semantic search and query answering on the Web by adding ontological meaning to the current Web conceptually means annotating Web pages and their contents relative to that ontology, i.e., relating Web pages and their contents to that ontology and thus also to each other via it. From a practical perspective, one of the most promising ways of realizing this is to perform data extraction from the current Web relative to the underlying ontology, store the extracted data in a knowledge base, and realize semantic search and query answering on this knowledge base. There has recently been much strong research activity in this direction.

A major unsolved problem in the above context is the principled handling of uncertainty: In addition to natural uncertainty as an inherent part of Web data, one also has to deal with uncertainty resulting from automatically processing Web data. The former also includes uncertainty due to incompleteness and inconsistency in the case of missing and over-specified information, respectively. The latter includes uncertainty due to, e.g., the automatic annotation of Web pages and their contents, the automatic extraction of knowledge from the Web, matching between different related ontologies, and the integration of distributed Web data sources.

The central goal of the proposed research is to develop a family of probabilistic data models for knowledge bases extracted from the Web relative to an underlying ontology, along with scalable query answering algorithms, which may serve as the backbone for next-generation technologies for semantic search and query answering on the Web. We believe that such probabilistic data models and query answering algorithms can be developed by integrating ontology languages, database technologies, and formalisms for managing probabilistic uncertainty in the context of the Web. The objectives include developing probabilistic data models, developing algorithms for ranking and query answering, identifying useful scalable fragments, and practically evaluating our results.

Planned Impact

Towards next-generation technologies for semantic search and query answering on the Web, the project's central goal is to develop a family of probabilistic data models, along with scalable query answering algorithms, for knowledge bases that are extracted from the Web relative to an underlying ontology. We believe that these data models and query answering algorithms will exert a major influence on the theory and practice of data extraction from the Web and of semantic search and query answering on the Web. Thus, short-term non-academic beneficiaries will include industrial researchers in these fields. So, unsurprisingly, companies working on the former, such as Lixto, and those working on the latter, such as Yahoo!, Google, and Microsoft, have reported a strong interest in the project (Lixto, Yahoo!, and Google by letters of support, and the director of Microsoft's FUSE (Future Social Experiences) Lab in Cambridge in personal communication with the PI).

In the longer term, our research will also lay the foundation for a new generation of information systems that will allow for dealing with incomplete, uncertain, inconsistent, semi-structured, overlapping, and semantically related data. Sources of such data are spreading at a phenomenal rate, most notably in the context of the Web. Thus, in the mid term, non-academic beneficiaries will include industrial researchers who are trying to develop information systems aimed at dealing with this kind of data, as well as industrial researchers and developers who would like to exploit such data in applications. For example, as for the latter, Alcatel-Lucent have uncertain data from community-curated content and sensor readings, which they would like to exploit (see letters of support).

In the long term, non-academic beneficiaries could include anyone using or depending on the Web or on any other information system, which essentially includes every individual and every business and organisation. In particular, any significant advances in semantic search and query answering on the Web may influence all our lives in a way as revolutionary as the invention of the technologies behind Google's Web search. For example, one long-term vision of Web search is that of an intelligent query answering interface in spoken natural language, similar to a human being; such an interface would make the Web accessible to a much larger class of people than nowadays (e.g., by spoken language via mobile phones) and would offer highly exciting new economic and social opportunities, contributing to the UK's health, wealth, and culture. For example, such a vertical interface for a specific domain could offer completely new educational opportunities, reaching a much larger class of people at much lower costs than traditional ones. The proposed research will lay the foundations in this direction.

Our contacts and collaborations with industry (see above) will help us to ensure that our work will also have an immediate impact on and benefits for companies and organisations that develop and use Web information systems. For example, the Web data extraction applications developed by Lixto, and the semantic search initiatives for the Web by Yahoo!, Google, and Microsoft all would directly benefit from the results of this project. As non-academic dissemination and engagement activities, in addition to continuing our collaborations with industry, we will also explore the possible commercialisation of our results (e.g., via these collaborations).

Funded Value:

£813,812

Funded Period:

Apr 2012 - Apr 2016

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/J008346/1

Principal Investigator:

Thomas Lukasiewicz

Research Subject:

Info. & commun. Technol. (75%)

Mathematical sciences (25%)

Research Topic:

Information & Knowledge Mgmt (75%)

Statistics & Appl. Probability (25%)

People
Thomas Lukasiewicz (Principal Investigator)
Dan Olteanu (Co-Investigator)
Georg Gottlob (Co-Investigator)
Michael Benedikt (Co-Investigator)
Gerardo Simari (Researcher)

Publications

Barany V (2016) Declarative Probabilistic Programming with Datalog

Borgwardt S (2016) Preferential Query Answering over the Semantic Web with Possibilistic Networks

Borgwardt S (2017) Ontology-Mediated Queries for Probabilistic Databases

Borgwardt S. (2017) Ontology-mediated queries for probabilistic databases in 31st AAAI Conference on Artificial Intelligence, AAAI 2017

Bourhis P (2014) Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part I

Bourhis P. (2014) Acyclic query answering under guarded disjunctive existential rules and consequences to DLs in CEUR Workshop Proceedings

Callahan D (2013) Reasoning about Complex Networks: A Logic Programming Approach

Ceylan I I (2017) Query Answering in Ontologies Under Preference Rankings

Ceylan I I (2017) Most Probable Explanations for Probabilistic Database Queries

Ceylan I I (2016) Complexity Results for Probabilistic Datalog+/-

Key Findings
Further Funding
Collaboration


Description	As evidenced by the long list of associated publications, this research grant has generated significant new knowledge and new research methods. In particular, we have developed several probabilistic data models for the Web that allow for representing the different types of probabilistic uncertainty that may occur in Web data, either as an inherent part or as the result of an automatic processing of Web data. We have also developed several techniques for ranking answers to probabilistic ontological queries (which refers to the task of deciding how to sort the answers to a query), analysed the computational complexity of ranking, query answering, and top-k query answering (which describes the actual process of returning the k top-ranked answers to a query), and developed algorithms for all these problems. We have also identified useful fragments of the probabilistic data models for the Web, where ranking, query answering, and top-k query answering are tractable in the data complexity, and designed scalable practical algorithms for these tasks in the identified fragments. Furthermore, we have implemented a prototype for representing, ranking, querying, and top-k querying Web data, optimised its algorithms and implementation, and performed empirical evaluations that back the theoretical results, specifically using real-world data and applications. In particular, these results were published in the journals ACM Transactions on Internet Technology, Annals of Mathematics and Artificial Intelligence (3 papers), Journal on Data Semantics, Theory and Practice of Logic Programming, The VLDB Journal, IEEE Data Engineering Bulletin, Proceedings of the VLDB Endowment, as well as in the proceedings of the top-tier artificial intelligence conferences IJCAI (3 papers), AAAI (4 papers), ECAI (4 papers), KR (4 papers), and UAI, and database conferences ICDE, PODS, and EDBT. 
As for awards and recognition, Thomas Lukasiewicz, Maria Vanina Martinez, Livia Predoiu, and Gerardo I. Simari received the RuleML 2015 Best Paper Award for their contribution "Existential Rules and Bayesian Networks for Probabilistic Ontological Data Exchange", and the senior research assistant employed on the grant, Gerardo I. Simari, was on IEEE Intelligent Systems' prestigious "AI's 10 to Watch" list for 2016. Furthermore, Thomas Lukasiewicz started an area editorship for the journal ACM Transactions on Computational Logic and associate editorships for the journals Artificial Intelligence and Journal of Artificial Intelligence Research; he also gave (or will give) invited presentations at BELIEF 2014, ICLP 2015, ISMIS 2015, KR 2016, and Reasoning Web 2017. The research funding has also enabled many new collaborations, namely with researchers at Oxford Brookes University, the University of Liverpool, the University of Glasgow, the University of British Columbia, Dresden University of Technology, the Free University of Bozen-Bolzano, the Polytechnic University of Bari, the University of Calabria, and Beijing University of Posts and Telecommunications. Highlights among closely related further funding are a Google European Doctoral Fellowship for Oana Tifrea-Marciuska, two EU Marie-Curie Individual Fellowships, a Leverhulme Trust Visiting Professorship for David Poole, and a seed funding project at the Alan Turing Institute.
Exploitation Route	Beneficiaries of the research outcomes include researchers in both academia and industry in the fields of Web data extraction and of semantic search and query answering on the Web. Our research has also laid the foundation for a new generation of information systems that will allow for dealing with incomplete, uncertain, inconsistent, semi-structured, overlapping, and semantically related data. Sources of such data are spreading at a phenomenal rate, most notably in the context of the Web, but also in the form of Big Data in general. Thus, beneficiaries include researchers in both academia and industry who are trying to develop information systems aimed at dealing with this kind of data, as well as researchers and developers who would like to exploit such data in applications. In the long term, beneficiaries could include anyone using or depending on the Web or on any other information system, which essentially includes every individual and every business and organisation. For example, one long-term vision of Web search is that of an intelligent query answering interface in spoken natural language, similar to a human being; such an interface would make the Web accessible to a much larger class of people than nowadays (e.g., by spoken language via mobile phones) and would offer highly exciting new economic and social opportunities; this research has laid the foundations in this direction.
Sectors	Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Environment; Financial Services, and Management Consultancy; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Culture, Heritage, Museums and Collections; Retail; Security and Diplomacy
URL	http://www.cs.ox.ac.uk/projects/PrOQAW/index.html
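
As a rough illustration of the ranking and top-k query answering tasks mentioned in the findings above, the following minimal Python sketch scores each candidate answer by the probability that at least one of its supporting extracted facts holds, and then returns the k best-scored answers. It is not the project's data model or algorithm: the independence assumption, the function names `answer_probability` and `top_k`, and the example answers and probabilities are all hypothetical and chosen only for illustration.

```python
import heapq
from math import prod


def answer_probability(support_probs):
    """Probability that at least one supporting fact holds.

    Assumes (for illustration only) that the supporting facts are
    mutually independent and that any single one suffices.
    """
    return 1.0 - prod(1.0 - p for p in support_probs)


def top_k(answers, k):
    """Return the k answers with the highest probability, best first.

    `answers` maps each candidate answer to the list of probabilities
    of its (hypothetical) supporting extracted facts.
    """
    scored = [(answer_probability(ps), a) for a, ps in answers.items()]
    return [(a, p) for p, a in heapq.nlargest(k, scored)]


# Hypothetical candidate answers with made-up extraction confidences.
answers = {
    "Oxford": [0.9, 0.5],
    "Cambridge": [0.6],
    "Bath": [0.3, 0.2],
}

result = top_k(answers, 2)
```

Here ranking corresponds to ordering the answers by their computed probability, and top-k query answering to returning only the first k of them; realistic settings additionally involve an ontology and correlated facts, which this sketch deliberately ignores.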


Description	Google European Doctoral Fellowship
Amount	$180,000 (USD)
Organisation	Google
Sector	Private
Country	United States
Start	04/2013
End	09/2016


Description	Intelligent Question Answering, seed funding project
Amount	£23,500 (GBP)
Organisation	Alan Turing Institute
Sector	Academic/University
Country	United Kingdom
Start	12/2016
End	07/2017


Description	Leverhulme Trust Visiting Professorship
Amount	£20,500 (GBP)
Organisation	The Leverhulme Trust
Sector	Charity/Non Profit
Country	United Kingdom
Start	08/2014
End	06/2015


Description	Marie-Curie Individual Fellowship
Amount	€ 200,372 (EUR)
Organisation	European Commission
Sector	Public
Country	Belgium
Start	03/2013
End	02/2015


Description	Marie-Curie Individual Fellowship
Amount	€ 168,167 (EUR)
Organisation	European Commission
Sector	Public
Country	Belgium
Start	03/2016
End	12/2017


Description	Platform Grant
Amount	£1,263,746 (GBP)
Funding ID	EP/L012138/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2014
End	01/2019


Description	Programme Grant
Amount	£4,557,635 (GBP)
Funding ID	EP/M025268/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	03/2015
End	03/2020


Description	Collaboration with Beijing University of Posts and Telecommunications (BUPT)
Organisation	Beijing University of Posts and Telecommunications
Country	China
Sector	Academic/University
PI Contribution	Expertise, intellectual input, and training of staff.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Cheng Chen, Xiangwu Meng, and Yishu Miao (see list of publications).
Start Year	2015


Description	Collaboration with Free University of Bozen-Bolzano
Organisation	Free University of Bozen-Bolzano
Country	Italy
Sector	Academic/University
PI Contribution	Expertise, intellectual input, and training of staff.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Rafael Penaloza (see list of publications).
Start Year	2015


Description	Collaboration with Oxford Brookes University
Organisation	Oxford Brookes University
Country	United Kingdom
Sector	Academic/University
PI Contribution	Expertise and intellectual input.
Collaborator Contribution	Expertise and intellectual input.
Impact	Joint grant application with Fabio Cuzzolin.
Start Year	2014


Description	Collaboration with Polytechnic University of Bari
Organisation	Polytechnic University of Bari
Country	Italy
Sector	Academic/University
PI Contribution	Expertise, intellectual input, and training of staff.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Tommaso Di Noia (see list of publications).
Start Year	2012


Description	Collaboration with TU Dresden
Organisation	Technical University of Dresden
Department	Institute of Theoretical Computer Science
Country	Germany
Sector	Academic/University
PI Contribution	Expertise, intellectual input, and training of staff.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Stefan Borgwardt and Ismail Ilkan Ceylan (see list of publications).
Start Year	2015


Description	Collaboration with University of British Columbia
Organisation	University of British Columbia
Country	Canada
Sector	Academic/University
PI Contribution	Expertise and intellectual input.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with David Poole (see list of publications).
Start Year	2014


Description	Collaboration with University of Calabria
Organisation	University of Calabria
Country	Italy
Sector	Academic/University
PI Contribution	Expertise, intellectual input, and training of staff.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Cristian Molinaro (see list of publications).
Start Year	2013


Description	Collaboration with University of Glasgow
Organisation	University of Glasgow
Country	United Kingdom
Sector	Academic/University
PI Contribution	Expertise and intellectual input.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Clemens Kupke (see list of publications).
Start Year	2012


Description	Collaboration with University of Liverpool
Organisation	University of Liverpool
Country	United Kingdom
Sector	Academic/University
PI Contribution	Expertise and intellectual input.
Collaborator Contribution	Expertise and intellectual input.
Impact	Several joint publications with Andre Hernich (see list of publications).
Start Year	2012
