PrOQAW: Probabilistic Ontological Query Answering on the Web

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

The next revolution in Web search, one of the key technologies of the Web, has just started with the incorporation of ideas from the Semantic Web, aiming at transforming current Web search into some form of semantic search and query answering on the Web, by adding meaning to Web contents and queries in the form of an underlying ontology. This also allows for more complex queries, and for evaluating queries by combining knowledge that is distributed over many Web pages, i.e., by reasoning over the Web.

Realizing such semantic search and query answering on the Web by adding ontological meaning to the current Web conceptually means annotating Web pages and their contents relative to that ontology, i.e., relating Web pages and their contents to and thus also via that ontology. From a practical perspective, one of the most promising ways of realizing this is to perform data extraction from the current Web relative to the underlying ontology, store the extracted data in a knowledge base, and realize semantic search and query answering on this knowledge base. There has recently been a great deal of research activity in this direction.

A major unsolved problem in the above context is the principled handling of uncertainty: In addition to natural uncertainty as an inherent part of Web data, one also has to deal with uncertainty resulting from automatically processing Web data. The former also includes uncertainty due to incompleteness and inconsistency in the case of missing and over-specified information, respectively. The latter includes uncertainty due to, e.g., the automatic annotation of Web pages and their contents, the automatic extraction of knowledge from the Web, matching between different related ontologies, and the integration of distributed Web data sources.

The central goal of the proposed research is to develop a family of probabilistic data models for knowledge bases extracted from the Web relative to an underlying ontology, along with scalable query answering algorithms, which may serve as the backbone for next-generation technologies for semantic search and query answering on the Web. We believe that such probabilistic data models and query answering algorithms can be developed by integrating ontology languages, database technologies, and formalisms for managing probabilistic uncertainty in the context of the Web. The objectives include developing probabilistic data models, developing algorithms for ranking and query answering, identifying useful scalable fragments, and practically evaluating our results.

Planned Impact

Towards next-generation technologies for semantic search and query answering on the Web, the project's central goal is to develop a family of probabilistic data models, along with scalable query answering algorithms, for knowledge bases that are extracted from the Web relative to an underlying ontology. We believe that these data models and query answering algorithms will exert a major influence on the theory and practice of data extraction from the Web and of semantic search and query answering on the Web. Thus, short-term non-academic beneficiaries will include industrial researchers in these fields. So, unsurprisingly, companies working on the former, such as Lixto, and those working on the latter, such as Yahoo!, Google, and Microsoft, have reported a strong interest in the project (Lixto, Yahoo!, and Google by letters of support, and the director of Microsoft's FUSE (Future Social Experiences) Lab in Cambridge in personal communication with the PI).

In the longer term, our research will also lay the foundation for a new generation of information systems that will allow for dealing with incomplete, uncertain, inconsistent, semi-structured, overlapping, and semantically related data. Sources of such data are spreading at a phenomenal rate, most notably in the context of the Web. Thus, in the mid term, non-academic beneficiaries will include industrial researchers who are trying to develop information systems aimed at dealing with this kind of data, as well as industrial researchers and developers who would like to exploit such data in applications. For example, as for the latter, Alcatel-Lucent have uncertain data from community-curated content and sensor readings, which they would like to exploit (see letters of support).

In the long term, non-academic beneficiaries could include anyone using or depending on the Web or on any other information system, which essentially includes every individual and every business and organisation. In particular, any significant advances in semantic search and query answering on the Web may influence all our lives in a similarly revolutionary way to the invention of the technologies behind Google's Web search. For example, one long-term vision of Web search is that of an intelligent query answering interface in spoken natural language, similar to a human being; such an interface would make the Web accessible to a much larger class of people than nowadays (e.g., by spoken language via mobile phones) and would offer highly exciting new economic and social opportunities, contributing to the UK's health, wealth, and culture. For example, such a vertical interface for a specific domain could offer completely new educational opportunities, reaching a much larger class of people at much lower costs than traditional ones. The proposed research will lay the foundations in this direction.

Our contacts and collaborations with industry (see above) will help us to ensure that our work will also have an immediate impact on and benefits for companies and organisations that develop and use Web information systems. For example, the Web data extraction applications developed by Lixto, and the semantic search initiatives for the Web by Yahoo!, Google, and Microsoft all would directly benefit from the results of this project. As non-academic dissemination and engagement activities, in addition to continuing our collaborations with industry, we will also explore the possible commercialisation of our results (e.g., via these collaborations).
 
Description As evidenced by the long list of associated publications, this research grant has generated significant new knowledge and new research methods. In particular, we have developed several probabilistic data models for the Web that allow for representing the different types of probabilistic uncertainty that may occur in Web data, either as an inherent part or as the result of an automatic processing of Web data. We have also developed several techniques for ranking answers to probabilistic ontological queries (which refers to the task of deciding how to sort the answers to a query), analysed the computational complexity of ranking, query answering, and top-k query answering (which describes the actual process of returning the k top-ranked answers to a query), and developed algorithms for all these problems. We have also identified useful fragments of the probabilistic data models for the Web, where ranking, query answering, and top-k query answering are tractable in the data complexity, and designed scalable practical algorithms for these tasks in the identified fragments. Furthermore, we have implemented a prototype for representing, ranking, querying, and top-k querying Web data, optimised its algorithms and implementation, and performed empirical evaluations that back the theoretical results, specifically using real-world data and applications. In particular, these results were published in the journals ACM Transactions on Internet Technology, Annals of Mathematics and Artificial Intelligence (3 papers), Journal on Data Semantics, Theory and Practice of Logic Programming, The VLDB Journal, IEEE Data Engineering Bulletin, Proceedings of the VLDB Endowment, as well as in the proceedings of the top-tier artificial intelligence conferences IJCAI (3 papers), AAAI (4 papers), ECAI (4 papers), KR (4 papers), and UAI, and database conferences ICDE, PODS, and EDBT. 
As for awards and recognition, Thomas Lukasiewicz, Maria Vanina Martinez, Livia Predoiu, and Gerardo I. Simari received the RuleML 2015 Best Paper Award for their contribution "Existential Rules and Bayesian Networks for Probabilistic Ontological Data Exchange", and the senior research assistant employed on the grant, Gerardo I. Simari, was on IEEE Intelligent Systems' prestigious "AI's 10 to Watch" list for 2016. Furthermore, Thomas Lukasiewicz started an area editorship for the journal ACM Transactions on Computational Logic and associate editorships for the journals Artificial Intelligence and Journal of Artificial Intelligence Research; he also gave (or will give) invited presentations at BELIEF 2014, ICLP 2015, ISMIS 2015, KR 2016, and Reasoning Web 2017. The research funding has also enabled many new collaborations, namely with researchers at Oxford Brookes University, the University of Liverpool, the University of Glasgow, the University of British Columbia, Dresden University of Technology, the Free University of Bozen-Bolzano, the Polytechnic University of Bari, the University of Calabria, and Beijing University of Posts and Telecommunications. Highlights among closely related further funding are a Google European Doctoral Fellowship for Oana Tifrea-Marciuska, two EU Marie-Curie Individual Fellowships, a Leverhulme Trust Visiting Professorship for David Poole, and a seed funding project at the Alan Turing Institute.
Exploitation Route Beneficiaries of the research outcomes include researchers in both academia and industry in the fields of Web data extraction and of semantic search and query answering on the Web. Our research has also laid the foundation for a new generation of information systems that will allow for dealing with incomplete, uncertain, inconsistent, semi-structured, overlapping, and semantically related data. Sources of such data are spreading at a phenomenal rate, most notably in the context of the Web, but also in the form of Big Data in general. Thus, beneficiaries include researchers in both academia and industry who are trying to develop information systems aimed at dealing with this kind of data, as well as researchers and developers who would like to exploit such data in applications. In the long term, beneficiaries could include anyone using or depending on the Web or on any other information system, which essentially includes every individual and every business and organisation. For example, one long-term vision of Web search is that of an intelligent query answering interface in spoken natural language, similar to a human being; such an interface would make the Web accessible to a much larger class of people than nowadays (e.g., by spoken language via mobile phones) and would offer highly exciting new economic and social opportunities; this research has laid the foundations in this direction.
Sectors Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Environment; Financial Services, and Management Consultancy; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Culture, Heritage, Museums and Collections; Retail; Security and Diplomacy

URL http://www.cs.ox.ac.uk/projects/PrOQAW/index.html
 
Description Google European Doctoral Fellowship
Amount $180,000 (USD)
Organisation Google 
Sector Private
Country United States
Start 05/2013 
End 09/2016
 
Description Intelligent Question Answering, seed funding project
Amount £23,500 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country Unknown
Start 12/2016 
End 07/2017
 
Description Leverhulme Trust Visiting Professorship
Amount £20,500 (GBP)
Organisation The Leverhulme Trust 
Sector Academic/University
Country United Kingdom
Start 09/2014 
End 06/2015
 
Description Marie-Curie Individual Fellowship
Amount €200,372 (EUR)
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2013 
End 02/2015
 
Description Marie-Curie Individual Fellowship
Amount €168,167 (EUR)
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2016 
End 12/2017
 
Description Platform Grant
Amount £1,263,746 (GBP)
Funding ID EP/L012138/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 01/2014 
End 01/2019
 
Description Programme Grant
Amount £4,557,635 (GBP)
Funding ID EP/M025268/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 04/2015 
End 03/2020
 
Description Collaboration with Beijing University of Posts and Telecommunications (BUPT) 
Organisation Beijing University of Posts and Telecommunications
Country China 
Sector Academic/University 
PI Contribution Expertise, intellectual input, and training of staff.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Cheng Chen, Xiangwu Meng, and Yishu Miao (see list of publications).
Start Year 2015
 
Description Collaboration with Free University of Bozen-Bolzano 
Organisation Free University of Bozen-Bolzano
Country Italy 
Sector Academic/University 
PI Contribution Expertise, intellectual input, and training of staff.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Rafael Penaloza (see list of publications).
Start Year 2015
 
Description Collaboration with Oxford Brookes University 
Organisation Oxford Brookes University
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise and intellectual input.
Collaborator Contribution Expertise and intellectual input.
Impact Joint grant application with Fabio Cuzzolin.
Start Year 2014
 
Description Collaboration with Polytechnic University of Bari 
Organisation Polytechnic University of Bari
Country Italy 
Sector Academic/University 
PI Contribution Expertise, intellectual input, and training of staff.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Tommaso Di Noia (see list of publications).
Start Year 2012
 
Description Collaboration with TU Dresden 
Organisation Technical University of Dresden
Department Institute of Theoretical Computer Science
Country Germany 
Sector Academic/University 
PI Contribution Expertise, intellectual input, and training of staff.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Stefan Borgwardt and Ismail Ilkan Ceylan (see list of publications).
Start Year 2015
 
Description Collaboration with University of British Columbia 
Organisation University of British Columbia
Country Canada 
Sector Academic/University 
PI Contribution Expertise and intellectual input.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with David Poole (see list of publications).
Start Year 2014
 
Description Collaboration with University of Calabria 
Organisation University of Calabria
Country Italy 
Sector Academic/University 
PI Contribution Expertise, intellectual input, and training of staff.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Cristian Molinaro (see list of publications).
Start Year 2013
 
Description Collaboration with University of Glasgow 
Organisation University of Glasgow
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise and intellectual input.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Clemens Kupke (see list of publications).
Start Year 2012
 
Description Collaboration with University of Liverpool 
Organisation University of Liverpool
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise and intellectual input.
Collaborator Contribution Expertise and intellectual input.
Impact Several joint publications with Andre Hernich (see list of publications).
Start Year 2012