PrOQAW: Probabilistic Ontological Query Answering on the Web
Lead Research Organisation: University of Oxford
Department Name: Computer Science
Abstract
Web search, one of the key technologies of the Web, is at the start of its next revolution: ideas from the Semantic Web are being incorporated to transform current Web search into a form of semantic search and query answering on the Web, by adding meaning to Web contents and queries in the form of an underlying ontology. This also allows for more complex queries, and for evaluating queries by combining knowledge that is distributed over many Web pages, i.e., by reasoning over the Web.
Conceptually, realising such semantic search and query answering by adding ontological meaning to the current Web means annotating Web pages and their contents relative to that ontology, i.e., relating Web pages and their contents to the ontology and thus also to each other via the ontology. From a practical perspective, one of the most promising ways of realising this is to extract data from the current Web relative to the underlying ontology, store the extracted data in a knowledge base, and perform semantic search and query answering on this knowledge base. Research in this direction has recently been very active.
A major unsolved problem in the above context is the principled handling of uncertainty: In addition to natural uncertainty as an inherent part of Web data, one also has to deal with uncertainty resulting from automatically processing Web data. The former also includes uncertainty due to incompleteness and inconsistency in the case of missing and over-specified information, respectively. The latter includes uncertainty due to, e.g., the automatic annotation of Web pages and their contents, the automatic extraction of knowledge from the Web, matching between different related ontologies, and the integration of distributed Web data sources.
The central goal of the proposed research is to develop a family of probabilistic data models for knowledge bases extracted from the Web relative to an underlying ontology, along with scalable query answering algorithms, which may serve as the backbone for next-generation technologies for semantic search and query answering on the Web. We believe that such probabilistic data models and query answering algorithms can be developed by integrating ontology languages, database technologies, and formalisms for managing probabilistic uncertainty in the context of the Web. The objectives include developing probabilistic data models, developing algorithms for ranking and query answering, identifying useful scalable fragments, and practically evaluating our results.
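To make the idea of a probabilistic data model over an ontology more concrete, the following is a minimal sketch, not the project's actual models or algorithms. It assumes a tuple-independent setting (each extracted fact is present independently with its own probability), a single subclass axiom as the ontology, and a brute-force enumeration of possible worlds. All facts, probabilities, predicate names, and function names are hypothetical and chosen for illustration only.

```python
from itertools import product

# Hypothetical extracted facts with confidence values (illustrative only).
FACTS = {
    ("Professor", "alice"): 0.9,
    ("Researcher", "alice"): 0.5,
    ("Researcher", "bob"): 0.7,
}

# Hypothetical ontology: each pair (sub, sup) states that every sub is a sup.
SUBCLASS_OF = [("Professor", "Researcher")]


def saturate(world):
    """Close a set of facts under the subclass axioms."""
    closed = set(world)
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF:
            for pred, const in list(closed):
                if pred == sub and (sup, const) not in closed:
                    closed.add((sup, const))
                    changed = True
    return closed


def answer_probabilities(query_pred):
    """Sum the weights of all possible worlds in which each answer holds."""
    facts = list(FACTS)
    probs = {}
    # Each fact is either present or absent, so there are 2^n worlds.
    for choices in product([True, False], repeat=len(facts)):
        weight = 1.0
        world = set()
        for fact, present in zip(facts, choices):
            p = FACTS[fact]
            weight *= p if present else 1.0 - p
            if present:
                world.add(fact)
        for pred, const in saturate(world):
            if pred == query_pred:
                probs[const] = probs.get(const, 0.0) + weight
    return probs


if __name__ == "__main__":
    for const, p in sorted(answer_probabilities("Researcher").items()):
        print(f"Researcher({const}): {p:.2f}")
```

In this toy example, alice is an answer to the query Researcher(x) whenever either of her two facts is present, so her probability is 1 - (1 - 0.9)(1 - 0.5) = 0.95, while bob's is 0.7. The enumeration is exponential in the number of facts, which is why scalable fragments and dedicated algorithms, as targeted by the objectives above, matter in practice.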
Planned Impact
Towards next-generation technologies for semantic search and query answering on the Web, the project's central goal is to develop a family of probabilistic data models, along with scalable query answering algorithms, for knowledge bases that are extracted from the Web relative to an underlying ontology. We believe that these data models and query answering algorithms will exert a major influence on the theory and practice of data extraction from the Web and of semantic search and query answering on the Web. Thus, short-term non-academic beneficiaries will include industrial researchers in these fields. Unsurprisingly, companies working on the former, such as Lixto, and on the latter, such as Yahoo!, Google, and Microsoft, have expressed strong interest in the project (Lixto, Yahoo!, and Google through letters of support, and the director of Microsoft's FUSE (Future Social Experiences) Lab in Cambridge in personal communication with the PI).
In the medium term, our research will also lay the foundation for a new generation of information systems that will allow for dealing with incomplete, uncertain, inconsistent, semi-structured, overlapping, and semantically related data. Sources of such data are spreading at a phenomenal rate, most notably in the context of the Web. Thus, medium-term non-academic beneficiaries will include industrial researchers who are trying to develop information systems aimed at dealing with this kind of data, as well as industrial researchers and developers who would like to exploit such data in applications. As an example of the latter, Alcatel-Lucent have uncertain data from community-curated content and sensor readings, which they would like to exploit (see letters of support).
In the long term, non-academic beneficiaries could include anyone using or depending on the Web or on any other information system, which essentially includes every individual and every business and organisation. In particular, any significant advances in semantic search and query answering on the Web may influence all our lives in a way as revolutionary as the invention of the technologies behind Google's Web search. For example, one long-term vision of Web search is that of an intelligent query answering interface in spoken natural language, similar to a human being; such an interface would make the Web accessible to a much larger class of people than today (e.g., by spoken language via mobile phones) and would offer highly exciting new economic and social opportunities, contributing to the UK's health, wealth, and culture. For example, such an interface specialised to a specific domain (a vertical interface) could offer completely new educational opportunities, reaching a much larger class of people at much lower costs than traditional ones. The proposed research will lay the foundations in this direction.
Our contacts and collaborations with industry (see above) will help us to ensure that our work will also have an immediate impact on, and benefits for, companies and organisations that develop and use Web information systems. For example, the Web data extraction applications developed by Lixto, and the semantic search initiatives for the Web by Yahoo!, Google, and Microsoft, would all directly benefit from the results of this project. As non-academic dissemination and engagement activities, in addition to continuing our collaborations with industry, we will also explore the possible commercialisation of our results (e.g., via these collaborations).
Organisations
- University of Oxford (Lead Research Organisation)
- Polytechnic University of Bari (Collaboration)
- University of Glasgow (Collaboration)
- Oxford Brookes University (Collaboration)
- Technical University of Dresden (Collaboration)
- Beijing University of Posts and Telecommunications (Collaboration)
- Free University of Bozen-Bolzano (Collaboration)
- University of Liverpool (Collaboration)
- University of Calabria (Collaboration)
- University of British Columbia (Collaboration)
Publications

- Barany V (2016). Declarative Probabilistic Programming with Datalog.
- Borgwardt S (2017). Ontology-Mediated Queries for Probabilistic Databases. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017.
- Bourhis P (2014). Acyclic query answering under guarded disjunctive existential rules and consequences to DLs. In CEUR Workshop Proceedings.
- Ceylan I I (2017). Query Answering in Ontologies Under Preference Rankings.
- Ceylan I I (2016). Complexity Results for Probabilistic Datalog+/-.
- Ceylan I I (2017). Most Probable Explanations for Probabilistic Database Queries.
Description | As evidenced by the long list of associated publications, this research grant has generated significant new knowledge and new research methods. In particular, we have developed several probabilistic data models for the Web that allow for representing the different types of probabilistic uncertainty that may occur in Web data, either as an inherent part or as the result of an automatic processing of Web data. We have also developed several techniques for ranking answers to probabilistic ontological queries (which refers to the task of deciding how to sort the answers to a query), analysed the computational complexity of ranking, query answering, and top-k query answering (which describes the actual process of returning the k top-ranked answers to a query), and developed algorithms for all these problems. We have also identified useful fragments of the probabilistic data models for the Web, where ranking, query answering, and top-k query answering are tractable in the data complexity, and designed scalable practical algorithms for these tasks in the identified fragments. Furthermore, we have implemented a prototype for representing, ranking, querying, and top-k querying Web data, optimised its algorithms and implementation, and performed empirical evaluations that back the theoretical results, specifically using real-world data and applications. In particular, these results were published in the journals ACM Transactions on Internet Technology, Annals of Mathematics and Artificial Intelligence (3 papers), Journal on Data Semantics, Theory and Practice of Logic Programming, The VLDB Journal, IEEE Data Engineering Bulletin, Proceedings of the VLDB Endowment, as well as in the proceedings of the top-tier artificial intelligence conferences IJCAI (3 papers), AAAI (4 papers), ECAI (4 papers), KR (4 papers), and UAI, and database conferences ICDE, PODS, and EDBT. 
As for awards and recognition, Thomas Lukasiewicz, Maria Vanina Martinez, Livia Predoiu, and Gerardo I. Simari received the RuleML 2015 Best Paper Award for their contribution "Existential Rules and Bayesian Networks for Probabilistic Ontological Data Exchange", and the senior research assistant employed on the grant, Gerardo I. Simari, was named on IEEE Intelligent Systems' prestigious "AI's 10 to Watch" list for 2016. Furthermore, Thomas Lukasiewicz took up an area editorship for the journal ACM Transactions on Computational Logic and associate editorships for the journals Artificial Intelligence and Journal of Artificial Intelligence Research; he also gave (or will give) invited presentations at BELIEF 2014, ICLP 2015, ISMIS 2015, KR 2016, and Reasoning Web 2017. The research funding has also enabled many new collaborations, namely with researchers at Oxford Brookes University, the University of Liverpool, the University of Glasgow, the University of British Columbia, Dresden University of Technology, the Free University of Bozen-Bolzano, the Polytechnic University of Bari, the University of Calabria, and Beijing University of Posts and Telecommunications. Highlights among closely related further funding are a Google European Doctoral Fellowship for Oana Tifrea-Marciuska, two EU Marie-Curie Individual Fellowships, a Leverhulme Trust Visiting Professorship for David Poole, and a seed funding project at the Alan Turing Institute. |
Exploitation Route | Beneficiaries of the research outcomes include researchers in both academia and industry in the fields of Web data extraction and of semantic search and query answering on the Web. Our research has also laid the foundation for a new generation of information systems that will allow for dealing with incomplete, uncertain, inconsistent, semi-structured, overlapping, and semantically related data. Sources of such data are spreading at a phenomenal rate, most notably in the context of the Web, but also in the form of Big Data in general. Thus, beneficiaries include researchers in both academia and industry who are trying to develop information systems aimed at dealing with this kind of data, as well as researchers and developers who would like to exploit such data in applications. In the long term, beneficiaries could include anyone using or depending on the Web or on any other information system, which essentially includes every individual and every business and organisation. For example, one long-term vision of Web search is that of an intelligent query answering interface in spoken natural language, similar to a human being; such an interface would make the Web accessible to a much larger class of people than today (e.g., by spoken language via mobile phones) and would offer highly exciting new economic and social opportunities; this research has laid the foundations in this direction. |
Sectors | Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Environment; Financial Services, and Management Consultancy; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Culture, Heritage, Museums and Collections; Retail; Security and Diplomacy |
URL | http://www.cs.ox.ac.uk/projects/PrOQAW/index.html |
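The key findings above distinguish ranking (deciding how to sort the answers to a query) from top-k query answering (returning only the k top-ranked answers). The sketch below illustrates that distinction under simplifying assumptions: answer probabilities are taken as already computed, ranking is by decreasing probability with ties broken alphabetically, and the scores and names are hypothetical. It is not the prototype or the algorithms developed in the project.

```python
import heapq


def sort_key(item):
    """Order by decreasing probability, then by answer name for ties."""
    answer, prob = item
    return (-prob, answer)


def rank_answers(answer_probs):
    """Ranking: produce the full ordering of all answers."""
    return sorted(answer_probs.items(), key=sort_key)


def top_k(answer_probs, k):
    """Top-k: return the k best answers without sorting the whole set."""
    return heapq.nsmallest(k, answer_probs.items(), key=sort_key)


if __name__ == "__main__":
    # Hypothetical answer probabilities for some query (illustrative only).
    scores = {"alice": 0.95, "bob": 0.7, "carol": 0.7, "dave": 0.2}
    print(rank_answers(scores))
    print(top_k(scores, 2))
```

With the hypothetical scores shown, the top-2 result is alice followed by bob, which matches the first two entries of the full ranking. Using a heap-based selection keeps the cost closer to linear in the number of answers when k is small, which is one simple reason top-k is treated as a task of its own.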
Description | Google European Doctoral Fellowship |
Amount | $180,000 (USD) |
Organisation | |
Sector | Private |
Country | United States |
Start | 04/2013 |
End | 09/2016 |
Description | Intelligent Question Answering, seed funding project |
Amount | £23,500 (GBP) |
Organisation | Alan Turing Institute |
Sector | Academic/University |
Country | United Kingdom |
Start | 12/2016 |
End | 07/2017 |
Description | Leverhulme Trust Visiting Professorship |
Amount | £20,500 (GBP) |
Organisation | The Leverhulme Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2014 |
End | 06/2015 |
Description | Marie-Curie Individual Fellowship |
Amount | €200,372 (EUR) |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 03/2013 |
End | 02/2015 |
Description | Marie-Curie Individual Fellowship |
Amount | €168,167 (EUR) |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 03/2016 |
End | 12/2017 |
Description | Platform Grant |
Amount | £1,263,746 (GBP) |
Funding ID | EP/L012138/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2014 |
End | 01/2019 |
Description | Programme Grant |
Amount | £4,557,635 (GBP) |
Funding ID | EP/M025268/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2015 |
End | 03/2020 |
Description | Collaboration with Beijing University of Posts and Telecommunications (BUPT) |
Organisation | Beijing University of Posts and Telecommunications |
Country | China |
Sector | Academic/University |
PI Contribution | Expertise, intellectual input, and training of staff. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Cheng Chen, Xiangwu Meng, and Yishu Miao (see list of publications). |
Start Year | 2015 |
Description | Collaboration with Free University of Bozen-Bolzano |
Organisation | Free University of Bozen-Bolzano |
Country | Italy |
Sector | Academic/University |
PI Contribution | Expertise, intellectual input, and training of staff. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Rafael Penaloza (see list of publications). |
Start Year | 2015 |
Description | Collaboration with Oxford Brookes University |
Organisation | Oxford Brookes University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Expertise and intellectual input. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Joint grant application with Fabio Cuzzolin. |
Start Year | 2014 |
Description | Collaboration with Polytechnic University of Bari |
Organisation | Polytechnic University of Bari |
Country | Italy |
Sector | Academic/University |
PI Contribution | Expertise, intellectual input, and training of staff. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Tommaso Di Noia (see list of publications). |
Start Year | 2012 |
Description | Collaboration with TU Dresden |
Organisation | Technical University of Dresden |
Department | Institute of Theoretical Computer Science |
Country | Germany |
Sector | Academic/University |
PI Contribution | Expertise, intellectual input, and training of staff. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Stefan Borgwardt and Ismail Ilkan Ceylan (see list of publications). |
Start Year | 2015 |
Description | Collaboration with University of British Columbia |
Organisation | University of British Columbia |
Country | Canada |
Sector | Academic/University |
PI Contribution | Expertise and intellectual input. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with David Poole (see list of publications). |
Start Year | 2014 |
Description | Collaboration with University of Calabria |
Organisation | University of Calabria |
Country | Italy |
Sector | Academic/University |
PI Contribution | Expertise, intellectual input, and training of staff. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Cristian Molinaro (see list of publications). |
Start Year | 2013 |
Description | Collaboration with University of Glasgow |
Organisation | University of Glasgow |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Expertise and intellectual input. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Clemens Kupke (see list of publications). |
Start Year | 2012 |
Description | Collaboration with University of Liverpool |
Organisation | University of Liverpool |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Expertise and intellectual input. |
Collaborator Contribution | Expertise and intellectual input. |
Impact | Several joint publications with Andre Hernich (see list of publications). |
Start Year | 2012 |