Query-driven Data Acquisition from Web-based Data Sources

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

The functioning of entities as diverse as enterprises and government agencies depends on obtaining high-quality data. Increasingly these entities depend on external sources for their operational data: critical data is obtained dynamically via web services, is extracted from web pages, or is purchased from third parties. These sources can differ radically in their completeness, accuracy, and availability. It is not possible for applications to index and explore data from each source in advance of querying: there are too many sources, they are too costly to access, and the data in them may be refreshed constantly. How should data acquisition proceed in such situations?

In this project we will develop algorithms for answering queries in the presence of large numbers of web-based data sources, sources that may overlap substantially in their datasets but have different access restrictions and costs. Our approach will make use of schema information about the data an application is querying: data format, integrity constraints, and any prior knowledge of costs that may be available. The core of the project will be algorithms for answering a query by interactively exploring the sources, dynamically pruning out irrelevant or exhausted sources in the process.
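As a rough illustration of the last point, the following Python sketch shows one way a query could be answered by exploring sources interactively while pruning irrelevant and exhausted ones. It is not the project's algorithm: the Source class, the acquire function, the greedy cheapest-first policy, and the sample data are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for a web-based data source."""
    name: str
    relation: str         # the relation this source exposes (schema information)
    cost_per_call: float  # assumed prior knowledge of the access cost
    rows: list            # simulated remote data
    batch_size: int = 2
    _pos: int = 0

    def fetch(self):
        """Simulate one paid access; returns an empty list once exhausted."""
        batch = self.rows[self._pos:self._pos + self.batch_size]
        self._pos += len(batch)
        return batch


def acquire(sources, relation, predicate, wanted):
    """Collect at least `wanted` matching tuples, if the sources hold that many."""
    # Prune sources that are irrelevant to the queried relation.
    active = [s for s in sources if s.relation == relation]
    answers, spent = set(), 0.0
    while active and len(answers) < wanted:
        # Greedy policy (an assumption): always access the cheapest active source.
        src = min(active, key=lambda s: s.cost_per_call)
        batch = src.fetch()
        spent += src.cost_per_call
        if not batch:
            active.remove(src)  # prune the exhausted source
            continue
        # A set absorbs tuples duplicated across overlapping sources.
        answers.update(row for row in batch if predicate(row))
    return answers, spent


sources = [
    Source("cheap", "flights", 1.0, [("OXF", "SCL"), ("OXF", "VIE")]),
    Source("pricey", "flights", 5.0, [("OXF", "VIE"), ("OXF", "TUB")]),
    Source("hotels", "hotels", 0.5, [("Ritz",)]),
]
answers, spent = acquire(sources, "flights", lambda r: r[0] == "OXF", wanted=3)
```

In this toy run the "hotels" source is never accessed, and the "cheap" source is dropped once it returns an empty batch. A realistic system would also have to account for access restrictions and integrity constraints, which this sketch ignores.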

Publications

 
Description We discovered that query optimization can be approached via proof-theoretic methods, and that different proof systems can lead to new query optimization algorithms.
Exploitation Route We have created a query optimization system based on these proof-theoretic methods, which we are developing with a customer in a follow-up grant.
Sectors Digital/Communication/Information Technologies (including Software), Retail

 
Description Invited talk in Chile 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact Invited talk in the main seminar of Pontifical Catholic University of Chile's mathematics department.
Year(s) Of Engagement Activity 2014
URL https://www.ing.uc.cl/ingenieria-matematica/7-seminario-ingenieria-matematica-2/
 
Description Invited tutorial at workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I was invited to give a tutorial on query reformulation at the main summer school in Data Management, associated with the Alberto Mendelzon Workshop on Management of Data.
Year(s) Of Engagement Activity 2014
 
Description Keynote at database workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Keynote talk on query optimisation over web data sources at a workshop on data management.
Year(s) Of Engagement Activity 2014
URL https://users.dcc.uchile.cl/~jperez/amw2014/
 
Description Keynote at main workshop on Description Logics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited keynote on new approaches to query reformulation in databases at the main meeting for research in Description Logics (DL 2014).
Year(s) Of Engagement Activity 2014
URL https://www.dbai.tuwien.ac.at/dl2014/
 
Description Organization of Workshop on Ontologies and Data Management 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Lead and co-organizer of a workshop at the Dagstuhl center for computer science, Europe's leading venue for computer science seminars and workshops. The workshop dealt with the interface of data management, logic, and semantic web research, and brought together researchers from each of these areas.
Year(s) Of Engagement Activity 2014
URL http://drops.dagstuhl.de/opus/volltexte/2014/4794/
 
Description Summer school course on Logic and Data Management 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented a one-week short course on logical issues in data management at one of the main European summer schools, the European Summer School on Logic, Language and Information.
Year(s) Of Engagement Activity 2014
URL http://www.evolaemp.uni-tuebingen.de/esslli2014/program/week-two/