Query-driven Data Acquisition from Web-based Data Sources
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
The functioning of entities as diverse as enterprises and government agencies depends on obtaining high-quality data. Increasingly these entities depend on external sources for their operational data: critical data is obtained dynamically via web services, is extracted from web pages, or is purchased from third parties. These sources can differ radically in their completeness, accuracy, and availability. It is not possible for applications to index and explore data from each source in advance of querying: there are too many sources, they are too costly to access, and the data in them may be refreshed constantly. How should data acquisition proceed in such situations? In this project we will develop algorithms for answering queries in the presence of large numbers of web-based data sources, sources that may overlap substantially in their datasets but have different access restrictions and costs. Our approach will make use of schema information about the data an application is querying: data format, integrity constraints, and any prior knowledge of costs that may be available. The core of the project will be algorithms for answering a query by interactively exploring the sources, dynamically pruning out irrelevant or exhausted sources in the process.
People
Michael Benedikt (Principal Investigator)
Publications
Amarilli A
(2020)
Finite Open-world Query Answering with Number Restrictions
in ACM Transactions on Computational Logic
Amarilli A
(2016)
Query Answering with Transitive and Linear-Ordered Data
Amarilli A
(2017)
When Can We Answer Queries Using Result-Bounded Data Interfaces?
in CoRR abs
Bárány V
(2020)
Some Model Theory of Guarded Negation
Benedikt M
(2013)
Two Variable vs. Linear Temporal Logic in Model Checking and Games
in Logical Methods in Computer Science
Benedikt M
(2016)
A Step Up in Expressiveness of Decidable Fixpoint Logics
Benedikt M
(2010)
The impact of virtual views on containment
in Proceedings of the VLDB Endowment
Benedikt M
(2013)
Bisimilarity of Pushdown Automata is Nonelementary
Benedikt M
(2015)
Interpolation with Decidable Fixpoint Logics
Benedikt M
(2016)
Generating Plans from Proofs
in ACM Transactions on Database Systems
Benedikt M
(2015)
The complexity of higher-order queries
in Information and Computation
Benedikt M
(2016)
Querying Visible and Invisible Information
Benedikt M
(2012)
Automata, Languages, and Programming
Benedikt M
(2013)
Two Variable vs. Linear Temporal Logic in Model Checking and Games
Benedikt M
(2011)
Determining relevance of accesses at runtime
Benedikt M
(2014)
Effective interpolation and preservation in guarded logics
Benedikt M
(2017)
20th International Conference on Database Theory
Benedikt M
(2012)
ProFoUnd
Benedikt M
(2015)
Analysis of Schemas with Access Restrictions
in ACM Transactions on Database Systems
Benedikt M
(2016)
Generating Plans from Proofs: The Interpolation-based Approach to Query Reformulation
in Synthesis Lectures on Data Management
Benedikt M
(2015)
Effective Interpolation and Preservation in Guarded Logics
in ACM Transactions on Computational Logic
Benedikt M
(2011)
CONCUR 2011 - Concurrency Theory
Benedikt M
(2019)
Monadic Datalog, Tree Validity, and Limited Access Containment
in ACM Transactions on Computational Logic
Benedikt M
(2012)
Querying schemas with access restrictions
in Proceedings of the VLDB Endowment
Benedikt M
(2017)
Characterizing definability in decidable fixpoint logics
in Leibniz International Proceedings in Informatics, LIPIcs
Benedikt M
(2017)
Goal-Driven Query Answering for Existential Rules with Equality
in arXiv e-prints
Bourhis P
(2015)
Which XML Schemas are Streaming Bounded Repairable?
in Theory of Computing Systems
Bárány V
(2013)
Access patterns and integrity constraints revisited
Bárány V
(2018)
Some Model Theory of Guarded Negation
in The Journal of Symbolic Logic
Chen L
(2012)
QUASAR
Chen L
(2013)
Aggregating semantic annotators
in Proceedings of the VLDB Endowment
Chen L
(2013)
ROSeAnn reconciling opinions of semantic annotators
in Proceedings of the VLDB Endowment
Vu H
(2011)
Complexity of higher-order queries
Description | We discovered that query optimization can be approached via proof-theoretic methods, and that different proof systems can lead to new query optimization algorithms. |
Exploitation Route | We have created a query optimization system based on these methods, which we are developing with a customer in a follow-up grant. |
Sectors | Digital/Communication/Information Technologies (including Software), Retail |
Description | Invited talk in Chile |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Undergraduate students |
Results and Impact | Invited talk in the main seminar of Pontifical Catholic University of Chile's mathematics department. |
Year(s) Of Engagement Activity | 2014 |
URL | https://www.ing.uc.cl/ingenieria-matematica/7-seminario-ingenieria-matematica-2/ |
Description | Invited tutorial at workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | I was invited to give a tutorial on query reformulation at the main summer school in Data Management, associated with the Alberto Mendelzon Workshop on Management of Data. |
Year(s) Of Engagement Activity | 2014 |
Description | Keynote at database workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Keynote talk on query optimisation over web data sources at a workshop on data management. |
Year(s) Of Engagement Activity | 2014 |
URL | https://users.dcc.uchile.cl/~jperez/amw2014/ |
Description | Keynote at main workshop on Description Logics |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Invited keynote on new approaches to query reformulation in databases at the main meeting for research in Description Logics (DL 2014). |
Year(s) Of Engagement Activity | 2014 |
URL | https://www.dbai.tuwien.ac.at/dl2014/ |
Description | Organization of Workshop on Ontologies and Data Management |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Lead and co-organizer of a workshop at the Dagstuhl center for computer science, Europe's leading venue for computer science seminars and workshops. The workshop dealt with the interface of data management, logic, and semantic web research, including researchers from each of these areas. |
Year(s) Of Engagement Activity | 2014 |
URL | http://drops.dagstuhl.de/opus/volltexte/2014/4794/ |
Description | Summer school course on Logic and Data Management |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presented a one-week short course on logical issues in data management at one of the main European summer schools, the European Summer School on Logic, Language and Information. |
Year(s) Of Engagement Activity | 2014 |
URL | http://www.evolaemp.uni-tuebingen.de/esslli2014/program/week-two/ |