XML with Incomplete Information: Representation, Querying, and Applications

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Data on the Web - particularly XML data - is often incomplete and inconsistent, due to such factors as the lack of centralisation and control over data quality. While the transfer and extension of relational tools to deal with XML data has been a central theme in data management research over the past decade, the standard database toolbox offers us little in terms of handling of incompleteness. Indeed, it is one of the most notoriously underdeveloped and most often criticised aspects of relational databases. In addition, the flexibility of XML leads to many ways in which incompleteness of data can be accommodated, in addition to the standard relational null values.

There has not yet been any detailed study of incompleteness in XML. Our main goal is to conduct such a systematic study, and develop its applications in the area that underlies data management tasks on the Web -- the use of data across multiple independent applications.

We shall investigate models of XML with incomplete information and algorithmic techniques for querying such data, paying particular attention to the correctness/complexity tradeoffs and to the practicality of algorithmic tools. We shall investigate the fundamental role of incompleteness in applications that involve the movement of XML data, such as integration of data from various sources or moving data between peers according to mappings between their schemas. We shall develop a specification and algorithmic toolbox for dealing with incomplete information as it arises in such applications.

Publications

Amano S (2014) XML Schema Mappings: Data Exchange and Metadata Management in Journal of the ACM

Amano S (2009) XML schema mappings

Arenas M (2010) Relational and XML Data Exchange in Synthesis Lectures on Data Management

Arenas M (2013) Solutions and query rewriting in data exchange in Information and Computation

Barceló P (2013) Graph Logics with Rational Relations in Logical Methods in Computer Science

Barceló P (2012) On Low Treewidth Approximations of Conjunctive Queries in Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management

Barceló P (2011) Querying graph patterns

 
Description: Data on the Web - particularly XML data - is often incomplete and
inconsistent, due to such factors as the lack of centralisation and
control over data quality. The flexibility of XML leads to many ways
in which incompleteness of data can be accommodated, in addition to
the standard relational null values.

We have provided a detailed study of incompleteness in XML, and
developed its applications in the area that underlies data management
tasks on the Web -- the use of data across multiple independent
applications.

We classified models of XML with incomplete information and
algorithmic techniques for querying such data, paying particular
attention to the correctness/complexity tradeoffs and the practicality
of algorithmic tools. We demonstrated the fundamental role of
incompleteness in applications that involve the movement of XML data,
such as integration of data from various sources or moving data
between peers according to mappings between their schemas. We
developed a specification and algorithmic toolbox for dealing with
incomplete information as it arises in such applications.
Exploitation Route: Processing XML data with incomplete and imprecise information
Sectors: Digital/Communication/Information Technologies (including Software)

 
Description: The key impacts are twofold: understanding the role of incompleteness in data exchange systems, and providing models of incompleteness in complex data models. The former had an impact on the design of data exchange systems, the latter on the development of incompleteness models for complex structures used in today's data management tasks (graph data, RDF).
Sector: Digital/Communication/Information Technologies (including Software)