XML with Incomplete Information: Representation, Querying, and Applications

Lead Research Organisation: University of Edinburgh
Department Name: School of Informatics

Abstract

Data on the Web, particularly XML data, is often incomplete and inconsistent, due to factors such as the lack of centralisation and of control over data quality. While transferring and extending relational tools to XML data has been a central theme in data management research over the past decade, the standard database toolbox offers little for handling incompleteness. Indeed, this is one of the most notoriously underdeveloped and most often criticised aspects of relational databases. Moreover, the flexibility of XML allows incompleteness to be accommodated in many ways beyond the standard relational null values.

There has not yet been any detailed study of incompleteness in XML. Our main goal is to conduct such a systematic study and to develop its applications in the area that underlies data management tasks on the Web: the use of data across multiple independent applications.

We shall investigate models of XML with incomplete information and algorithmic techniques for querying such data, paying particular attention to correctness/complexity tradeoffs and to the practicality of algorithmic tools. We shall investigate the fundamental role of incompleteness in applications that involve the movement of XML data, such as integrating data from various sources or moving data between peers according to mappings between their schemas. We shall develop a specification and algorithmic toolbox for dealing with incomplete information as it arises in such applications.
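
To make the point about moving data between schemas concrete, here is a minimal Python sketch. It assumes a hypothetical source relation of (title, author) pairs and a hypothetical target schema in which every `book` element must also carry a `year` attribute. Because the source supplies no year, a mapping can only assert that some year exists, so any materialised target necessarily contains unknown values. The names `Null`, `Node` and `exchange` are illustrative only and are not part of the project's toolbox. Recording each missing value as a fresh marked null follows the usual canonical-solution construction from relational data exchange.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count
from typing import Iterable, Union


@dataclass(frozen=True)
class Null:
    """A marked null: a placeholder for a value the source does not supply."""
    ident: int


Value = Union[str, Null]


@dataclass
class Node:
    """An XML-like element: a label, attribute values, and child elements."""
    label: str
    attrs: dict[str, Value] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def exchange(source: Iterable[tuple[str, str]]) -> Node:
    """Build a target tree from source (title, author) tuples.

    Hypothetical mapping: each tuple becomes library/book[@title, @year]/author[@name].
    The source has no year, so each book gets a fresh marked null for it.
    """
    fresh = count(1)  # supplies distinct null identifiers
    root = Node("library")
    for title, author in source:
        book = Node(
            "book",
            {"title": title, "year": Null(next(fresh))},
            [Node("author", {"name": author})],
        )
        root.children.append(book)
    return root


target = exchange([("Book A", "Smith"), ("Book B", "Jones")])
for book in target.children:
    print(book.attrs)
# {'title': 'Book A', 'year': Null(ident=1)}
# {'title': 'Book B', 'year': Null(ident=2)}
```

Each book receives its own null, since nothing in the source says that the two unknown years are equal.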

Publications

Kolahi S (2008) An information-theoretic analysis of worst-case redundancy in database design in ACM Transactions on Database Systems

Hernich A (2011) Closed world data exchange in ACM Transactions on Database Systems

David C (2012) Efficient reasoning about data trees via integer linear programming in ACM Transactions on Database Systems

Gheerbrant A (2014) Naïve Evaluation of Queries over Incomplete Databases in ACM Transactions on Database Systems

Chirkova R (2012) Tractable XML data exchange via relations in Frontiers of Computer Science

Arenas M (2013) Solutions and query rewriting in data exchange in Information and Computation

Libkin L (2010) Disjoint pattern matching and implication in strings in Information Processing Letters

Libkin L (2010) Reasoning about XML with temporal logics and automata in Journal of Applied Logic

Deng T (2013) On the aggregation problem for synthesized Web services in Journal of Computer and System Sciences

Libkin L (2015) Regular expressions for data words in Journal of Computer and System Sciences

 
Description: Data on the Web, particularly XML data, is often incomplete and
inconsistent, due to factors such as the lack of centralisation and of
control over data quality. The flexibility of XML allows incompleteness
of data to be accommodated in many ways beyond the standard relational
null values.

We have carried out a detailed study of incompleteness in XML and
developed its applications in the area that underlies data management
tasks on the Web: the use of data across multiple independent
applications.

We classified models of XML with incomplete information and
algorithmic techniques for querying such data, paying particular
attention to correctness/complexity tradeoffs and to the practicality
of algorithmic tools. We demonstrated the fundamental role of
incompleteness in applications that involve the movement of XML data,
such as integrating data from various sources or moving data
between peers according to mappings between their schemas. We
developed a specification and algorithmic toolbox for dealing with
incomplete information as it arises in such applications.
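
As an illustration of querying incomplete XML, the Python sketch below uses a deliberately restricted model in which the tree structure is fully known and only attribute values may be marked nulls. It applies naïve evaluation, the technique named in one of the listed publications: evaluate the query treating each null as an ordinary value distinct from every constant, then discard answers that are nulls. For simple positive pattern queries like this one, that is a standard way to obtain the certain answers, i.e. those that hold however the nulls are filled in; richer models and queries need more care. The classes, the query function `titles_by_author` and the data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Null:
    """A marked null: an unknown data value, identified by its id."""
    ident: int


Value = Union[str, Null]


@dataclass
class Node:
    """An element of an incomplete tree: structure known, values may be null."""
    label: str
    attrs: dict[str, Value] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def titles_by_author(root: Node, author: str) -> set[Value]:
    """Naively evaluate "titles of books with an author named `author`".

    A null compares equal only to itself, never to a constant such as `author`.
    """
    results: set[Value] = set()
    for book in root.children:
        if book.label == "book" and any(
            child.label == "author" and child.attrs.get("name") == author
            for child in book.children
        ):
            results.add(book.attrs["title"])
    return results


def certain_answers(naive: set[Value]) -> set[str]:
    """Drop answers that are nulls; the remaining constants are certain."""
    return {v for v in naive if not isinstance(v, Null)}


doc = Node("library", children=[
    Node("book", {"title": "Book A"}, [Node("author", {"name": "Smith"})]),
    Node("book", {"title": Null(1)}, [Node("author", {"name": "Smith"})]),
    Node("book", {"title": "Book C"}, [Node("author", {"name": Null(2)})]),
])
naive = titles_by_author(doc, "Smith")
print(certain_answers(naive))  # {'Book A'}
```

Book C is not a certain answer because its author's name could be filled in with a value other than Smith, and the second book's null title is dropped because it names no specific constant.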

Exploitation Route: Processing XML data with incomplete and imprecise information
Sectors: Digital/Communication/Information Technologies (including Software)

 
Description: The key impacts are twofold: an understanding of the role of incompleteness in data exchange systems, and models of incompleteness for complex data models. The former influenced the design of data exchange systems; the latter informed the development of incompleteness models for the complex structures used in current data management tasks, such as graph data and RDF.
Sector: Digital/Communication/Information Technologies (including Software)