Relational and XML Data Exchange: Semantics, Consistency, and Query Answering

Lead Research Organisation: University of Edinburgh
Department Name: School of Informatics

Abstract

Data exchange is one of the oldest data management problems. It arises every time two or more legacy databases need to exchange information while their schemas cannot be changed. The main technical challenges are building target database instances that correctly represent information from the source data, and evaluating queries on such databases in a semantically correct manner. The foundational aspects of data exchange had not been studied until very recently. Over the past few years, commercial products have appeared that help manage e-business applications which communicate data yet remain autonomous. Such systems, however, use ad hoc query answering techniques, which motivates much of the research on the foundations of data exchange. The goal of this project is to contribute towards the development of solid foundations for data exchange, concentrating on such critical issues as managing the inherent incompleteness of information in data exchange, using it in query answering, and extending techniques from relational databases to the exchange of data represented as XML documents.
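To make the first technical challenge concrete, here is a minimal Python sketch of one standard way of building a target instance: the chase, a well-known technique from the data exchange literature rather than necessarily the method developed in this project. The relations `Emp`, `Works` and `Mgr`, the mapping rule, and the helper names are hypothetical. Values that the mapping requires but the source does not supply are filled with fresh labeled nulls, which is exactly where incompleteness enters the target database.

```python
from itertools import count


class Null:
    """A labeled null: a placeholder for a value the mapping requires but the source does not supply."""

    def __init__(self, ident: int) -> None:
        self.ident = ident

    def __repr__(self) -> str:
        return f"N{self.ident}"


def chase(emp: set[tuple[str, str]]) -> tuple[set, set]:
    """Chase the hypothetical rule  Emp(e, d) -> EXISTS m . Works(e, d) AND Mgr(d, m).

    Each source tuple fires the rule once; the existential variable m is
    witnessed by a fresh labeled null, giving a canonical universal solution.
    """
    fresh = count(1)
    works: set[tuple[str, str]] = set()
    mgr: set[tuple[str, Null]] = set()
    for e, d in sorted(emp):  # sorted only to make null numbering deterministic
        works.add((e, d))
        mgr.add((d, Null(next(fresh))))
    return works, mgr


source_emp = {("ann", "db"), ("bob", "db"), ("cat", "ai")}
works, mgr = chase(source_emp)
print(sorted(works))  # [('ann', 'db'), ('bob', 'db'), ('cat', 'ai')]
print(sorted(mgr, key=lambda t: t[1].ident))  # [('db', N1), ('db', N2), ('ai', N3)]
```

Both "db" employees produce distinct nulls for their department's manager: with no target constraints, nothing forces these unknown values to be equal, so the target instance is incomplete by construction.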

Description: The key findings fall into two categories. The first addresses the issue of semantics in data exchange. The second concerns the development of a complete toolkit for XML data exchange.

Regarding the first group of results: prior to our work, everyone used a single model of data exchange, while acknowledging its obvious shortcomings. We explained that these shortcomings come from mishandling the incomplete information that naturally occurs in databases arising in data exchange. We developed a framework for performing the key tasks of data exchange based on the semantics of incompleteness, and applied it under open-world, closed-world, and mixed semantics.
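Query answering over an instance containing nulls can be sketched as follows. This is a minimal Python illustration of the well-known naive-evaluation shortcut from the data exchange literature, not the project's own framework: nulls are joined like ordinary values, and any answer tuple that still contains a null is discarded. The target instance, relation names, and query functions are hypothetical. The shortcut is known to yield certain answers for (unions of) conjunctive queries evaluated on a universal solution under the open-world reading; for queries with negation it can return tuples that are not certain, which is one place where the choice of incompleteness semantics matters.

```python
class Null:
    """A labeled null: a placeholder for an unknown value.

    Two nulls are equal only if they are the same object (default identity semantics).
    """

    def __init__(self, ident: int) -> None:
        self.ident = ident

    def __repr__(self) -> str:
        return f"N{self.ident}"


def q_has_manager(works, mgr):
    """Q1(e) :- Works(e, d), Mgr(d, m): employees whose department has some manager."""
    return {(e,) for (e, d) in works for (d2, _m) in mgr if d == d2}


def q_manager_of(works, mgr):
    """Q2(e, m) :- Works(e, d), Mgr(d, m): employee/manager pairs."""
    return {(e, m) for (e, d) in works for (d2, m) in mgr if d == d2}


def certain_answers(naive_result):
    """Keep only null-free tuples; for conjunctive queries on a universal solution,
    these are exactly the certain answers."""
    return {t for t in naive_result if not any(isinstance(v, Null) for v in t)}


# Hypothetical target instance: the managers of "db" are unknown (two labeled nulls).
n1, n2 = Null(1), Null(2)
works = {("ann", "db"), ("bob", "db"), ("cat", "ai")}
mgr = {("db", n1), ("db", n2), ("ai", "dan")}

print(sorted(certain_answers(q_has_manager(works, mgr))))  # [('ann',), ('bob',), ('cat',)]
print(sorted(certain_answers(q_manager_of(works, mgr))))  # [('cat', 'dan')]
```

Every employee is certainly managed by someone, so Q1 returns all three; but only Cat's manager is known in every completion of the instance, so Q2 keeps a single pair.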

Regarding XML data exchange, we developed, essentially from scratch, a complete toolkit for XML data exchange. It covers the specification of mappings, their static analysis, the construction of target solutions, and query answering. We provided a complete classification of schema mappings according to the complexity of their static analysis; we classified schema mappings according to the behavior of query answering algorithms; and we identified a large and practically relevant class of XML schema mappings that admits particularly efficient static analysis and query answering algorithms. We have answered long-standing open questions on the complexity of building solutions in data exchange by providing an algorithm with tractable data complexity for materializing solutions. In addition, we have developed the basics of data exchange over instances with incomplete information, bringing the theory much closer to practice (so far, this work has been done for relations, as a necessary first step towards extending it to XML).
Exploitation Route: These results have been used in open-source software for both relational and XML data exchange.
Sectors: Education

 
Description:
Data exchange: XML. Our work on XML data exchange set the standard that others now use in their work, and provided algorithms that are being implemented in research prototypes.
Data exchange: semantics. Our framework for performing the key tasks of data exchange based on the semantics of incompleteness, under open-world, closed-world, and mixed semantics, has since been used by many researchers to provide proper data exchange tools in scenarios that previously could not be handled (for instance, target constraints under the closed-world assumption, or aggregate queries).
Sector: Digital/Communication/Information Technologies (including Software)