Enforcement of Constraints on XML Streams

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

The eXtensible Markup Language (XML) has become a ubiquitous format for exchanging data. Enterprise data from industries as diverse as finance, healthcare, and genomics is routinely exchanged as XML. Much of this XML-encoded information has to be queried on the fly as it arrives, that is, as an XML stream. News and event information, for example, is available in the form of XML feeds; applications that react to these events must process the feeds in streaming fashion. Communication and messaging protocols also make use of XML, and the corresponding protocol handlers are thus also XML stream processors.

A crucial aspect of processing any form of data is validation: before data is made available to applications, it must be in a "sane" state. When data is exchanged over networks, corruption is common, because messages are received from untrusted or even unknown parties. Indeed, much or even most of the data sent to web-accessible application servers may come from malicious or compromised hosts.

The XML community has already developed standardized means for describing constraints on the structure of XML documents. On the one hand, there are schema-based constraints, such as Document Type Definitions (DTDs), which limit the tags that can occur within a document. On the other hand, qualifiers in the XML query language XPath provide a more flexible method for adding application-specific constraints. But how can a firewall enforce these constraints efficiently on large collections of parallel feeds? This is a critical issue, whether the XML streams represent signalling messages, event feeds, or web service calls. This project will study which constraints can and cannot be enforced efficiently, and will provide tools and technologies to monitor XML streams effectively for violations of both schema constraints and application-specific constraints.
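To make the idea of enforcing a schema constraint on a stream concrete, the following is a minimal sketch and not the project's own method. It assumes a much simplified DTD-like rule (each tag has a fixed set of permitted child tags, ignoring order and multiplicity) and uses Python's standard `xml.etree.ElementTree.XMLPullParser` to consume the document chunk by chunk. The rule table `ALLOWED_CHILDREN`, the tag names, and the function `first_violation` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD-like rule set: parent tag -> set of permitted child tags.
# Real DTD content models also constrain order and multiplicity; this does not.
ALLOWED_CHILDREN = {
    "feed": {"item"},
    "item": {"title", "body"},
    "title": set(),
    "body": set(),
}


def first_violation(chunks, allowed=ALLOWED_CHILDREN):
    """Return a message for the first violation found, or None if there is none."""
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tags of the currently open elements
    try:
        for chunk in chunks:
            parser.feed(chunk)
            for event, elem in parser.read_events():
                if event == "start":
                    if elem.tag not in allowed:
                        return f"unknown element <{elem.tag}>"
                    if stack and elem.tag not in allowed[stack[-1]]:
                        return f"<{elem.tag}> not allowed inside <{stack[-1]}>"
                    stack.append(elem.tag)
                else:
                    stack.pop()
                    elem.clear()  # release text and children of the finished element
        parser.close()  # raises ParseError if the document is incomplete
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    return None
```

For example, `first_violation(["<feed><item><ti", "tle>a</title></item></feed>"])` accepts a document split across two chunks, while a `<feed>` nested inside an `<item>` is rejected as soon as its start tag arrives. The check itself only consults the stack of open tags, which is the kind of limited-memory behaviour that makes a constraint practical to enforce on many parallel feeds.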

Publications

Benedikt M (2014) The per-character cost of repairing word languages in Theoretical Computer Science

Benedikt M (2015) The complexity of higher-order queries in Information and Computation

Benedikt M (2010) Probabilistic XML via Markov Chains in Proceedings of the VLDB Endowment

Benedikt M (2009) Database Programming Languages

Benedikt M (2017) Determinacy and rewriting of functional top-down and MSO tree transformations in Journal of Computer and System Sciences

Benedikt M (2009) Regular tree languages definable in FO and in FO_mod in ACM Transactions on Computational Logic

Benedikt M (2013) Bounded repairability of word languages in Journal of Computer and System Sciences

 
Description: We developed techniques for transforming and repairing streams of structured data.
Exploitation Route: The stream processors could be used to detect anomalies in, for example, news feeds.
Sectors: Digital/Communication/Information Technologies (including Software)