Enforcement of Constraints on XML Streams
Lead Research Organisation: University of Oxford
Department Name: Computer Science
Abstract
The eXtensible Markup Language (XML) has become a ubiquitous format for exchanging data. Enterprise data from industries as diverse as finance, healthcare, and genomics are routinely exchanged as XML. Much of this XML-encoded information has to be queried on the fly as it arrives, that is, as an XML stream. News and event information, for example, is available in the form of XML feeds; applications that react to these events must process the feeds in streaming fashion. Communication and messaging protocols also make use of XML, so the corresponding protocol handlers are XML stream processors as well.
A crucial aspect of processing any form of data is validation: before data is made available to applications, it must be in a "sane" state. When data is exchanged over networks, corruption is pervasive, because messages are received from untrusted or even unknown parties. Indeed, much or even most of the data sent to web-accessible application servers may come from malicious or compromised hosts.
The XML community has already developed standardized means for describing constraints on the structure of XML documents. On the one hand, there are schema-based constraints, such as Document Type Definitions (DTDs), which restrict the tags that can occur within a document. On the other hand, qualifiers in the XML query language XPath provide a more flexible method for adding application-specific constraints. But how can a firewall enforce these constraints efficiently on large collections of parallel feeds? This is a critical issue, whether the XML streams represent signalling messages, event feeds, or web service calls. This project will study which constraints can and cannot be enforced efficiently, and will provide tools and technologies to monitor XML streams effectively for violations of both schema constraints and application-specific constraints.
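To make "enforcing a schema constraint on a stream" concrete, here is a minimal sketch, not the project's technique. It assumes a hypothetical news-feed DTD whose content models have been compiled by hand into small deterministic automata (the names `ContentModel`, `FEED_DTD`, `first_violation` and `events_from_bytes` are illustrative). The checker keeps one automaton state per open element, so its own memory grows with the nesting depth of the document rather than its length, and it reports the first violation as soon as it is seen. XPath qualifiers are not covered.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet, Iterable, Iterator, List, Optional, Tuple


class ContentModel:
    """A DTD content model compiled into a DFA over child element names."""

    def __init__(self, start: int, delta: Dict[Tuple[int, str], int],
                 accept: FrozenSet[int]) -> None:
        self.start = start    # state before any child has been seen
        self.delta = delta    # (state, child tag) -> next state
        self.accept = accept  # states in which the element may be closed


# Hypothetical feed DTD, compiled by hand:
#   <!ELEMENT feed  (item*)>
#   <!ELEMENT item  (title, body?)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT body  (#PCDATA)>
# Text content is ignored; only element structure is checked.
FEED_DTD: Dict[str, ContentModel] = {
    "feed": ContentModel(0, {(0, "item"): 0}, frozenset({0})),
    "item": ContentModel(0, {(0, "title"): 1, (1, "body"): 2}, frozenset({1, 2})),
    "title": ContentModel(0, {}, frozenset({0})),
    "body": ContentModel(0, {}, frozenset({0})),
}

Event = Tuple[str, str]  # ("start" or "end", tag name)


def first_violation(events: Iterable[Event], root: str,
                    dtd: Dict[str, ContentModel]) -> Optional[str]:
    """Return a description of the first violation, or None if the stream is valid."""
    stack: List[Tuple[str, int]] = []  # (open element, its DFA state)
    seen_root = False
    for kind, tag in events:
        if kind == "start":
            if tag not in dtd:
                return f"undeclared element <{tag}>"
            if stack:
                parent, state = stack[-1]
                nxt = dtd[parent].delta.get((state, tag))
                if nxt is None:
                    return f"<{tag}> not allowed here inside <{parent}>"
                stack[-1] = (parent, nxt)  # advance the parent's automaton
            elif seen_root or tag != root:
                return f"unexpected top-level element <{tag}>"
            seen_root = True
            stack.append((tag, dtd[tag].start))
        else:
            if not stack or stack[-1][0] != tag:
                return f"mismatched end tag </{tag}>"
            name, state = stack.pop()
            if state not in dtd[name].accept:
                return f"<{name}> closed with incomplete content"
    if not seen_root or stack:
        return "stream ended before the document was complete"
    return None


def events_from_bytes(data: bytes) -> Iterator[Event]:
    """Adapt ElementTree's incremental parser to (kind, tag) events."""
    for kind, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        yield kind, elem.tag
        if kind == "end":
            elem.clear()  # drop the finished element's children and text
```

For example, `first_violation(events_from_bytes(b"<feed><item><title>a</title></item></feed>"), "feed", FEED_DTD)` returns `None`, while a `<body>` appearing before `<title>` inside an `<item>` is rejected at its start tag. Checking each start tag against the parent's automaton is what lets such a checker reject a message early; malformed XML is left to the underlying parser, which raises `ET.ParseError`.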
People
Michael Benedikt (Principal Investigator)
Publications
Bousquet-Mélou M (2014). XML Compression via Directed Acyclic Graphs. In Theory of Computing Systems.
Benedikt M (2014). The per-character cost of repairing word languages. In Theoretical Computer Science.
Benedikt M (2015). The complexity of higher-order queries. In Information and Computation.
Amarilli A (2015). Finite Open-World Query Answering with Number Restrictions.
Amarilli A (2015). Combining Existential Rules and Description Logics (Extended Version).
Benaim S (2016). Complexity of Two-Variable Logic on Finite Trees. In ACM Transactions on Computational Logic.
Benedikt M (2016). Limiting Until in Ordered Tree Query Languages. In ACM Transactions on Computational Logic.
Bourhis P (2016). Bounded Repairability for Regular Tree Languages. In ACM Transactions on Database Systems.
Benedikt M (2017). Determinacy and rewriting of functional top-down and MSO tree transformations. In Journal of Computer and System Sciences.
Amarilli A (2020). Finite Open-world Query Answering with Number Restrictions. In ACM Transactions on Computational Logic.
Description: We developed techniques for transforming and repairing streams of structured data.
Exploitation Route: The stream processors could be used to detect anomalies in, for example, news feeds.
Sectors: Digital/Communication/Information Technologies (including Software)