Enforcement of Constraints on XML Streams
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
The eXtensible Markup Language (XML) has become a ubiquitous format for exchanging data. Enterprise data from industries as diverse as finance, healthcare, and genomics is routinely exchanged as XML. Much of this XML-encoded information has to be queried on the fly as it arrives, that is, as an XML stream. News and event information, for example, is available in the form of XML feeds; applications that react to these events must process the feeds in streaming fashion. Communication and messaging protocols also make use of XML, and the corresponding protocol handlers are thus also XML stream processors.
A crucial aspect of processing any form of data is validation: before data is made available to applications, it must be in a "sane" state. When data is exchanged over networks, corruption is ubiquitous, because messages are received from untrusted or even unknown parties. Indeed, much or even most of the data sent to web-accessible application servers may come from malicious or compromised hosts.
The XML community has already developed standardized means for describing constraints on the structure of XML documents. On the one hand, there are schema-based constraints, such as Document Type Definitions (DTDs), which limit the tags that can occur within a document. On the other hand, qualifiers in the XML query language XPath provide a more flexible method for adding application-specific constraints. But how can a firewall enforce these constraints efficiently on large collections of parallel feeds? This is a critical issue, whether the XML streams represent signalling messages, event feeds, or web service calls. This project will study which constraints can and cannot be enforced efficiently, and will provide tools and technologies to effectively monitor XML streams for violations of both schema constraints and application-specific constraints.
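As a purely illustrative sketch of what enforcing a schema-style constraint on a stream can look like, the following Python fragment checks, chunk by chunk, that every element appears only under a parent that permits it. It is not the project's method: the rule table `ALLOWED_CHILDREN`, the tag names, and the function `first_violation` are hypothetical, and a real DTD also constrains the order and number of children, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD-like rule table: parent tag -> tags permitted as children.
ALLOWED_CHILDREN = {
    "feed": {"event"},
    "event": {"title", "time"},
    "title": set(),
    "time": set(),
}


def first_violation(chunks, allowed=ALLOWED_CHILDREN, root="feed"):
    """Return a message for the first disallowed tag, or None if none is found.

    `chunks` is any iterable of text fragments that together form one document.
    Malformed or truncated XML raises xml.etree.ElementTree.ParseError.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tags of the currently open elements, outermost first

    def check_pending():
        # Examine the parse events produced so far.
        for event, elem in parser.read_events():
            if event == "start":
                if not stack:
                    if elem.tag != root:
                        return f"unexpected root <{elem.tag}>"
                elif elem.tag not in allowed.get(stack[-1], set()):
                    return f"<{elem.tag}> not allowed inside <{stack[-1]}>"
                stack.append(elem.tag)
            else:  # "end" event
                stack.pop()
                elem.clear()  # discard text and children that were already checked
        return None

    for chunk in chunks:
        parser.feed(chunk)
        problem = check_pending()
        if problem:
            return problem
    parser.close()  # flush anything the parser is still holding back
    return check_pending()
```

Because the check only needs the stack of open tags, a violation can be reported as soon as the offending start tag has been read, without waiting for the rest of the document.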
Organisations
People
Michael Benedikt (Principal Investigator)
Publications
Amarilli A
(2020)
Finite Open-world Query Answering with Number Restrictions
in ACM Transactions on Computational Logic
Benedikt M
(2016)
Limiting Until in Ordered Tree Query Languages
in ACM Transactions on Computational Logic
Benaim S
(2016)
Complexity of Two-Variable Logic on Finite Trees
in ACM Transactions on Computational Logic
Benedikt M
(2009)
Regular tree languages definable in FO and in FO mod
in ACM Transactions on Computational Logic
Benedikt M
(2010)
Report on the EDBT/ICDT 2010 workshop on updates in XML
in ACM SIGMOD Record
Benedikt M
(2013)
Report on the first workshop on innovative querying of streams
in ACM SIGMOD Record
Description: We developed techniques for transforming and repairing streams of structured data.
Exploitation Route: The stream processors could be used to detect anomalies in, for example, news feeds.
Sectors: Digital/Communication/Information Technologies (including Software)