Computing over Compressed Graph-Structured Data

Lead Research Organisation: University of Liverpool
Department Name: Computer Science

Abstract

The project aims to bring computation over compressed data to massive graph-structured datasets by extending optimally-compressed tree data structures we developed to certain classes of graphs. Graph-structured datasets such as knowledge graphs or social networks are growing in importance and size; at the same time, computation is increasingly pushed to mobile devices with limited memory capacity. Many applications yield large, but partially repetitive and predictable datasets, which makes them compressible; but on mobile devices, data is only useful when it can be queried directly in a compressed representation that fits into the device memory. Current methods for computing over compressed data do not yet work well for this scenario.

In order to enable queries on compressed graph-structured data we need to answer three research questions.
1. We need to know the intrinsic information content of graph-structured data so that we can decide whether a dataset can be sufficiently compressed to fit into local memory.
2. We need to know how to effectively compress graph-structured data, so that we can economically transmit and store graph-structured data on mobile devices.
3. We need to know how to answer queries on a compressed representation, so that we can make effective use of its compressibility while querying over a graph-structured dataset.

This project will combine methods from information theory, data compression, and succinct data structures, to carry out three work packages.
1. We will propose new notions of random sources and empirical entropy in order to approximate the intrinsic information content of graph-structured data.
2. We will develop new compression methods based on probabilistic context-free grammars (PCFGs) and probabilistic multiple context-free grammars (PMCFGs) in order to effectively compress graph-structured data.
3. We will apply and extend our tools for succinct tree data structures to new types of graphs and RNA structure data in order to enable computing directly over compressed graph-structured data.
We will use the outcomes of the work packages to create a versatile toolbox of space-efficient data structures to ease the development of applications working with massive graph-structured datasets.

Publications

10 25 50