Querying Graph Structured Data: Principles and Techniques

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

One of the most challenging problems in information processing is handling large amounts of information (EPSRC, in its priority theme `Towards an intelligent information infrastructure' refers to 'deluge of data' and delivering `understanding from information' as the key problems). Very often nowadays, these vast amounts of information are associated with new applications. For instance, the popularity of social networks such as LinkedIn, Facebook, and others, results in large amounts of data they accumulate. The Semantic Web effort generates high volumes of data too, as it attempts to facilitate
understanding of Web data. Many of these applications have one feature in common: the underlying data model is described by a graph, and in querying such graph-structured data, the topology of the graph is as important as the data itself. Graph data arises in a multitude of other applications, including intelligence analysis and crime detection, biology, cheminformatics, knowledge discovery, and network traffic.

At the same time, foundational aspects of graph data have not yet been adequately studied. While sharing the same high-level model, applications of graph data are developing largely independently of each other, producing separate - but related - sets of data processing tools.

The main goal of this project is to develop principles and techniques underlying querying of graph data, concentrating on query language design, algorithmic tools for query processing tasks, delivering exact or approximate answers fast from extremely large graph databases, and implementing the algorithmic toolbox in a prototype to be used by our industrial partners.

The main challenges lie in combining data and topology in querying, in the inherent complexity of graph queries, and in the dynamic and distributed nature of graph data. The project will be split into three components. The first will concentrate on query language design for handling data and topology, and on different semantics of query answering for dealing with extremely large data graphs. The second will provide an algorithmic toolbox for query evaluation techniques, for all possible combinations of the query processing mode (batch vs incremental), for the locality of data (single-site vs distributed), and for the type of query answers (exact vs approximate). The third component concerns with the implementation of a prototype on which we shall test our design decisions and algorithms.

Planned Impact

The following will benefit from our research:

- Private sector, such as vendors for graph databases (Yahoo!, Amazon, etc.) and providers of social networks (LinkedIn, Twitter, FaceBook, etc.).

- Public sector, such as organisations involved in crime detection, intelligence analysis, and transportation networks;

- The wider public, such as users of semantic Web and social networks; and

- Research community across several disciplines such as knowledge discovery, biology, cheminformatics, cloud computing, semantics Web and social networks.

They will benefit in the following ways:

(a) The project will provide guidance for designing new graph query languages.

(b) Our query evaluation algorithmic toolbox will help developers of graph databases better support querying graph data.

(c) Our tools will provide users of graph data (e.g., social networks) with more effective and efficient data access.

To deliver the impact, we shall do the following:

- continue ongoing and establish new industrial collaborations; provide packaged solutions at various stages for our industrial collaborators to evaluate, to ensure that our tools and techniques are properly tested to guarantee their practicality;

- continue active interaction with leading researchers on graph databases, and publish results of our research in top venues to ensure maximum visibility; and

- deliver seminars, public lectures, and enhance existing and create new teaching modules based on the outcomes of this project.

Publications

10 25 50
publication icon
Amano S (2014) XML Schema Mappings Data Exchange and Metadata Management in Journal of the ACM

publication icon
Arenas M (2013) Solutions and query rewriting in data exchange in Information and Computation

publication icon
Barcelo P (2016) Order-Invariant Types and Their Applications in Logical Methods in Computer Science

publication icon
Barcelo P (2013) Graph Logics with Rational Relations in Logical Methods in Computer Science

publication icon
Barceló P (2014) Efficient Approximations of Conjunctive Queries in SIAM Journal on Computing

publication icon
Barceló P (2013) Parameterized regular expressions and their languages in Theoretical Computer Science

publication icon
Barceló P (2014) Querying Regular Graph Patterns in Journal of the ACM

 
Description Graph databases are emerging as a new paradigm for storing and querying data. This is due to many new applications such as social networks that can naturally be viewed as graphs, or the Semantic Web, whose underlying data model (RDF) is modeled via graphs. We have worked on many tasks related to querying graph data. One general direction is marrying topology and data in querying graphs. Typically those two features are treated separately: the former analyzes the structure of the graph itself, and the latter looks at data stored at nodes and edges. We have developed a family of languages that can talk about these at once, and that have very good evaluation properties. Second, we look at the typical problem of matching in large graphs, that appears in many applications. The exact solution is prohibitively expensive, and we have developed many approaches that give good approximations and work well in practice. For very large graphs, we also determined when queries can be answered in a scale-independent way, by looking at only a small part of the graph.
Exploitation Route We developed a new family of languages and a new class of algorithms for graph queries. These can be used by the developers and the many users of graph database systems.
Sectors Healthcare

Leisure Activities

including Sports

Recreation and Tourism

Security and Diplomacy

 
Description Languages for data graphs are now being implemented and tested in various scenarios by other teams; for instance our language Graph XPath has recently been implemented and successfully tested in a map-reduce environment. It was being considered by a leading graph database vendor (Neo4j Inc) for adding to their query language. In the end it was listed as a key academic contribution towards the new standard of graph query languages called GQL; the PI of the project is an active participant and, via BSI, a member of the ISO committee in charge of the language design. Multiple ideas on querying data graphs have also found their way into the new standard of querying graphs by relational databases, called SQL/PGQ, especially in its pattern matching facilities. Algorithms for approximate matching are also finding real life applications in social networks environments and telecommunications applications.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Policy & public services