Compiler and Runtime optimisations for Graph Databases

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

Graph databases (GDBs) have gained significant attention during the last decade following the explosion of Big Data processing due to social media, IoT, AI, etc. While users are constantly demanding faster access of high volume data, data analysis and predictive analytics are just going to increase the data access requirements.

In particular, one of the major challenges of Java-based GDBs is the fact that they are affected by the performance of the system. Providing more memory, doesn't translate automatically to better performance, due to the limits of the Runtime Environment (RTE) they execute on, such as libraries, processes, Virtual Machine (VM) etc.

The main contribution of this project is the focus on hardware/software interaction methods in order to solve the scalability problem of Java Graph databases, especially on a Non-Uniform Memory Access (NUMA) architecture.

The key objectives of this project are:
1) to investigate how system functionalities (NUMAness, compiler, runtime, etc.) affect the performance of a Graph Database,
2) to perform targeted optimisations for GDBs (compiler/runtime and memory optimisations, etc.) to improve data locality, and
3) to implement a custom version of experimental platform showcasing the aforementioned optimisations.

The approach to be taken in order to proceed with the above, is:
1) the implementation of an Allocation Memory Profiler for the inspection of allocation patterns of applications running on the VM and description of the domain application and its detailed performance analysis.
2) the implementation of a tool on top of the VM, in order to wrap "performance" functionality into the VM and thus to allow the VM to access the CPU's performance monitoring unit hardware counters.
3) the implementation of a NUMA-aware memory manager (Application Programming Interface - API), for off-heap memory allocation and accesses.

To perform a scalability analysis of GDBs a benchmark suite for NoSQL Databases which supports Relational Property Graphs will be used. This benchmark suite, tests the following three scenarios:
interactive: A transaction/query workload
business intelligence: An analytical/query workload
graph algorithms: Graph analysis algorithms

A detailed performance analysis will be conducted for a thorough evaluation of all aspects of the experimental platform, in order to investigate the GDBs' allocation behavior, categorise all the occurred patters and correlate them with the semantics of each benchmark's query.

Our research will focus on the VM's Garbage Collector, and its effect on the overall performance of
GDBs. Since Java GDBs' querying languages are just another Java program, they fall into the same optimisation realm of standard Java programs. However, the sensitivity many GDBs have at the memory subsystem, steers our research efforts also on GC tuning and compiler optimisations.

Relevant EPSRC research areas:
Databases
Programming languages and compilers
Architectures and operating systems
Software engineering

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/N509565/1 01/10/2016 30/09/2021
2560814 Studentship EP/N509565/1 01/07/2017 31/12/2020 Andreas Andronikakis