Evaluating the applicability of Graph Database to Complex, Structured Datasets through the use of Benchmarks

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

We live in an era of big data, in which the volume of data, its complexity and its interconnectedness continue to grow, along with the demand for its correctness. Relational database vendors are constantly improving their products to remain relevant in handling this ever-growing demand for large, complex and connected data. However, some in academia and business are considering alternative solutions in NoSQL database systems, in particular graph database models, to store, query and manage highly connected, complex and large datasets.
Database benchmarks are an excellent and viable way to guide users in selecting an appropriate database model for their use cases. However, previous benchmark studies have concentrated on the performance of database implementations without considering the many ways businesses use, or could use, these database models to solve their business problems, and without using that knowledge to develop a practical benchmark that is relevant to both business and academia. For example, many businesses translate their highly normalised relational database model into a dimensional model (star schema) in order to store large datasets more efficiently and to obtain faster responses to complex queries. A graph database model can achieve the same goals; hence, benchmarking the dimensional model and the graph model on the same database implementation (e.g., Oracle Enterprise 18c) would give businesses clear direction as to which model offers lower query latency and more efficient storage for structured datasets.
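The comparison described above can be illustrated with a minimal sketch. It is not the benchmark this project proposes: the toy sales data, the table names (`dim_customer`, `dim_product`, `fact_sales`), the `BOUGHT` edge label and the helper functions are all hypothetical, and the graph side is a plain in-memory Python structure rather than a graph model hosted on the same database implementation. It only shows the shape of the experiment: run one business question against both models, check that the answers agree, and record the latency of each.

```python
import sqlite3
import time

# Hypothetical toy data: (customer, city, product, quantity).
SALES = [
    ("alice", "Edinburgh", "laptop", 2),
    ("bob", "Glasgow", "laptop", 1),
    ("carol", "Edinburgh", "phone", 3),
]


def star_schema_totals(sales):
    """Load the data into a star schema and total the quantity per city in SQL."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE, city TEXT);
        CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (customer_id INTEGER, product_id INTEGER, quantity INTEGER);
    """)
    for customer, city, product, qty in sales:
        con.execute("INSERT OR IGNORE INTO dim_customer (name, city) VALUES (?, ?)",
                    (customer, city))
        con.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        # Resolve the dimension keys and insert one fact row per sale.
        con.execute(
            "INSERT INTO fact_sales "
            "SELECT c.id, p.id, ? FROM dim_customer c, dim_product p "
            "WHERE c.name = ? AND p.name = ?",
            (qty, customer, product),
        )
    rows = con.execute(
        "SELECT c.city, SUM(f.quantity) FROM fact_sales f "
        "JOIN dim_customer c ON c.id = f.customer_id GROUP BY c.city"
    ).fetchall()
    con.close()
    return dict(rows)


def graph_totals(sales):
    """Load the data into a toy property graph and total the quantity per city by traversal."""
    nodes = {}  # node id -> properties
    edges = []  # (source id, label, target id, properties)
    for customer, city, product, qty in sales:
        nodes["customer:" + customer] = {"label": "Customer", "city": city}
        nodes["product:" + product] = {"label": "Product"}
        edges.append(("customer:" + customer, "BOUGHT", "product:" + product,
                      {"quantity": qty}))
    totals = {}
    for source, label, _target, props in edges:
        if label == "BOUGHT":
            city = nodes[source]["city"]
            totals[city] = totals.get(city, 0) + props["quantity"]
    return totals


def timed(fn, *args):
    """Return the function's result together with its wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    star_result, star_seconds = timed(star_schema_totals, SALES)
    graph_result, graph_seconds = timed(graph_totals, SALES)
    # Validate correctness before comparing latency.
    assert star_result == graph_result
    print(star_result, f"star: {star_seconds:.6f}s", f"graph: {graph_seconds:.6f}s")
```

Timings at this scale are not meaningful, and both timings here include loading the data as well as answering the query. A real study would load realistic data volumes into both models on the same engine, time the query separately from the load, and repeat each measurement many times.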
I am convinced that a good database benchmark should concentrate on business problems, use and relevance; test the properties of database implementations; examine database implementations under the various supported system infrastructures; and validate the correctness of the benchmark experiments. In light of the above, my research focuses on finding a new approach to benchmarking heterogeneous database models for structured and highly connected large datasets. Furthermore, it is concerned with an in-depth investigation of the composition and properties of graph database models and their application to use cases involving complex, structured and connected datasets.
My research will primarily study the relational database model, the property graph database model and the RDF database model, and will develop benchmarking techniques to assess the effectiveness of these models on highly connected, complex, large datasets.
My research aims to achieve the following:
1. Investigate and benchmark a range of databases that vary in architectural makeup, for example non-native distributed property graph engines, native distributed RDF graph engines, in-memory distributed property graph engines and cloud-based property graph engines.
2. Investigate practical problems and requirements from business use cases and develop a benchmark to uncover the optimal database models that meet those requirements and solve those problems.
3. Benchmark database implementations on a variety of system infrastructure platforms.
4. Investigate efficient ways to port property graph datasets into RDF graph datasets.
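
To make the fourth aim concrete, the following is a minimal sketch of one possible mapping from a property graph to RDF, emitted as N-Triples lines. It is an illustration under stated assumptions, not the porting method this research will develop: the `http://example.org/` namespace, the `node/`, `rel/`, `prop/`, `edge/` and `meta/` path segments and the function names are hypothetical, and node identifiers, labels and property keys are assumed to be safe to embed in an IRI. Edge properties have no direct RDF counterpart, so the sketch represents each edge that carries properties as an extra resource of its own.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(*parts):
    """Build an IRI under BASE; assumes every part is already IRI-safe."""
    return "<" + BASE + "/".join(parts) + ">"


def literal(value):
    """Render a Python value as an N-Triples literal (integers typed, the rest as strings)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return f'"{value}"^^<{XSD_INTEGER}>'
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def property_graph_to_ntriples(nodes, edges):
    """Map nodes {id: {"label", "properties"}} and edges [(src, label, dst, props)] to N-Triples."""
    triples = []
    for node_id, node in nodes.items():
        subject = iri("node", node_id)
        triples.append((subject, RDF_TYPE, iri("label", node["label"])))
        for key, value in node.get("properties", {}).items():
            triples.append((subject, iri("prop", key), literal(value)))
    for index, (source, label, target, props) in enumerate(edges):
        src, dst = iri("node", source), iri("node", target)
        # The plain relationship becomes a single triple.
        triples.append((src, iri("rel", label), dst))
        if props:
            # Edge properties hang off a separate resource that describes the edge.
            edge = iri("edge", str(index))
            triples.append((edge, iri("meta", "source"), src))
            triples.append((edge, iri("meta", "label"), iri("rel", label)))
            triples.append((edge, iri("meta", "target"), dst))
            for key, value in props.items():
                triples.append((edge, iri("prop", key), literal(value)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    example_nodes = {
        "alice": {"label": "Person", "properties": {"age": 30}},
        "bob": {"label": "Person", "properties": {}},
    }
    example_edges = [("alice", "KNOWS", "bob", {"since": 2018})]
    for line in property_graph_to_ntriples(example_nodes, example_edges):
        print(line)
```

The edge-as-resource pattern is only one option; standard RDF reification or RDF-star are alternatives, and the choice affects both storage size and query complexity, which is part of what makes efficient porting a research question.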

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509644/1                                   01/10/2016  30/09/2021
2080117            Studentship   EP/N509644/1  01/04/2018  31/12/2021  Olanrewaju Adeoluwa