Scalable Management of Spatial Data

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

This project falls within the EPSRC Information and Communication Technologies (ICT) theme and the
Information Systems research area.

In recent decades, the volume of spatial data (e.g. location, routing, navigation) has increased dramatically
across the world. This project is motivated by the observation that conventional data management and
processing technologies are becoming inadequate for analyzing large-scale spatial data, and that novel
approaches are emerging. Our overall objective is to investigate this problem and to develop solutions
that combine Distributed Processing and Factorized Databases.

Aims and Objectives
Investigate the theoretical foundations of factorized spatial data management.
Spatial databases are systems optimized for storing and querying data that represent objects defined
in a geometric space. To process spatial data types efficiently, they require additional functionality
to be integrated into their query languages.
We will adapt recent developments in factorized databases to evaluate typical spatial database queries.
Factorizing spatial queries will allow us to avoid redundancy in query results and, as a result, to
achieve better performance and possibly asymptotically lower complexity.
Design optimized distributed processing algorithms for spatial data analysis.
Our methodology will combine developments in the algorithmic design of parallel and distributed
methods with the principles of factorized databases to lower the communication cost of such
distributed algorithms. We will use the Massively Parallel Computation (MPC) model to analyze
our results.
Develop distributed algorithms for clustering, classification and pattern recognition of spatial data.
Clustering, classification and pattern recognition tasks are very common in spatial data analysis
because they provide a way of generalizing information retrieved from a database.
We will propose new algorithms that advance the state of the art by performing these tasks on
factorized representations of spatial data in a parallel and distributed way.
Distill practical lessons from applying the novel algorithms to real-world spatial datasets.
We rely on our industrial partner Ordnance Survey to provide us with real-world spatial datasets,
and we will liaise with them to find natural ways to disseminate the outcomes of our research to
their development team.
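To make the redundancy argument of the first objective concrete, here is a minimal sketch under stated assumptions. It takes a natural join of two hypothetical relations, R(A, B) linking roads to map tiles and S(B, C) linking tiles to points of interest. The join result is stored once as flat tuples and once grouped by the shared attribute B as a union of products (A-values x C-values). The relation names, the data, the helper functions and the size measure (a count of stored values) are all illustrative assumptions, not the project's method.

```python
from collections import defaultdict

# Hypothetical relations: R(A, B) links roads to map tiles, S(B, C) links tiles
# to points of interest. Tuples are (a, b) and (b, c) respectively.


def flat_join(r, s):
    """Materialize R(A,B) JOIN S(B,C) as a flat list of (a, b, c) tuples."""
    c_by_b = defaultdict(list)
    for b, c in s:
        c_by_b[b].append(c)
    return [(a, b, c) for a, b in r for c in c_by_b[b]]


def factorized_join(r, s):
    """Represent the same join as a union over B of (A-values) x (C-values)."""
    a_by_b, c_by_b = defaultdict(list), defaultdict(list)
    for a, b in r:
        a_by_b[b].append(a)
    for b, c in s:
        c_by_b[b].append(c)
    # Keep only B-values present in both relations (the join condition).
    return {b: (a_by_b[b], c_by_b[b]) for b in a_by_b.keys() & c_by_b.keys()}


def flat_size(rows):
    """Number of stored data values: three per (a, b, c) tuple."""
    return 3 * len(rows)


def factorized_size(f):
    """Number of stored data values: each B-value once, plus its A- and C-lists."""
    return sum(1 + len(a_vals) + len(c_vals) for a_vals, c_vals in f.values())


def enumerate_factorized(f):
    """Expand the factorized form back into flat (a, b, c) tuples."""
    for b, (a_vals, c_vals) in f.items():
        for a in a_vals:
            for c in c_vals:
                yield (a, b, c)


# Hypothetical example data: 100 roads and 100 points of interest in one tile.
roads = [(f"road{i}", "tile1") for i in range(100)]
pois = [("tile1", f"poi{j}") for j in range(100)]
flat = flat_join(roads, pois)
fact = factorized_join(roads, pois)
print(flat_size(flat), factorized_size(fact))
```

With 100 roads and 100 points of interest in a single tile, the flat result holds 10,000 tuples (30,000 stored values), whereas the factorized form stores 201 values. Enumerating the factorized form recovers exactly the same tuples. The gap grows with the product of the group sizes, which is the kind of redundancy factorization is meant to avoid.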

Novelty of Research Methodology
Factorized databases offer a fresh look at the problem of computing and representing the results of
relational queries. Our research methodology is novel because it explores the special characteristics
of spatial relational queries through the lens of factorized representation and computation of queries
over large spatial datasets. To our knowledge, this research topic has not been investigated so far.

Alignment to EPSRC's Strategies
This project aligns with EPSRC's priorities for data-enabled decision making, because it aims to
contribute new methods that support people in making decisions in a world that is becoming ever
more data rich.

Collaborators
The datasets will be provided by Ordnance Survey, the premier UK organization producing topographic
maps. They are interested in this research, and we expect to collaborate with them on understanding
the challenges of managing spatial data and the existing solutions.

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/R512060/1                                      01/10/2017   31/03/2023
2115977             Studentship    EP/R512060/1   01/10/2018   30/09/2022   Antonia Kormpa