Statistical Topological Data Analysis

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

The need for data analysis is ever-growing as we, as a society, acquire more and more data. There are plenty of tools for analysing datasets that naturally lie in a Euclidean space. However, many datasets do not take this form. Plenty of data are non-Euclidean, for example networks or manifolds, and my goal is to combine research in statistics with research in topological data analysis to study such data. I am interested in developing and analysing new methods for datasets that are best thought of as samples of points that do not naturally embed into a Euclidean space.

A natural starting point for me is the study of networks, which have countless applications in biology, the social sciences, engineering, chemistry, computer science, neuroscience and many other fields. On the one hand, there are plenty of probabilistic and statistical tools for analysing both real-life and random networks. On the other hand, each network is a one-dimensional simplicial complex, and hence a topological space, which can be studied using techniques from topological data analysis. Moreover, the space of simple networks can itself be endowed with a metric and viewed as a topological space. One natural question is: given two networks, how can we compare them? It would be very useful to have an algorithm that quantitatively compares two networks based on their intrinsic structure. It would be even more useful if such an algorithm made minimal assumptions about the networks and worked even for networks that differ in size and structure.

One way to do this is to define a filtration on each network, apply the persistent homology algorithm and produce a barcode for each. We could then compare the networks through these topological summaries. For example, there are several natural distance functions on the space of barcodes, such as the Wasserstein and bottleneck distances, which come with theoretical guarantees like the stability theorem. This is just one example of a topological tool that can be used to compare networks.
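The pipeline above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of its own, not the project's method: the filtration (every vertex enters at time 0, each edge enters at its weight), the hypothetical helper names `graph_barcodes`, `finite_bars` and `bottleneck`, and the choice to compare only finite H0 bars are all illustrative. Because a graph has no 2-simplices, union-find suffices: an edge joining two components kills an H0 bar, and any other edge opens an H1 bar that never dies. The brute-force bottleneck distance is only practical for very small diagrams.

```python
import math
from itertools import permutations


def graph_barcodes(n_vertices, weighted_edges):
    """Hypothetical helper: H0 and H1 barcodes of a weighted graph where every
    vertex enters at time 0 and edge (u, v, w) enters at time w."""
    parent = list(range(n_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    h0, h1 = [], []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            h0.append((0.0, w))       # two components merge: one H0 bar dies at w
        else:
            h1.append((w, math.inf))  # edge closes a cycle; no 2-simplex kills it
    # Each remaining component is an essential (infinite) H0 bar.
    n_components = sum(1 for x in range(n_vertices) if find(x) == x)
    h0.extend([(0.0, math.inf)] * n_components)
    return h0, h1


def finite_bars(bars):
    """Hypothetical helper: keep only bars with a finite death time."""
    return [bar for bar in bars if math.isfinite(bar[1])]


def bottleneck(diag_a, diag_b):
    """Bottleneck distance between two small diagrams of finite bars,
    by exhaustive search over matchings (points may go to the diagonal)."""
    # Give each diagram one diagonal slot per point of the other diagram.
    left = [("pt", p) for p in diag_a] + [("diag", None)] * len(diag_b)
    right = [("pt", q) for q in diag_b] + [("diag", None)] * len(diag_a)

    def cost(x, y):
        (kx, px), (ky, py) = x, y
        if kx == "pt" and ky == "pt":  # L-infinity distance between points
            return max(abs(px[0] - py[0]), abs(px[1] - py[1]))
        if kx == "pt":                 # px sent to the diagonal
            return (px[1] - px[0]) / 2
        if ky == "pt":                 # py sent to the diagonal
            return (py[1] - py[0]) / 2
        return 0.0                     # diagonal to diagonal is free

    best = math.inf
    for perm in permutations(range(len(right))):
        worst = max((cost(left[i], right[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best


# Usage: a weighted triangle versus a weighted path.
h0_a, h1_a = graph_barcodes(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)])
h0_b, h1_b = graph_barcodes(3, [(0, 1, 1.0), (1, 2, 4.0)])
print(bottleneck(finite_bars(h0_a), finite_bars(h0_b)))  # 2.0
```

Only the barcodes are compared, so the two networks need not share vertex sets or even have the same number of vertices. For realistic diagram sizes, the exhaustive search would be replaced by a proper matching algorithm.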

The first step in my project would be to see how different topological comparisons of networks work empirically on real-world datasets, and also to analyse theoretically the outputs of such algorithms on random networks using statistical and probabilistic tools. This would contribute to the field of network analysis as well as to the analysis of random graphs.
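As an illustration of the kind of probabilistic question meant here (not a result of this project), consider a complete graph on n vertices with i.i.d. Uniform(0, 1) edge weights, filtered by edge weight with every vertex present at time 0. The edges that kill H0 bars are exactly the edges Kruskal's algorithm adds to the minimum spanning tree, so the total finite H0 persistence equals the MST weight, whose expectation converges to ζ(3) ≈ 1.202 by a classical theorem of Frieze. The helper name `total_h0_persistence`, the choice n = 200 and the seed are arbitrary assumptions of this sketch.

```python
import random


def total_h0_persistence(n, rng):
    """Hypothetical helper: total finite H0 persistence of the edge-weight
    filtration of K_n with i.i.d. Uniform(0, 1) weights (vertices born at 0)."""
    edges = sorted((rng.random(), u, v) for u in range(n) for v in range(u + 1, n))
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, merges = 0.0, 0
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w  # the H0 bar [0, w) dies here (a Kruskal MST edge)
            merges += 1
            if merges == n - 1:  # one component left: no more finite H0 bars
                break
    return total


rng = random.Random(0)  # fixed seed for reproducibility
samples = [total_h0_persistence(200, rng) for _ in range(5)]
mean = sum(samples) / len(samples)
print(f"mean total H0 persistence (n=200): {mean:.3f}")  # expected to be near zeta(3)
```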

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509711/1                                   30/09/2016  29/09/2021
2275810            Studentship   EP/N509711/1  30/09/2019  20/04/2023  Tadas Temcinas
EP/R513295/1                                   30/09/2018  29/09/2023
2275810            Studentship   EP/R513295/1  30/09/2019  20/04/2023  Tadas Temcinas
 
Description Reading group talk (GNN group) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact I gave a presentation at the department's reading group on graph neural networks.
Year(s) Of Engagement Activity 2022
 
Description Reading group talk (Manifold hypothesis testing) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact Around 10 people attended my reading group talk, where I presented a research paper.
Year(s) Of Engagement Activity 2021
 
Description Reading group talk (Stein variational gradient descent) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Around 10 people attended the department's reading group, where I presented a research paper.
Year(s) Of Engagement Activity 2021
 
Description Reading group talk (diffusion maps) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Around 20 people attended my reading group talk, where I presented a research paper.
Year(s) Of Engagement Activity 2021
 
Description Seminar research talk (CLT with Applications to Random Complexes) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Around 20 people attended my research talk, where I presented the results from my new preprint.
Year(s) Of Engagement Activity 2022
URL https://www.maths.ox.ac.uk/node/40880
 
Description Seminar research talk (PH with Laplacian eigenvectors) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact I gave a research talk on the results of my first year in the PhD.
Year(s) Of Engagement Activity 2020
URL https://www.maths.ox.ac.uk/node/35963