# Statistical Topological Data Analysis

Lead Research Organisation:
University of Oxford

Department Name: Statistics

### Abstract

The need for data analysis is ever-growing as we, as a society, acquire more and more data. There are plenty of tools to analyse datasets that naturally lie in a Euclidean space. However, far from all datasets take this form. Much data is non-Euclidean, for example in the form of networks or manifolds, and it is my goal to combine research in statistics with research in topological data analysis to study such data. I am interested in developing and analysing new methods that can be applied to datasets that are best thought of as samples of points that do not naturally embed into a Euclidean space.

A natural starting point for me seems to be the study of networks, which has countless applications in biology, social sciences, engineering, chemistry, computer science, neuroscience, and many other fields. On the one hand, there are plenty of probabilistic and statistical tools to analyse both real-life and random networks. On the other hand, each network is a one-dimensional simplicial complex and hence a topological space, which can be studied using techniques from topological data analysis. Moreover, the space of simple networks itself can be endowed with a metric and viewed as a topological space. One natural question is: given two networks, how can we compare them? It would be very useful to have an algorithm that, given two networks, could compare them quantitatively based on their intrinsic structure. It would be even more useful if such an algorithm made minimal assumptions about the structure of the networks and worked even for networks that differ in size and structure.

One way to go about it is to define a filtration on each of the two networks, apply the persistent homology algorithm, and produce a barcode for each of them. We could then compare the networks based on their topological summaries (i.e. barcodes). For example, there are multiple natural distance functions on the space of barcodes, such as the Wasserstein and bottleneck distances, which come with theoretical guarantees like the stability theorem. This is just one example of a topological tool that can be used to compare networks.
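The pipeline described above (filtration, barcode, distance between barcodes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the project: the abstract does not fix a filtration, so the sketch assumes weighted networks filtered by edge weight (all vertices enter at 0, each edge enters at its weight), computes only degree-0 persistent homology with a union-find structure, and leaves out the infinite bars (one per connected component). The function names `h0_barcode` and `bottleneck_distance` are hypothetical, and the matching step is a simple search suitable only for small barcodes.

```python
def h0_barcode(num_vertices, weighted_edges):
    """Finite degree-0 bars of the edge-weight filtration (an assumed filtration).

    All vertices are born at 0; an edge (u, v, w) enters at weight w.
    Each edge that merges two components kills one class, giving a bar (0, w).
    Infinite bars are not returned.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    bars = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            bars.append((0.0, float(w)))
    return bars


def bottleneck_distance(bars_a, bars_b):
    """Bottleneck distance between two finite barcodes (small inputs only).

    Rows 0..n-1 are bars of A, followed by m diagonal slots; columns 0..m-1 are
    bars of B, followed by n diagonal slots. Matching a bar to the diagonal
    costs half its length; diagonal-to-diagonal matches are free.
    """
    n, m = len(bars_a), len(bars_b)
    size = n + m
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:
            (b1, d1), (b2, d2) = bars_a[i], bars_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if i < n:
            b1, d1 = bars_a[i]
            return (d1 - b1) / 2
        if j < m:
            b2, d2 = bars_b[j]
            return (d2 - b2) / 2
        return 0.0

    costs = [[cost(i, j) for j in range(size)] for i in range(size)]

    def has_perfect_matching(threshold):
        # Augmenting-path bipartite matching using only pairs with cost <= threshold.
        row_matched_to = [-1] * size

        def augment(i, seen):
            for j in range(size):
                if costs[i][j] <= threshold and not seen[j]:
                    seen[j] = True
                    if row_matched_to[j] == -1 or augment(row_matched_to[j], seen):
                        row_matched_to[j] = i
                        return True
            return False

        return all(augment(i, [False] * size) for i in range(size))

    # The answer is the smallest pairwise cost that still admits a perfect matching.
    candidates = sorted({c for row in costs for c in row})
    return next(t for t in candidates if has_perfect_matching(t))


# Two toy networks of different sizes, given as (vertex, vertex, weight) triples.
network_a = (3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)])
network_b = (4, [(0, 1, 1.0), (1, 2, 2.5), (2, 3, 0.5)])

bars_a = h0_barcode(*network_a)   # [(0.0, 1.0), (0.0, 2.0)]
bars_b = h0_barcode(*network_b)   # [(0.0, 0.5), (0.0, 1.0), (0.0, 2.5)]
print(bottleneck_distance(bars_a, bars_b))  # 0.5
```

Note that the two toy networks have different numbers of vertices and edges, yet their barcodes are directly comparable, which is the property the text asks for. In practice one would likely use higher-dimensional homology, other filtrations, and an established persistent homology library rather than this hand-written sketch.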

The first step in my project would be to see how different topological comparisons of networks work empirically on real-world datasets, and to analyse theoretically the outputs of such algorithms on random networks using statistical and probabilistic tools. This would contribute to the field of network analysis as well as to the analysis of random graphs.


People | ORCID iD |
---|---|
G Reinert (Primary Supervisor) | |
Tadas Temcinas (Student) | |

### Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/R513295/1 | | | 01/10/2018 | 30/09/2023 | |
2275810 | Studentship | EP/R513295/1 | 01/10/2019 | 21/04/2023 | Tadas Temcinas |