Algebraic Topology based Machine Learning

Lead Research Organisation: University of Southampton
Department Name: Sch of Electronics and Computer Sci

Abstract

In recent years the amount of data produced worldwide has grown at a phenomenal rate. As the breadth and depth of datasets has increased, machine learning algorithms have provided remarkable insight into this data. However, small changes to the datasets used by these algorithms can drastically alter the efficacy of these techniques. A network architecture that works perfectly on one data set may not work on a slightly different dataset collected in a similar way. As such, understanding the effect of the topology of an underlying data set has become a current research area that hopes to improve the robustness of learning algorithms. However, its relatively unknown how the topological properties of the original data set effect the output of our learning algorithms. In February 2018, Barsotz Zielinski et al. showed that considering persistence diagrams (2 dimensional representations of high dimensional topological properties) can outperform several state-of-the-art models, so clearly the topological properties of the dataset hold valuable information.

Given this, the PhD student will investigate the current methods used to study the topology of point data sets in greater detail. Although, as mentioned, topological data analysis (TDA) has already been shown to produce promising results, we believe deepening the understanding behind what is happening in this analysis will allow topological properties to be better integrated into learning algorithms. In particular, the student will aim to achieve the following objectives:

1. Sensitivity analysis: To investigate that by slightly changing data sets by identifying key topological points and removing them, how that changes the outcome of a learning algorithm on such datasets (Year 1).
2. Computational feasibility study: After a brief investigation on how point data sets are turned into simplical complexes (mathematical objects that can be studied topologically), he will look into how these complexes have their topologies studied in a way that is computationally feasible, as well as being resistant to noise Year 1).
3. Theoretical performance bounds: The student will then work on proving bounds on information loss in current ways persistence diagrams are mapped into forms that can be integrated into common machine learning workflows, to understanding the extent to which topological information is preserved when transformed into such forms (Year 2).
4. Further generalisation techniques: Next the student will investigate the creation of simplical complexes from point data, and look into potential ways to change the creation of such complexes, that integrate more data into the complexes. The student will also investigate the usage of different fields when calculating the persistence diagrams, explaining the trade-off between computational complexity and topological information, and asking questions about the optimal field to calculate over (Year 2).
Adversarial TDA: Finally, the student will study a topological game between an intelligent adversary attempting to alter the topology of our dataset and an algorithm, acting to try and reduce the effects of this adversary. The objective is to identify whether optimal strategies for this game can help inform attempts to reduce the effects of intelligent attacks on TDA methods (Year 3).

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/R513325/1 01/10/2018 30/09/2023
2280513 Studentship EP/R513325/1 01/10/2019 30/03/2023 Thomas Davies