New insights on Data Science through Topological Data Analysis

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

This project falls within the EPSRC Application-driven Topological Data Analysis area.

Research in Homological Algebra, and more generally in Algebraic Topology, constitutes a significant field within the spectrum of fundamental mathematics. Recently, however, the algorithmic nature of this area has made it possible to bring tools from Topology into data analysis; these tools are collectively known as Topological Data Analysis (TDA). Algorithms such as Mapper, or the various methods for approximating the homology groups of the manifold underlying a dataset, are expected to provide new unsupervised insights into datasets. One strength of this approach compared to other Machine Learning techniques is that it benefits from many theoretical guarantees, such as robustness to perturbations of the data and guarantees of convergence. At the same time, it is general enough to be applied to a wide range of data types. Examples abound; to cite a few:
- Knot theory may provide algorithmic procedures to simulate the folding of proteins or DNA.
- Persistent homology is applied in Neuroscience, for instance to identify clusters of neurons.
- The Reeb graph provides a general topological way to analyse data.
- Since a graph is a (one-dimensional) simplicial complex, persistent homology can be applied to it, with consequences for the analysis of social, media and biological graphs. This extends to abstract graphs, which can often be embedded in Hilbert spaces using computational techniques such as those based on the Laplace-Beltrami operator.
- Sheaf cohomology has numerical implications in quantum chemistry.
- Barcodes are well suited to the analysis of time series.
- Persistent homology and Mapper can be used in any data analysis context to extract new topological features of the data, thereby enriching their vector (Hilbert space) representations. These new representations may then lead to better-performing Machine Learning analyses, whether supervised or unsupervised.
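To make the graph and time-series points above concrete, here is a minimal sketch, assuming a lower-star (sublevel-set) filtration driven by values on the vertices, of 0-dimensional persistent homology computed with a union-find structure. A time series is treated as a path graph whose vertices carry the sampled values, so each local minimum starts a bar that ends when its component merges into an older one (the "elder rule"). The function name `lower_star_persistence_0d` is hypothetical, and the sketch only tracks connected components, not higher-dimensional homology.

```python
import math


def lower_star_persistence_0d(values, edges):
    """0-dimensional barcode of the sublevel-set filtration of a vertex function.

    values: values[i] is the filter value of vertex i.
    edges:  iterable of (i, j) pairs; an edge enters the filtration once both
            endpoints are present, i.e. at max(values[i], values[j]).
    Returns sorted (birth, death) pairs; essential classes have death = math.inf.
    """
    n = len(values)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    parent = list(range(n))
    birth = list(values)  # birth value of the component rooted at each vertex

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    present = [False] * n
    bars = []
    for v in sorted(range(n), key=lambda k: values[k]):
        present[v] = True
        for u in neighbours[v]:
            if not present[u]:
                continue
            old, young = find(u), find(v)
            if old == young:
                continue
            # Elder rule: the component born later dies at this merge.
            if birth[old] > birth[young]:
                old, young = young, old
            if birth[young] < values[v]:  # skip zero-length bars
                bars.append((birth[young], values[v]))
            parent[young] = old

    # Every surviving root is an essential class (one per connected component).
    for root in {find(k) for k in range(n)}:
        bars.append((birth[root], math.inf))
    return sorted(bars)


if __name__ == "__main__":
    series = [0, 3, 1, 4, 2]
    path_edges = [(i, i + 1) for i in range(len(series) - 1)]
    print(lower_star_persistence_0d(series, path_edges))
    # -> [(0, inf), (1, 3), (2, 4)]
```

The same function applies unchanged to any graph with vertex values; only the edge list differs between the time-series case and, say, a social or biological network.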

Many questions remain. On the theoretical side, can we strengthen the existing algorithms? Can we devise new ones adapted to novel contexts? On the practical side, how can TDA be positioned within the world of Data Analysis? Can we find an encapsulating framework? How can we further contribute to the impact of TDA on topics such as Neuroscience, Biology or Music?

My ambition is twofold: to make theoretical advances and to improve the effectiveness of the tools in practical applications. First, I will delve into multi-persistent homology to improve its usefulness for data analysis while studying properties such as stability. Second, I want to apply and adapt existing TDA tools to domains such as biology and music. Given my previous experience in Deep Learning, I am also interested in whether TDA algorithms, such as barcode computation, can be inserted into a Machine Learning pipeline. In particular, how can we optimize and learn the filter functions used to compute those barcodes, which until now have been chosen arbitrarily? I believe that Algebraic Topology is an adequate framework to partly demystify the empirically observed properties of neural networks. For instance, I would like to study how the topology of signals evolves as they propagate through the layers of a neural network.
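One common way to insert barcodes into a Machine Learning pipeline is to turn them into fixed-length vectors. The sketch below uses persistence landscapes, one vectorization among several; the text above does not commit to any particular choice. Each bar (b, d) contributes the tent function max(0, min(t - b, d - t)), and the k-th landscape at t is the k-th largest tent value. The names `persistence_landscape`, `k_max` and `cap` are hypothetical, as is the convention of replacing infinite deaths by the largest grid value.

```python
import math


def persistence_landscape(bars, grid, k_max=2, cap=None):
    """Sample the first k_max persistence landscapes of a barcode on a grid.

    bars:  list of (birth, death) pairs; death may be math.inf.
    grid:  sample points t at which the landscapes are evaluated.
    cap:   value replacing infinite deaths (assumed default: max(grid)).
    Returns a flat list of length k_max * len(grid), usable as a feature vector.
    """
    if cap is None:
        cap = max(grid)
    finite = [(b, cap if math.isinf(d) else d) for b, d in bars]

    features = []
    for k in range(k_max):
        for t in grid:
            # Tent value of every bar at t, largest first.
            tents = sorted(
                (max(0.0, min(t - b, d - t)) for b, d in finite), reverse=True
            )
            features.append(tents[k] if k < len(tents) else 0.0)
    return features


if __name__ == "__main__":
    barcode = [(0, math.inf), (1, 3), (2, 4)]
    print(persistence_landscape(barcode, [0, 1, 2, 3, 4], k_max=2))
```

Because every barcode is mapped onto the same grid, the resulting vectors have a common length and can be stacked into a design matrix for any standard classifier or regressor. Learning the filter function itself, as asked above, would additionally require differentiating through the barcode computation, which this sketch does not attempt.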

Possible collaborations:
- Spotify
- Olivier Pietquin (DeepMind)
- Steve Oudot (INRIA)

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/R513295/1      |              |              | 01/10/2018 | 30/09/2023 |
2099941           | Studentship  | EP/R513295/1 | 01/10/2018 | 30/09/2021 | Jacob Leygonie