New insights on Data Science through Topological Data Analysis

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

This project falls within the EPSRC Application driven Topological Data Analysis area.

Research in Homological Algebra and, more generally, Algebraic Topology constitutes a significant field within the spectrum of fundamental mathematics. Recently, however, the algorithmic nature of this area has proved sufficient to bring new tools from the world of Topology into data analysis, collectively known as Topological Data Analysis (TDA). Algorithms such as Mapper, or the various ways to approximate the homology groups of the manifold underlying a dataset, are expected to allow new unsupervised understanding of data sets. One strength of this approach compared to other Machine Learning techniques is that it benefits from many theoretical guarantees, such as robustness to perturbation of the data and guarantees of convergence. At the same time, it is general enough to be applied to a wide range of data types. Examples abound. To cite a few:
- Knot theory may allow us to find algorithmic procedures to simulate the folding of proteins or DNA.
- Persistent Homology is applied in Neuroscience, for instance to identify clusters of neurons.
- The Reeb graph is an example of a topological tool for analysing data in full generality.
- A graph being an instance of a simplicial complex, persistent homology may be applied to it. This impacts the analysis of social, media and biological graphs. It extends to abstract graphs, which can often be embedded in Hilbert spaces thanks to computational techniques such as the Laplace-Beltrami transformation.
- Sheaf cohomology has numerical implications in the world of quantum chemistry.
- Barcodes are well suited to the analysis of time series.
- Persistent Homology and Mapper might be used in any data analysis context to obtain new topological features of the data, thereby strengthening its Hilbert space representation. These new representations might then result in better-performing Machine Learning analyses, whether supervised or unsupervised.
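
As a minimal illustration of the graph example above, the following sketch computes the 0-dimensional persistence barcode of a small weighted graph with a union-find structure. It is only a sketch under stated assumptions: every vertex is assumed to enter the filtration at value 0, each edge enters at its weight, and the function name `barcode_h0` and the toy graph are hypothetical, not taken from the project.

```python
def barcode_h0(num_vertices, weighted_edges):
    """0-dimensional persistence barcode of a weighted graph.

    Assumption: every vertex enters the filtration at value 0 and an
    edge (u, v, w) enters at value w. Returns a sorted list of
    (birth, death) intervals; death is inf for components that never merge.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            # The edge merges two components: one H0 class dies at w.
            parent[ru] = rv
            bars.append((0.0, float(w)))
        # Otherwise the edge closes a cycle and H0 is unchanged.

    # Components that survive the whole filtration give infinite bars.
    roots = {find(x) for x in range(num_vertices)}
    bars.extend((0.0, float("inf")) for _ in roots)
    return sorted(bars)


# Hypothetical toy graph: a triangle on vertices 0, 1, 2 plus a pendant vertex 3.
edges = [(0, 1, 0.5), (1, 2, 1.0), (0, 2, 2.0), (2, 3, 3.0)]
bars = barcode_h0(4, edges)
print(bars)
```

Long bars correspond to components that stay separate over a wide range of edge weights. The edge of weight 2.0 creates no bar here because it only closes a cycle, which would be recorded in dimension 1 instead.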

Many questions remain. On the theoretical side, can we strengthen the existing algorithms? Can we devise new ones adapted to novel contexts? On the practical side, how can we position TDA in the world of Data Analysis? Can we find an encapsulating framework? How can we further contribute to the impact of TDA on topics such as Neuroscience, Biology or Music?

My ambition is twofold: to make theoretical advances and to improve the effectiveness of the tools in practical applications. First, I will delve into multi-persistent homology to improve its utility for data analysis while studying properties such as its stability. Second, I want to apply and adapt existing TDA tools to domains such as biology or music. I am also interested in Deep Learning, a domain in which I have substantial previous experience. I would like to see whether TDA algorithms, such as the computation of barcodes, can be inserted into a Machine Learning pipeline. In particular, how can we optimise and learn the filters that, until now, have been chosen arbitrarily to compute those barcodes? I believe that Algebraic Topology is an adequate framework to partly demystify the empirically observed properties of neural networks. For instance, I would like to study how the topology of signals evolves as they pass through the layers of a neural network.

Possible collaborations:
- Spotify
- Olivier Pietquin (DeepMind)
- Steve Oudot (INRIA)

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509711/1       -             -             01/10/2016  30/09/2021  -
2099941            Studentship   EP/N509711/1  01/10/2018  30/09/2021  Jacob Leygonie
 
Description: The title of this award is 'New insights on Data Science through Topological Data Analysis'. In Topological Data Analysis (TDA), a central notion is that of Persistent Homology (PH). Persistent Homology is a descriptor of complex data (e.g. networks or point clouds) that encodes geometric and topological information and is widely used in data analysis. Some theoretical and practical aspects of PH remain only partially understood, and I mainly focus on the following two questions:
- PH': Can we differentiate Persistent Homology? Differentiation is central to optimisation, so this question was motivated by the wish to use Persistent Homology in Machine Learning models. Together with collaborators, we developed (i) a theory of differentiability for PH; and (ii) a data analysis model based on PH that can classify graphs. Both papers are under minor revision at journals.
- PH^{-1}: Can we say something about the pre-image of PH? In other words, when do two objects have the same Persistent Homology? This is a key question for understanding in which practical scenarios PH is an informative descriptor of data. This project is almost finished.
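
To make the PH' question concrete, here is a minimal sketch of one simple setting in which a barcode-based quantity can be differentiated. It is an illustration under stated assumptions, not the framework developed in the papers: the data is a function with distinct values on a path (for example a time series), the filtration is by sublevel sets, and the names `h0_pairs_path` and `total_persistence_and_grad` are hypothetical. Each finite bar is born at a local minimum and dies at the vertex where its component merges into an older one, so the total persistence is locally a signed sum of function values and its gradient can be read off from the pairing.

```python
def h0_pairs_path(f):
    """Finite H0 persistence pairs of the sublevel-set filtration of f on a path.

    Assumption: the values in f are pairwise distinct.
    Returns a list of (birth_index, death_index) pairs.
    """
    n = len(f)
    parent = list(range(n))
    lowest = list(range(n))  # index of the minimum of each component (valid at roots)
    added = [False] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for v in sorted(range(n), key=lambda i: f[i]):
        added[v] = True
        for u in (v - 1, v + 1):
            if 0 <= u < n and added[u]:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # Elder rule: the component with the higher minimum dies at v.
                if f[lowest[ru]] > f[lowest[rv]]:
                    young, old = ru, rv
                else:
                    young, old = rv, ru
                if lowest[young] != v:  # skip zero-length bars
                    pairs.append((lowest[young], v))
                parent[young] = old
    return pairs


def total_persistence_and_grad(f):
    """Sum of finite bar lengths and its gradient with respect to f."""
    pairs = h0_pairs_path(f)
    value = sum(f[d] - f[b] for b, d in pairs)
    grad = [0.0] * len(f)
    for b, d in pairs:
        grad[b] -= 1.0  # raising a birth value shortens the bar
        grad[d] += 1.0  # raising a death value lengthens the bar
    return value, grad


# Hypothetical toy signal with distinct values.
signal = [1.0, 3.0, 0.0, 2.0, 0.5]
value, grad = total_persistence_and_grad(signal)
print(value, grad)
```

The gradient is only valid while the ordering of the values, and hence the pairing, does not change. When values tie, the pairing can jump, and that is the kind of situation a theory of differentiability for PH has to handle.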
Exploitation Route: From a practical perspective, our work enables data scientists to further incorporate tools from TDA into their predictive models, especially when dealing with complicated data structures such as graphs, point clouds and simplicial complexes.

From a theoretical perspective, our analysis of the differentiability and pre-image of Persistent Homology deepens our understanding of this descriptor of data, and opens interesting and challenging questions.
Sectors: Other

URL: https://arxiv.org/abs/1910.00960; https://arxiv.org/abs/2101.05201