Application driven Topological Data Analysis

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

Modern science and technology generate data at an unprecedented rate. A major challenge is that these data are often complex and high-dimensional, and may include temporal and/or spatial information. The "shape" of the data can be important, but it is difficult to extract and quantify using standard machine learning or statistical techniques. For example, an image of blood vessels near a tumor looks very different from an image of healthy blood vessels; summary statistics alone cannot capture this difference, because what distinguishes the two images is the shape of the vessel network. The focus of this proposal is to study the shape of data by developing new mathematics and algorithms, building on existing data science techniques to extract and interpret that shape. Topology is the field of mathematics that studies shape, and computing the topology of complicated shapes is only possible with advanced mathematics and algorithms. The field known as topological data analysis (TDA) uses topology to study the shape of data, such as loops in a blood vessel network. In particular, persistent homology, an algorithm within TDA, provides a topological summary of the shape of the data (e.g., features such as holes) at multiple scales. A key strength of persistent homology is that its results are robust, even when the data are noisy. There are, however, theoretical and computational challenges in applying these algorithms to large-scale, real-world data.
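To make the idea of a multi-scale topological summary concrete, here is a minimal Python sketch of the simplest case of persistent homology: tracking connected components (dimension 0) of a point cloud as a distance scale grows, using a Vietoris-Rips-style filtration and a union-find structure. This is an illustration under stated assumptions, not the project's method: it covers only connected components (loops and holes need higher-dimensional homology), uses Euclidean distance, and the function name `h0_persistence` is hypothetical. Production work would normally rely on dedicated libraries such as GUDHI or Ripser.

```python
import itertools
import math


def h0_persistence(points):
    """Hypothetical helper: 0-dimensional persistence pairs (birth, death)
    for a finite point cloud under a Vietoris-Rips-style filtration.

    Every point is born as its own component at scale 0. A component dies
    at the edge length where it merges into another one. The last surviving
    component never dies, so its death is recorded as math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the root of i's component, using path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pairwise edge, sorted by length: the scale at which it appears.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them (born at 0) dies at this scale.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    if n > 0:
        # One component survives at every scale.
        pairs.append((0.0, math.inf))
    return pairs


# Two well-separated pairs of points: short bars are within-cluster merges,
# the long bar (0.0, 10.0) records that two clusters persist across scales.
print(h0_persistence([(0, 0), (0, 1), (10, 0), (10, 1)]))
```

The long-lived bar is the kind of robust feature the summary refers to: perturbing the points slightly changes the bar lengths only slightly, whereas short bars are easily attributed to noise.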

The aim of this project is to build on current persistent homology tools, extending them theoretically and computationally and adapting them for practical applications. Our core team is composed of pure and applied mathematicians, computer scientists and statisticians whose combined expertise covers cutting-edge pure mathematics, mathematical modeling, algorithm design and data analysis. This core team will work closely with our collaborators in a range of scientific and industrial domains. Some of the application challenges we have set out include:

Can we detect a tumor by looking at the shape of images of blood vessels? Can we design new materials by looking at the shape of molecules using topology? How can we design such molecules? Can we detect anomalies in security data? And importantly, how can we accelerate algorithms to obtain topological characteristics of data in real time?

Planned Impact

IMPACT

The proposed centre involves a number of collaborators, providing some immediate pathways to impact. Each will provide access to appropriate data and will evaluate research and outcomes from the viewpoint of potential exploiters (potential users and investors) rather than only from mathematical and computing perspectives.

(1) GCHQ has pointed us towards relevant data sets within the public domain for security research.

(2) BSMbench have supplied data sets from their own sector that are appropriate to persistent homology (PH), including data previously used to benchmark other analytics methods.

(3) We have access to time-series data sets and are discussing interest in TDA with IFPEN.

We will arrange meetings with data-analytics experts from a variety of companies that have already expressed a need to understand the possibilities TDA may hold. These include Tesco/Dunnhumby, Lloyds Bank, Morgan Stanley, GSK, HSBC, iProspect, Unilever, BT, and others.

We will initially work with one SME, Kognitio, a specialist in parallel processing, delivering B2B services, and aim to grow this engagement further.

To build upon the state of the art in domain-specific data analysis techniques, we will also collaborate closely with the following academic end-users to maximise the impact and translation of our research framework:

(4) High-resolution, spatio-temporal vascular image data, Dept of Oncology (Oxford)

(5) Databases of hypothetical nano-crystal structures from Materials Innovation Factory (Liverpool) and Laboratory of Molecular Simulations (EPFL)

(6) Data from Monte Carlo simulations of phases of matter, Dept of Physics (Swansea)

By co-creating new theory and implementing it to achieve scientific breakthroughs, we expect these advances to impact the mathematical, computational, scientific, and corporate sectors.

The centre will host two large conferences for researchers and an international conference to propagate PH methods across UK HEIs. We will publish our research in both generalist and specialised journals so that the developed framework reaches a wide audience. Our goal is to catalyse an active, application-driven TDA community within the UK through annual meetings at which progress and papers are presented. We aim to build a strong UK community focussed on research and translation of next-generation TDA concepts and technologies.
