Topological data analysis for spatial transcriptomics

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

Topological data analysis (TDA) is a relatively new field of mathematics which uses ideas and techniques from algebraic topology to study the underlying geometry of data. Tools in TDA typically feature strong robustness to noise and are readily applicable to high-dimensional inputs, making them particularly suitable for use in a wide variety of biological applications.
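
As a rough illustration of the kind of computation TDA involves, the sketch below computes 0-dimensional persistent homology (the scales at which connected components merge) for a small point cloud. It is a minimal example and not the methodology proposed in this project: the function name `h0_persistence` and the toy coordinates are assumptions made purely for illustration, and the filtration used is the standard Vietoris-Rips construction.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the scales at which connected components merge.

    Every component is born at scale 0 in the Vietoris-Rips filtration;
    a component dies when it merges with another one. One component
    never dies, so n points yield n - 1 finite death scales.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairwise distances in increasing order (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(length)
    return deaths


# Hypothetical toy data: two tight clusters far apart.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
deaths = h0_persistence(points)
```

In this toy example four components die at a small scale (the within-cluster merges) and one dies at a much larger scale (the two clusters joining). The long-lived feature reflects the two-cluster structure, while the short-lived ones behave like noise, which is the intuition behind the robustness mentioned above.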

One challenge which arises frequently in this context is the incorporation of spatial information. Spatial data sets are often noisy, and existing techniques can be misled by outliers and mislabelling. However, recent work on the statistical frameworks underpinning TDA has allowed for promising new topological applications in this direction. For example, multiparameter persistence landscapes [1] have been successfully applied to study spatial tissue data in a way which is both statistically sound and unhindered by the presence of artefacts [2].

Single-cell sequencing is a particularly rich source of complex biological information, producing gene expression data for large numbers of individual cells. Modern technology allows hundreds of genes to be read simultaneously, producing very high-dimensional, noisy data sets, which makes TDA a natural choice in this area. Indeed, the scTDA methodology uses techniques from TDA to perform unsupervised temporal transcriptomic analysis, and outperforms several existing methodologies on synthetic data [3]. However, scTDA does not account for the spatial distribution of cells, which is generally understood to be a critical factor in tissue behaviour [4].

Several techniques now exist for combining spatial information with transcriptomic analysis, including SpatialDE [5], trendsceek [6], and SPARK [7]. However, these methods generally focus on expression gradients rather than on the identification of cell boundaries, which makes them unsuitable for contexts in which the detection of sparse cells is important. Further, they do not account for much of the additional information available, such as transcript density and nuclear localisation.

Building on the success of scTDA, and given the demonstrated potential of topological methods for the analysis of spatial data, we propose that TDA would provide an ideal framework for the analysis of spatial transcriptomic data. We will produce novel topological methodologies which address the limitations of existing techniques described above. To enable this work, we will make use of the cutting-edge Stereo-Seq technology, which has enabled the collection of vast amounts of spatial transcriptomic information at unprecedented resolutions [8].

This project falls within the EPSRC Mathematical Biology, Biological Informatics, and Geometry and Topology research areas.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/T517811/1                                   01/10/2020  30/09/2025
2580662            Studentship   EP/T517811/1  01/10/2021  31/03/2025