A Multimodal Topological Approach to Brain Cancer: Unifying Molecular and Imaging Data

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Glioblastoma (GBM) is one of the most common and aggressive types of malignant brain tumour found in adults. Survival is dismal: on average only fifteen months after diagnosis, and only three to five percent of GBM patients survive longer than three years. Due to the high heterogeneity of the disease, GBM is resistant to traditional treatments and prone to recurrence. Moreover, owing to the short life expectancies of GBM patients, the available data are sparse.
In this project, we study two different modes of GBM data: imaging data in the form of histopathology images, and molecular data on the expression of certain genes characteristic of GBM. We look to capture both types of data under the unifying framework of topological data analysis (TDA), a novel branch of data science that uses mathematical tools from algebraic topology, and to combine it with statistical inference methods to help us better understand and prognosticate GBM. Better prognostication can help clinicians improve treatments and the quality of life for patients.
In particular, we use persistent homology, an important topological invariant within TDA, to summarize the shapes and sizes of topological features that persist across multiple scales within the data. Persistent homology is flexible: its computations can be adapted to data of vastly different forms, allowing us to study both imaging and molecular data on the same basis while capturing features on both local and global scales. For example, by applying persistent homology to histology images, we aim to capture the intercellular structure of the tumours through their nuclei distributions, which may include hallmarks of GBM such as necrosis and hypercellularity that differ from normal cell arrangements. All features will be encapsulated within a functional summary. For inference, we therefore seek to construct functional models that build on statistical techniques from functional data analysis and draw on tools from classical functional analysis, which may help us deal with both the high dimensionality of the data and the small sample size available.
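As a rough illustration of the idea, the sketch below computes 0-dimensional persistent homology (connected components) of a small point cloud standing in for nuclei centroids, and turns the result into a Betti-0 curve as one possible functional summary. It is a minimal sketch under stated assumptions, not the project's method: the coordinates are hypothetical, the function names `h0_persistence` and `betti0_curve` are invented for this example, the Betti curve is only one of several functional summaries that could be used, and the scale convention (an edge appears at a scale equal to its length) is an assumption.

```python
import itertools
import math


def h0_persistence(points):
    """Death scales of the finite 0-dimensional features of a point cloud.

    Every point is born as its own component at scale 0. A component dies
    at the length of the edge that first merges it into another component.
    One component never dies, so n points give n - 1 finite deaths.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one dies here
            deaths.append(length)
    return deaths


def betti0_curve(deaths, scales):
    """Number of connected components alive at each scale."""
    n_points = len(deaths) + 1
    return [n_points - sum(1 for d in deaths if d <= t) for t in scales]


# Hypothetical nuclei centroids: two tight clusters far apart.
nuclei = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]

deaths = h0_persistence(nuclei)
curve = betti0_curve(deaths, scales=[0.5, 1.5, 20.0])
print(deaths)
print(curve)
```

In this toy example the three short deaths reflect points merging within each cluster, while the single long-lived feature reflects the gap between clusters. Higher-dimensional features such as loops would need a dedicated TDA library rather than this union-find shortcut.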
This project falls within the EPSRC mathematical biology research area. We will be working with the brain tumour data collected within the Imperial College Healthcare NHS Trust Neuro-Oncology Service at Charing Cross Hospital.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. The research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023151/1                                   01/04/2019  30/09/2027
2602756            Studentship   EP/S023151/1  02/10/2021  30/08/2025  Quiquan Wang