Statistical methods for ovarian cancer diagnosis and prognosis

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

The development of high-throughput sequencing technologies has led to the production of large-scale profiling data, allowing us to gain insight into underlying biological processes. Available at different levels, sequencing allows us to collect data about DNA, RNA, proteins, metabolites and so forth, providing complementary information when characterising a biological object. Individually, each source of data, referred to as omics data, characterises a specific part of an organism. For instance, genomics, based at the DNA level, characterises the genome, whilst transcriptomics, based at the RNA level, characterises the transcriptome. Notably, the levels are related to one another: for instance, mRNA is translated into proteins, which drive the behaviour of cells and thereby lead to the expression of phenotypes. Due to the high dimensionality and heterogeneity of omics datasets, their analysis is rife with statistical challenges.

Throughout my PhD I will be working on novel statistical methods that tackle dimensionality reduction and variable selection for omics datasets, in both supervised and unsupervised settings. One such method, currently under development, is a high-dimensional Bayesian survival analysis model that uses a spike-and-slab prior. Our method enables us to perform variable selection in a high-dimensional setting, whilst also offering mechanisms for uncertainty quantification and effect estimation. Within the biomedical sciences, survival analysis is a task of key importance and, when performed with transcriptomics data, enables the creation of prognostic models and the discovery of biomarkers.
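
As an illustration only, one common way to write a spike-and-slab prior for a survival model is sketched below. The abstract does not specify the hazard model or the slab distribution, so the Cox proportional hazards form, the Gaussian slab with variance \sigma^2, and the shared inclusion probability w are all assumptions made for this sketch.

```latex
% Assumed: Cox proportional hazards model for patient i with covariates x_i
h(t \mid x_i) = h_0(t) \exp\left( x_i^\top \beta \right)

% Assumed: spike-and-slab prior on each coefficient, j = 1, \dots, p
\beta_j \mid z_j \sim z_j \, \mathcal{N}(0, \sigma^2) + (1 - z_j) \, \delta_0,
\qquad
z_j \mid w \sim \mathrm{Bernoulli}(w)
```

Under a prior of this form, the posterior inclusion probability P(z_j = 1 | data) indicates whether variable j is selected, and the posterior of \beta_j gives an effect estimate together with its uncertainty.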

A second aspect of my PhD will focus on the development of methodology for data integration, which involves the joint analysis of multiple datasets with the goal of understanding the relationships between them. Motivated by our collaborators at Imperial's CRUK centre, we will be applying these methods to radiomics data (features constructed from medical images) and other omics datasets collected from patients with ovarian cancer, thereby providing biological interpretations of affordable and easy-to-collect (CT/MRI) scans. Currently, we are considering extending the probabilistic framing of canonical correlation analysis. Such extensions will enable these methods to work in a high-dimensional setting and simultaneously provide uncertainty quantification.
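
For orientation, the following is a minimal sketch of classical (non-probabilistic) canonical correlation analysis, the method whose probabilistic framing is mentioned above. It is not the project's method: the function name `classical_cca`, the small `ridge` term, and the simulated two-view data are assumptions introduced purely for illustration.

```python
import numpy as np


def classical_cca(X, Y, n_components=2, ridge=1e-6):
    """Classical CCA via whitening and SVD (illustrative sketch).

    X has shape (n, p) and Y has shape (n, q). Returns the canonical
    weight matrices A (p, k), B (q, k) and the k canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariances; the ridge term (an assumption) keeps them invertible.
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}.
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T

    # Singular values of K are the canonical correlations.
    U, s, Vt = np.linalg.svd(K, full_matrices=False)

    # Map singular vectors back to weights on the original variables.
    A = np.linalg.solve(Lx.T, U[:, :n_components])
    B = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return A, B, s[:n_components]


# Simulated data (hypothetical): two views driven by one shared latent factor.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = z @ np.ones((1, 5)) + 0.5 * rng.normal(size=(n, 5))
Y = z @ np.ones((1, 4)) + 0.5 * rng.normal(size=(n, 4))

A, B, rho = classical_cca(X, Y, n_components=2)
print("canonical correlations:", rho)
```

In this toy setting the first canonical correlation should be large because both views share the latent factor, while the second reflects only noise. The sketch assumes more samples than variables; the high-dimensional case with uncertainty quantification is what the extensions described above aim to address.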

Aligning with EPSRC strategies in artificial intelligence and healthcare, the proposed methodological developments seek to improve health services by optimising patient treatments. Although my PhD focuses on data from patients with ovarian cancer, the biologically relevant methods it develops are applicable beyond a single disease.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will address the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023151/1                                   01/04/2019  30/09/2027
2605902            Studentship   EP/S023151/1  03/10/2020  30/09/2024  Michael Komodromos