Scalable Bayesian Statistical Machine Learning Methods for the Analysis of Neurodegenerative Diseases

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

My research is motivated by multiple sclerosis, a neurodegenerative autoimmune disease that damages the protective sheath around axons in the brain. This damage reduces the ability of different parts of the central nervous system to communicate. As a result, patients experience various motor, sensory, visual and autonomic deficits, reflecting the episodic or progressive nature of the disease. The disease is generally diagnosed and managed through magnetic resonance imaging (MRI) because MRI scanners are cost-effective and readily available in clinical settings. Multiple sclerosis lesions, which are areas of damage to the white matter of the brain, can be analysed further through binary lesion masks, created either manually by radiologists or by model-based methods developed by researchers.

My research focuses on the analysis of multiple sclerosis lesions identified in MRI images, with the aim of relating the information contained in binary lesion maps to clinical deficits or other predictors, such as gender, age, disease duration or clinical subtype of multiple sclerosis. In particular, the project allows each covariate in the model to have a spatial dependence structure across the MRI image of the brain. The novelty of the project lies in increasing the scalability of previously applied methods. Hence, one of the key objectives is to reduce the computational burden caused by the large number of parameters to be estimated in spatially varying models. The approach aims to preserve the spatial structure of the MRI images by avoiding, as far as possible, data reduction techniques and other approximations.

The starting point is the Bayesian spatial generalised linear mixed model proposed by Ge et al. (2014), which introduces spatial dependence via spatially varying coefficients at every location within an MRI image and thereby enables the estimation of lesion probabilities at each location within the brain from a set of input covariates. The goal of the project is to achieve computational efficiency by updating parameters with optimisation techniques instead of relying on slow simulation-based estimation techniques, such as Markov chain Monte Carlo methods. A further objective is to extend this approach to include variable selection, both to lower the computational cost and to interpret the significance of the identified spatial locations with respect to their influence on the lesion intensity within the brain caused by the disease. My research also considers a longitudinal extension, which is computationally intractable in the current literature on spatial regression models given the number of parameters typical of neuroimaging applications.
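
To make the idea of spatially varying coefficients concrete, the following is a minimal sketch rather than the model as implemented in the project. It assumes a probit link, a toy one-dimensional row of five voxels, and made-up coefficient values; the function names and numbers are hypothetical. It omits the spatial prior and the random effects entirely and only shows how one coefficient per covariate per voxel turns a subject's covariates into a lesion probability at each location.

```python
import math


def probit_inverse(eta):
    """Standard normal CDF, used here as the inverse link (an assumption)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def lesion_probabilities(covariates, coefficient_maps):
    """Return one lesion probability per voxel for a single subject.

    covariates: list of P covariate values (intercept included).
    coefficient_maps: list of P lists, each holding one coefficient per voxel.
    """
    n_voxels = len(coefficient_maps[0])
    probs = []
    for v in range(n_voxels):
        # Linear predictor at voxel v: sum over covariates of x_p * beta_p(v).
        eta = sum(x * beta[v] for x, beta in zip(covariates, coefficient_maps))
        probs.append(probit_inverse(eta))
    return probs


# Hypothetical toy data: 5 voxels, an intercept map and an age-effect map.
intercept_map = [-2.0, -1.5, -1.0, -1.5, -2.0]
age_map = [0.0, 0.2, 0.5, 0.2, 0.0]
subject = [1.0, 1.2]  # intercept term and a standardised age value

probs = lesion_probabilities(subject, [intercept_map, age_map])
```

Even in this toy form, the parameter count is the number of covariates times the number of voxels, which illustrates why estimation becomes expensive at the resolution of a full brain image.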

This project falls within the EPSRC Mathematical Sciences research area. It is supervised by Prof. Thomas Nichols, of the Nuffield Department of Population Health and the Big Data Institute at the University of Oxford, and by Prof. Chris Holmes at the University of Oxford. The project receives funding from Novartis and the EPSRC.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023151/1      |              |              | 01/04/2019 | 30/09/2027 |
2247869           | Studentship  | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Anna Menacher