Missing Data in Population Neuroimaging Studies

Lead Research Organisation: University of Oxford

Abstract

Resources like the UK Biobank are transforming the types of analyses that can be carried out with neuroimaging data, in particular by enabling the prospective study of disease. In addition to brain imaging data on over 40,000 individuals, there are baseline and follow-up measurements on thousands of variables, almost all of which have missing data. When analyses are restricted to complete data, the sample size can fall dramatically and biases can be induced. However, standard biostatistical methods for missing data do not scale to thousands of variables. Scalable missing data methods do exist, in particular matrix completion methods, but these make simplistic assumptions about the missing data mechanism (typically, that data are missing completely at random) and do not account for the increased uncertainty associated with imputed values. In this work the student will evaluate all existing methods that can scale to biobank-sized analyses, considering computational efficiency, bias in estimates, and bias in standard errors for each method. They will then develop new methods, in particular building on recent work that uses a neural network to account for informative missing data mechanisms. They will investigate how these methods can be integrated into deep learning methods, and also how they can be adapted for use with conventional epidemiological analyses that require an interpretable regression model. This work is novel and addresses the pressing need to scale brain imaging and epidemiological methods to the size of biobanks, where 6- and even 7-digit sample sizes must be accommodated. This project spans multiple EPSRC research areas, including engineering, healthcare technologies, information and communication technologies (ICT), and mathematical sciences.

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S02428X/1                                   01/04/2019  30/09/2027
2594554            Studentship   EP/S02428X/1  01/10/2021  30/09/2025  Lav Radosavljevic