Methods for longitudinally measured image datasets

Lead Research Organisation: University of Leeds
Department Name: Statistics

Abstract

Large datasets are now ubiquitous. In medicine, examples include imaging and omics datasets; in finance, large datasets hold information about credit card use; another example is Twitter data. It is often relevant to identify patterns across datasets, e.g. changes in images over time, or to link datasets with an outcome variable, such as knee images and osteoarthritis, heart images and cardiovascular disease, or brain images and Alzheimer's disease. A set of all relevant patterns (relationships) reduces the dimension of the original data and supports interpretation of the data.
Methods often used for dimension reduction include latent variable approaches such as principal components and partial least squares. These latent variables are linear combinations of the original variables and represent the relevant information. A second class of dimension reduction techniques is variable selection. How to combine these two approaches is a topic of current research.
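As an illustration of what "linear combinations of the original variables" means, the following is a minimal sketch in Python with NumPy, run on simulated data. The function names `pca_scores` and `first_pls_weight` and the toy dataset are assumptions made for this example and are not part of the project. The principal component weights depend only on the predictors, whereas the first partial least squares weight vector also uses the outcome.

```python
import numpy as np


def pca_scores(X, n_components):
    """Return (scores, loadings) of a PCA computed from the SVD of centred X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T   # (p, k): weights of the linear combinations
    scores = Xc @ loadings           # (n, k): latent variables
    return scores, loadings


def first_pls_weight(X, y):
    """Unit weight vector maximising the covariance between X @ w and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)


# Simulated data (illustrative only): 50 subjects, 200 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=50)

scores, loadings = pca_scores(X, n_components=2)
w = first_pls_weight(X, y)
pls_score = (X - X.mean(axis=0)) @ w
```

Both `scores` and `pls_score` are weighted sums of the 200 original columns; the two approaches differ only in how the weights are chosen.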
This project is motivated by longitudinally measured images of knees in patients with osteoarthritis. Osteoarthritis is a common disease in the elderly. It is a condition in which the natural cushioning between joints, the cartilage, wears away. When this happens, the bones of the joints rub more closely against one another, with less of the shock-absorbing benefit of cartilage. The rubbing results in pain, swelling, stiffness, and decreased ability to move. Surgery might be needed to replace the knee. Images are available at seven time points for almost 5,000 patients. Detailed descriptions of the data can be found at https://www.niams.nih.gov/grants-funding/funded-research/osteoarthritis-initiative
We will develop statistical and machine learning methods that can cluster patients who have similar knee profiles over time (unsupervised clustering) and predict changes in the images of the knee that are predictive of severe disease outcomes (classification). To identify components which explain most of the covariance between images and an outcome, we will first develop functional principal component analysis (FPCA) and functional partial least squares (FPLS) methods for two-dimensional images. Secondly, to link the images over time, we will extend FPCA and FPLS to longitudinal functional principal component analysis (LFPCA) and longitudinal functional partial least squares (LFPLS) methods designed for longitudinally observed images, which summarize the images over time as curves. Here the starting point is work on spatial functional principal components for dense images, which we will extend to longitudinal (sparse) images. The obtained curves represent the changes in the images over time and can be used in supervised or unsupervised clustering. Another starting point for modelling longitudinal images is methodology for longitudinal functional data. Thirdly, most currently available methods for functional data cannot deal with missing data, so the last step will be to propose a solution for datasets in which some subjects have missing images.
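To make the idea of summarizing images over time as curves concrete, the sketch below uses a deliberately naive stand-in: an ordinary principal component analysis of vectorised images pooled over all observed patient visits. It is not the LFPCA or LFPLS methodology proposed here. The function name `score_curves`, the simulated images, and the choice to mark missing visits as NaN are all assumptions made for illustration.

```python
import numpy as np


def score_curves(images, observed, n_components=1):
    """Project each observed image onto pooled principal components.

    images:   array of shape (n_patients, n_times, height, width)
    observed: boolean array of shape (n_patients, n_times)
    Returns per-patient score curves (NaN where a visit is missing)
    and the component images.
    """
    n, t, h, w = images.shape
    flat = images.reshape(n * t, h * w)
    mask = observed.reshape(n * t)
    mean_img = flat[mask].mean(axis=0)          # mean over observed images only
    centred = flat[mask] - mean_img
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    basis = Vt[:n_components]                   # (k, h*w)
    scores = np.full((n * t, n_components), np.nan)
    scores[mask] = centred @ basis.T
    return scores.reshape(n, t, n_components), basis.reshape(n_components, h, w)


# Simulated data (illustrative only): 20 patients, 7 visits, 16x16 images
# in which one spatial pattern grows at a patient-specific rate.
rng = np.random.default_rng(1)
n_patients, n_times, h, w = 20, 7, 16, 16
pattern = np.outer(np.hanning(h), np.hanning(w))
slopes = rng.uniform(0.0, 1.0, size=n_patients)
times = np.arange(n_times)
images = (slopes[:, None, None, None] * times[None, :, None, None] * pattern
          + 0.05 * rng.normal(size=(n_patients, n_times, h, w)))
observed = rng.random((n_patients, n_times)) > 0.2   # roughly 20% missing visits

curves, basis = score_curves(images, observed)
```

Each row of `curves[..., 0]` is one patient's trajectory on the first component, with gaps at the missing visits. Such trajectories could then be passed to a clustering or classification method; handling the gaps in a principled way is the missing-data problem described above.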
In this project we will focus on knee images and osteoarthritis. A second topic will be heart images and cardiovascular disease. We will use open source datasets and datasets from clinical collaborators in Leeds. For the latter datasets, we will have the advantage of receiving input from a medical perspective in the interpretation of our results.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/T517860/1 | | | 01/10/2020 | 30/09/2025 |
2438836 | Studentship | EP/T517860/1 | 01/10/2020 | 31/03/2024 | Sonia Dembowska