Statistical Analysis of Manifold-Valued Data

Lead Research Organisation: University of Nottingham
Department Name: School of Mathematical Sciences

Abstract

Summary

Regression methods, interpreted broadly, enable the user to measure dependence of a response variable of interest on a set of covariates, i.e. measurable variables that are expected to affect the response variable. The power of this approach is due to the fact that, given the covariate values, the regression model can be used to predict a likely range of values of the response variable, and to assess which covariates are the main drivers in the behaviour of the response. This project is concerned with types of response variable which have complicated nonlinear structure (in mathematical terminology, the response is manifold-valued). For such data, no general framework for regression modelling exists. An example of the type of response variable that we wish to consider is the shape of an object; shape is a highly nonlinear entity.

There are numerous potential applications of the regression methodology that we will develop, many (but not all) of which are in biology and medicine. For example, within the foreseeable future we expect the outputs of our project to assist surgeons in making decisions in the following situation. Suppose a patient has a tumour and the surgeon wishes to decide which type of operation (if any) would be best. A suitable regression model would enable prediction, under each type of operation, of the growth trajectory of the tumour after the operation. Relevant covariate information would include variables such as size-and-shape of the tumour before the operation, location of the tumour, and age and gender of the patient. The surgeon would then be able to assess which trajectory, and therefore which type of operation, would be most favourable for the patient.

A second application, this time for neuroscience, relates to diffusion tensor imaging. One output of the project will be methodology for interpolating manifold-valued data in a spatial setting. In the context of diffusion tensor imaging of the brain, spatial interpolation of the diffusion tensor data will provide more accurate maps of the brain which will give improved and more soundly-based interpretations of the white matter fibre structure to help understand brain function.

A third application is in forensic science. The models we develop will allow prediction of the development of the shape of a face, depending on covariate information, such as the shapes of the parents' faces, and other information such as gender and age. This methodology will be useful in child abduction cases for example. While it is certainly the case that methods for extrapolating face shape currently exist, they do not incorporate covariate information in the model.

There are many other research areas in which manifold-valued response data arise naturally and where we expect the project outputs to have a major impact, including plant biology (of relevance, ultimately, to food security) and protein modelling.

The practical problems which highlight generic issues in regression modelling for manifold-valued data have all arisen from our work with collaborators in other fields. Therefore the successful implementation of the novel and exciting ideas in this proposal will not only provide a framework for addressing the problems that motivated this proposal, but will also have a major impact on research in many scientific disciplines, in addition to being of methodological and theoretical interest to researchers in statistics, computer science, mathematics and related fields. The proposed research will also add substantially to the available pool of UK expertise and help to maintain the UK's internationally leading position in the statistical analysis of shape and, more generally, object data.

Planned Impact

Impact Summary

Regression methods, interpreted broadly, enable the user to measure dependence of a response variable of interest on a set of covariates, i.e. measurable variables that are expected to affect the response variable. The power of this approach is due to the fact that, given the covariate values, the regression model can be used to predict a likely range of values of the response variable, and to assess which covariates are the main drivers in the behaviour of the response. This project is concerned with types of response variable which have complicated nonlinear structure (in mathematical terminology, the response is manifold-valued). For such data, no general framework for regression modelling exists. An example of the type of response variable that we wish to consider is the shape of an object; shape is a highly nonlinear entity.

There are numerous potential applications of the regression methodology that we will develop, many (but not all) of which are in biology and medicine. For example, within the foreseeable future we expect the outputs of our project to assist surgeons in making decisions in the following situation. Suppose a patient has a tumour and the surgeon wishes to decide which type of operation (if any) would be best. A suitable regression model would enable prediction, under each type of operation, of the growth trajectory of the tumour after the operation. Relevant covariate information would include variables such as size-and-shape of the tumour before the operation, location of the tumour, and age and gender of the patient. The surgeon would then be able to assess which trajectory, and therefore which type of operation, would be most favourable for the patient.

A second application, this time for neuroscience, relates to diffusion tensor imaging. One output of the project will be methodology for interpolating manifold-valued data in a spatial setting. In the context of diffusion tensor imaging of the brain, spatial interpolation of the diffusion tensor data will provide more accurate maps of the brain which will give improved and more soundly-based interpretations of the white matter fibre structure to help understand brain function.

A third application is in forensic science. The models we develop will allow prediction of the development of the shape of a face, depending on covariate information, such as the shapes of the parents' faces, and other information such as gender and various types of quantitative information. This methodology will be useful in child abduction cases for example. While it is certainly the case that methods for extrapolating face shape currently exist, they do not currently provide a general framework for incorporating covariate information in the model.

There are many other research areas in which manifold-valued response data arise naturally and where we expect the project outputs to have a major impact, including plant biology (with relevance, ultimately, to food security) and protein modelling.
 
Description There have been a number of interesting and important developments associated with this grant. The work described below has either been submitted for publication or (in most cases) is close to being ready for submission.

1. We have developed novel statistical methodology and software for performing regression modelling on the sphere, i.e. where the response variable is a direction in 3 or more dimensions and covariate information with general structure is available. Prior to this grant, there was little or no general regression methodology available for modelling response data on the sphere which (i) allows for general covariate structure and (ii) allows for departures from isotropy in the error distribution. Our work in this area remedies this situation and is expected to be useful in many fields of research.

2. Principal nested spheres (PNS) is a dimension reduction technique which is particularly useful for data which lie on manifolds such as spheres and shape spaces. In the manifold setting, PNS has superior properties to more standard dimension-reduction approaches and can provide striking insights that may be missed by more traditional approaches such as principal components analysis. Some fast computational algorithms for implementing PNS have been developed and applied to the analysis of 3D protein simulations for investigating biological function in applications in pharmaceutical sciences.

3. A major focus of the grant has been the development of various regression models for size-and-shape response data where covariate information is available. Very little work has been done previously on this type of modelling, yet it has important applications in areas such as forensic science (facial data) and studies of gait, to name just two. Complementary Bayesian and non-Bayesian approaches have been developed, and software for doing the necessary computations will be made available in the near future.

4. Statistical classification of actions in videos may be performed by extracting relevant features, particularly covariance features, from image frames and studying time series associated with temporal evolutions of these features. A natural mathematical representation of activity videos is in the form of parametrised trajectories on the covariance manifold, i.e. a manifold of positive definite symmetric matrices. A novel statistical framework, which is invariant to transformations of the parameter, has been developed and has been applied to visual speech recognition and hand-gesture recognition. This statistical framework has great promise in many other application areas.

5. A method for curve fitting in 3D Kendall shape spaces, based on unrolling and unwrapping, has been developed. A key insight is that parallel transport along a geodesic on Kendall shape space is linked to the solution of a homogeneous first-order differential equation, some of whose coefficients are implicitly defined functions. Numerical solution of this differential equation enables approximate unrolling and unwrapping which allows the numerical calculation of smoothing splines for 3-dimensional shape data. This approach has been applied to the statistical analysis of moving peptide data.
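
To make the idea of unwrapping along a geodesic concrete, the following is a minimal sketch on the unit sphere rather than on Kendall shape space. This is a deliberate simplification: on the sphere, parallel transport along a geodesic has a closed form, whereas item 5 above notes that on 3D Kendall shape space it has to be obtained by solving a differential equation numerically. The function names (exp_map, log_map, parallel_transport, unwrap_along_geodesic) are illustrative assumptions and are not taken from the software produced under this grant.

```python
import numpy as np


def exp_map(p, v):
    """Move from unit vector p along tangent vector v (with v orthogonal to p)."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return p.copy()
    return np.cos(t) * p + np.sin(t) * v / t


def log_map(p, q):
    """Tangent vector at p pointing to q, with length equal to the geodesic distance.

    Not well defined for antipodal points; this sketch returns zero in that case.
    """
    c = np.clip(p @ q, -1.0, 1.0)
    w = q - c * p                      # component of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    return np.arccos(c) * w / nw


def parallel_transport(p, q, v):
    """Transport tangent vector v at p to q along the minimising geodesic."""
    w = log_map(p, q)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return v.copy()
    u = w / theta                      # unit direction of the geodesic at p
    a = u @ v                          # part of v along the geodesic
    # The part of v orthogonal to the geodesic is unchanged; the part along
    # it rotates with the geodesic's velocity vector.
    return v + a * ((np.cos(theta) - 1.0) * u - np.sin(theta) * p)


def unwrap_along_geodesic(p0, v0, times, points):
    """Unwrap data observed near the geodesic t -> exp_map(p0, t * v0).

    Each observation is expressed as a tangent vector at the matching point
    of the geodesic, transported back to p0, and added to the unrolled base
    point t * v0. The result is ordinary vector data in the tangent space
    at p0, to which Euclidean smoothing could then be applied.
    """
    out = []
    for t, y in zip(times, points):
        base = exp_map(p0, t * v0)
        resid = log_map(base, y)
        out.append(t * v0 + parallel_transport(base, p0, resid))
    return np.array(out)


if __name__ == "__main__":
    p0 = np.array([0.0, 0.0, 1.0])
    v0 = np.array([1.0, 0.0, 0.0])
    times = np.linspace(0.0, 1.0, 5)
    # Hypothetical observations: points on the geodesic, pushed sideways a little.
    side = np.array([0.0, 0.1, 0.0])
    points = [exp_map(exp_map(p0, t * v0), side) for t in times]
    print(unwrap_along_geodesic(p0, v0, times, points))
```

In this sketch the base path is itself a geodesic, so unrolling it into the tangent space gives the straight line t * v0; a curved base path, as used when fitting smoothing splines, would require transporting along many short geodesic segments.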
Exploitation Route Dryden (2018) has provided an R package called Shapes. This is version 1.2.4 released on August 18, 2018. See https://www.maths.nottingham.ac.uk/plp/pmzild/shapes/
This package will enable a wide range of users to exploit the developments arising out of this grant.
Sectors Agriculture

Food and Drink

Healthcare

Pharmaceuticals and Medical Biotechnology

 
Description Our work on modelling and describing cell shape in seeds has been used by collaborators in Plant Sciences led by Dr George Bassell (University of Birmingham) and has resulted in a paper in Plant Cell. Tsagris (former Research Fellow on EP/K022547/1) and Wood are co-authors. The paper has been accepted and published online, and will appear in a journal issue in the near future.
Sector Agriculture, Food and Drink, Healthcare
Impact Types Economic

 
Description Workshop (Nottingham) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The purposes of the activity were to disseminate our research to both experts in, and potential users of, the methodology developed in the grant; to interact with leading international experts on problems of mutual interest; to provide PhD students and postdocs with the opportunity to learn about this important research area; and to develop an international network in this area. The workshop was a great success and has stimulated further activity in this area. Participants who were at the workshop are organising a workshop at Oberwolfach, Germany, covering the same research area, to take place in January 2018.
Year(s) Of Engagement Activity 2016
URL http://notts-manifold-stats-workshop.weebly.com/