Statistical inference for high-dimensional data
Lead Research Organisation: University of Cambridge
Department Name: Pure Maths and Mathematical Statistics
Abstract
High-dimensional data, where the number of variables is large compared to the number of observations, are becoming increasingly common across a range of scientific disciplines and industry. Classical statistical methods often perform poorly for such data, or do not work at all. There is therefore a need to develop new methods that are able to cope well with the high-dimensionality of the data.
Great strides have been made in this direction over the last couple of decades, and we now have well-established methods for providing point estimates of high-dimensional parameters in a variety of common statistical models. Whilst this represents great progress, much remains to be done. One of the key goals of statistical methodology is uncertainty quantification, and there is a need to develop high-dimensional analogues of classical statistical tools, for example those for forming confidence intervals and performing hypothesis tests. In addition, principled approaches to presenting uncertainty quantification are needed: one may wish to quantify uncertainty about thousands of parameters simultaneously, and presenting this in an interpretable fashion is a real challenge.
This project will aim to extend the high-dimensional toolbox in several important ways by i) developing new goodness-of-fit tests for high-dimensional models; ii) understanding the robustness of high-dimensional methods to model misspecification; iii) developing methods for detecting and handling outliers in high-dimensional data; and iv) creating principled approaches for presenting uncertainty quantification in high-dimensional settings.
Given the ubiquity of high-dimensional data, such fundamental methodology would have broad applicability, for example in genomics, where high-dimensional data are very much the norm.
People | ORCID iD |
---|---|
Rajen Shah (Primary Supervisor) | |
Harvey Klyne (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509620/1 | | | 01/10/2016 | 30/09/2022 | |
2261074 | Studentship | EP/N509620/1 | 01/10/2019 | 30/06/2023 | Harvey Klyne |
EP/R513180/1 | | | 01/10/2018 | 30/09/2023 | |
2261074 | Studentship | EP/R513180/1 | 01/10/2019 | 30/06/2023 | Harvey Klyne |