Statistical inference for high-dimensional data

Lead Research Organisation: University of Cambridge

Department Name: Pure Maths and Mathematical Statistics

Abstract

High-dimensional data, where the number of variables is large compared to the number of observations, are becoming increasingly common across a range of scientific disciplines and industry. Classical statistical methods often perform poorly for such data, or do not work at all. There is therefore a need to develop new methods that are able to cope well with the high-dimensionality of the data.

Great strides have been made in this direction over the last couple of decades, and we now have well-established methods for providing point estimates of high-dimensional parameters in a variety of common statistical models. Whilst this represents great progress, much remains to be done. One of the key goals of statistical methodology is uncertainty quantification, and there is a need to develop high-dimensional analogues of classical statistical tools, for example those for forming confidence intervals and performing hypothesis tests. In addition, principled approaches for how to present quantification of uncertainty are needed; one may wish to quantify uncertainty about thousands of parameters simultaneously, and presenting this in an interpretable fashion is a real challenge.

This project will aim to extend the high-dimensional toolbox in several important ways by i) developing new goodness-of-fit tests for high-dimensional models; ii) understanding the robustness of high-dimensional methods to model misspecification; iii) developing methods for detecting and handling outliers in high-dimensional data; and iv) creating principled approaches for how to present uncertainty quantification in high-dimensional settings.

Given the ubiquity of high-dimensional data, such fundamental methodology would have broad applicability, for example to genomics, where high-dimensional data are very much the norm.

Student:

Harvey Klyne

Period of Study:

Oct 19 - Jun 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2261074

Research Topic:

Unclassified

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
Rajen Shah (Primary Supervisor)
Harvey Klyne (Student)

Publications

Author Name	Title	Publication	Date Published

Klyne H (2023) Semiparametric Methods for Two Problems in Causal Inference using Machine Learning

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509620/1			01/10/2016	30/09/2022
2261074	Studentship	EP/N509620/1	01/10/2019	30/06/2023	Harvey Klyne
EP/R513180/1			01/10/2018	30/09/2023
2261074	Studentship	EP/R513180/1	01/10/2019	30/06/2023	Harvey Klyne
