Theoretical foundations of inference in the presence of large numbers of nuisance parameters

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Any method of measurement should be calibrated or at least not highly miscalibrated. Statistical theory ensures such calibration for methods of inference, crucial tools for applied statisticians and scientists, thus making such procedures suitable for purpose. Specifically, in hypothetical repeated application, the proposed methods should give an answer within a small window of the truth.

The present research is about calibrated inference for key quantities of interest, like the effect of a drug or treatment, in the presence of so-called nuisance parameters. These are aspects of no direct concern or scientific relevance that are nevertheless needed to complete the idealized representation of the physical, biological or sociological system. Large numbers of them arise naturally when one wishes to limit the strength of modelling assumptions in the equations describing the data-generating process.

On a national level, improved understanding of the scientific or societal truths underpinning the data we observe allows significant long-term economic benefits. For instance, it allows costly medical screening or government regulation to be better targeted, and allows secure conclusions to be obtained from, say, studies into the efficacy of new drugs, treatment programs or vaccines.

Planned Impact

The emphasis of this proposal is on understanding, specifically scientific understanding of the data-generating mechanisms. This goal is distinct from that of much of machine learning and other types of data science, whose focus is typically prediction. Important though prediction is, the objective in scientific research is almost always understanding, from which causal predictions are an incidental product.

If the objectives of this proposal are achieved, it would be a significant theoretical and methodological breakthrough, putting UK statistics at the forefront and providing trustworthy methods for the discovery of new truth in the scientific and other domains. The impact of scientific discovery on society and the economy cannot be overemphasized.

Publications

Battey H (2022) Heather Battey's Contribution to the Discussion of 'Assumption-Lean Inference for Generalised Linear Model Parameters' by Vansteelandt and Dukes in Journal of the Royal Statistical Society Series B: Statistical Methodology

Battey H (2023) Inducement of population sparsity in Canadian Journal of Statistics

Battey H (2022) Some aspects of non-standard multivariate analysis in Journal of Multivariate Analysis

Battey H (2023) On inference in high-dimensional regression in Journal of the Royal Statistical Society Series B: Statistical Methodology

Battey H (2022) Some Perspectives on Inference in High Dimensions in Statistical Science

Battey H (2023) D. R. Cox: Extracts From a Memorial Lecture in Harvard Data Science Review

 
Description I have developed a framework for inference in sparse high-dimensional regression models. This avoids unnatural assumptions that have been necessary for alternative approaches and respects established principles of inference from low-dimensional settings. The work is embedded in a broader inferential framework and provides refinements to confidence sets of models, a concept advocated in earlier work (detailed under EP/P002757/1). I have proposed confidence "distributions" of models with theoretical guarantees (terminology slightly inaccurate but aligned with established concepts).

I have elucidated theoretical properties of confidence sets of models in hypothetical repeated use, as well as a variable reduction procedure (now called Cox Reduction) based on partially balanced incomplete block designs, first proposed by Cox and Battey (2017).

I have proposed an approach for assessing the effect of missing covariate data in a regression setting, as an alternative to imputation procedures in common use. The latter approach produces a single answer without warning, regardless of how sensitive the conclusions may be to aspects that were not observed.

I have detailed a systematic approach to finding transformations of the data, when they exist, that make the associated likelihood function factorise in such a way as to eliminate nuisance parameters. An open challenge is to generalise the approach to situations when the ideal inferential separations cannot be achieved exactly. This would make the ideas more generally applicable.

I have unified several of the ideas above, and ones covered in earlier work (see EP/P002757/1) under the notion of inducement of population-level sparsity, with a view to achieving reconciliation between low-dimensional Fisherian foundations and modern high-dimensional problems.

I have provided research supervision to three PhD students and one postdoctoral researcher who are working on projects related to this award (EP/T01864X/1).
Exploitation Route Software accompanies all methodological research associated with the award, and can be downloaded from the journal website or other publicly accessible webpages for general use. The work has also identified numerous avenues for future work, some of which are presented as lists of open problems at the end of the relevant papers.
Sectors Chemicals, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology

 
Description Delivery of lectures to PhD students across several London universities via the London Taught Course Centre (2020, 2022, 2023)
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Influenced training of practitioners or researchers
Impact The content of these lectures is not readily available elsewhere and is not drawn from any single source. I am not aware of similar courses taught elsewhere in the UK. Cumulatively over the years in which they ran, approximately 80 students have sat in the lectures and submitted coursework demonstrating their understanding of the new material.
 
Description Mathematics Department representative for the Centre for High-Throughput Digital Electronics and Machine Learning 
Organisation Imperial College London
Department Department of Physics
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise; intellectual input.
Collaborator Contribution Expertise; intellectual input; access to data, equipment and facilities.
Impact Multidisciplinary (mathematics, statistics, physics, chemistry, medicine)
Start Year 2021
 
Description Member of the Research Board of DigiFab: Institute of Digital Molecular Design and Fabrication 
Organisation Imperial College London
Department Department of Chemistry
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise; intellectual input; research supervision.
Collaborator Contribution Expertise; intellectual input; access to data, equipment and facilities.
Impact Multidisciplinary (mathematics, statistics, chemistry). No tangible outputs yet.
Start Year 2021
 
Title Fractional factorial assessment of the effects of missingness in regression 
Description The Dryad data and source code repository contains the data (in .csv and .mat format) and source code (.m file compatible with MATLAB) to reproduce the analysis of the bone marrow data in Section 4 of Cox and Battey (2023). The source code replaces missing entries by combinations of high and low values, arranged according to a fractional factorial structure for estimating the main effects of missingness, and fits a logistic regression model to each of the resulting sets of factorially-completed covariate data. Estimated regression coefficients and their standard errors are stored, and the effects of missingness are presented as in Tables 2 and 3 of the paper.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Too early to say 
URL https://royalsocietypublishing.org/doi/full/10.1098/rsos.220267
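The procedure described in this entry can be illustrated with a short Python sketch. This is not the MATLAB code in the repository and does not reproduce the paper's analysis: the data are synthetic, each missing cell is treated as one two-level factor, the design is a simple half fraction, the "low" and "high" fill-in values are taken to be the observed column minimum and maximum, and the function names (half_fraction, fit_logistic, missingness_effects) are hypothetical. All of these are assumptions made for illustration only.

```python
import itertools
import numpy as np


def half_fraction(k):
    """Two-level half-fraction design: 2**(k-1) runs in k factors (k >= 2).

    The first k-1 columns form a full factorial in -1/+1; the last column is
    their product (defining relation I = 12...k).
    """
    base = np.array(list(itertools.product([-1, 1], repeat=k - 1)))
    return np.hstack([base, base.prod(axis=1, keepdims=True)])


def fit_logistic(X, y, n_iter=25):
    """Logistic regression with intercept, fitted by Newton-Raphson.

    Returns the coefficient estimates and their standard errors.
    """
    Z = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        w = p * (1.0 - p)
        hessian = Z.T @ (Z * w[:, None])
        beta = beta + np.linalg.solve(hessian, Z.T @ (y - p))
    # Standard errors from the inverse information at the final estimate.
    p = 1.0 / (1.0 + np.exp(-Z @ beta))
    w = p * (1.0 - p)
    se = np.sqrt(np.diag(np.linalg.inv(Z.T @ (Z * w[:, None]))))
    return beta, se


def missingness_effects(X, y, coef):
    """Main effect of each missing cell on one fitted regression coefficient."""
    rows, cols = np.nonzero(np.isnan(X))
    low = np.nanmin(X, axis=0)    # assumed "low" fill-in value per column
    high = np.nanmax(X, axis=0)   # assumed "high" fill-in value per column
    design = half_fraction(len(rows))
    estimates = np.empty(len(design))
    for r, run in enumerate(design):
        filled = X.copy()
        # Complete the data: +1 in the design means "high", -1 means "low".
        filled[rows, cols] = np.where(run == 1, high[cols], low[cols])
        beta, _ = fit_logistic(filled, y)
        estimates[r] = beta[coef + 1]  # +1 skips the intercept
    # Mean estimate at the high level minus mean estimate at the low level.
    effects = 2.0 * (design.T @ estimates) / len(design)
    return design, estimates, effects


# Synthetic illustration: 200 observations, two covariates, three missing cells.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
prob = 1.0 / (1.0 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
y = (rng.random(n) < prob).astype(float)
X[[3, 50, 120], 0] = np.nan  # three missing entries, hence three factors

design, estimates, effects = missingness_effects(X, y, coef=0)
print(effects)  # one main effect of missingness per missing cell
```

Main effects close to zero would suggest that conclusions about the chosen coefficient are insensitive to the unobserved values; large effects flag the missing entries that matter. With many missing entries a more economical fraction than the half fraction used here would be needed.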
 
Title Source code to construct confidence sets of models in high-dimensional regression 
Description The software (produced by R. M. Lewis, a doctoral student at Imperial College London) extends original code written by Heather Battey to produce confidence sets of models in sparse high-dimensional regression settings. It implements extensions of Battey and Cox (2017, 2018) discussed in Lewis and Battey (2023).
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Too early to say. 
 
Description "Inspirational Lecture" for undergraduate students at Imperial College London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact The aim was to introduce undergraduate students to a rewarding and intellectually challenging area of research and to encourage further study.
Year(s) Of Engagement Activity 2020
 
Description Appointed to the Research Board of the Institute of Digital Molecular Design and Fabrication, Department of Chemistry, Imperial College London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact My involvement has led to discussions and plans for joint research supervision on the use of experimental design and statistical analyses in the development and discovery of medicines, agrochemicals and polymers.
Year(s) Of Engagement Activity 2021, 2022
URL https://www.imperial.ac.uk/digital-molecular-design-and-fabrication/
 
Description Lecture for the Piscopia Initiative 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact From the Piscopia Initiative website: "Despite the fact that 40% of UK graduates in the mathematical sciences are female, only 6% of them go on to be professors [LMS report, 2013]. In October 2019, we (PhD students at the University of Edinburgh) founded the Piscopia Initiative to tackle the participation crisis of women and non-binary people in mathematics research in the UK. We aim to encourage women and non-binary students to pursue a PhD in mathematics. We offer both UK-wide and university-specific events at 13 UK universities through our local Piscopia committees. These are all aimed at undergraduate/MSc students in mathematics and related disciplines".
Year(s) Of Engagement Activity 2021
URL https://piscopia.co.uk/past-events/
 
Description Lecture to the Warwick Mathematics Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact The aim was to introduce undergraduate students to a rewarding and intellectually challenging area of research and to encourage further study.
Year(s) Of Engagement Activity 2021