Theoretical foundations of inference in the presence of large numbers of nuisance parameters

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Any method of measurement should be calibrated, or at least not badly miscalibrated. Methods of inference are crucial tools for applied statisticians and scientists, and statistical theory ensures such calibration for them, making these procedures fit for purpose. Specifically, in hypothetical repeated application, the proposed methods should give an answer within a small window of the truth in a specified, high proportion of cases.
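The following minimal Python sketch makes "calibrated in hypothetical repeated application" concrete. It is an illustration only and not part of the project's software. It takes a textbook case, a nominal 95% confidence interval for a normal mean with known standard deviation, simulates many repetitions, and records the proportion of intervals that contain the true value. The function name and parameter values are arbitrary choices.

```python
import numpy as np

def interval_coverage(mu=1.0, sigma=2.0, n=25, reps=20_000, seed=0):
    """Proportion of nominal 95% intervals, mean +/- 1.96 * sigma / sqrt(n),
    that contain the true mean mu over `reps` simulated repetitions."""
    rng = np.random.default_rng(seed)
    z = 1.959963984540054                      # 0.975 quantile of N(0, 1)
    samples = rng.normal(mu, sigma, size=(reps, n))
    means = samples.mean(axis=1)
    half_width = z * sigma / np.sqrt(n)
    covered = np.abs(means - mu) <= half_width  # interval contains mu
    return covered.mean()

print(interval_coverage())  # should be close to 0.95 for a calibrated procedure
```

A miscalibrated procedure would show a long-run proportion noticeably different from the nominal 0.95.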

The present research is about calibrated inference for key quantities of interest, like the effect of a drug or treatment, in the presence of so-called nuisance parameters. These are aspects of no direct concern or scientific relevance, but which are needed to complete the idealized representation of the physical, biological or sociological system. Large numbers of them arise naturally when one wishes to limit the strength of modelling assumptions in the equations describing the data-generating process.

On a national level, improved understanding of the scientific or societal truths underpinning the data we observe brings significant long-term economic benefits. For instance, it allows costly medical screening or government regulation to be better targeted, and allows secure conclusions to be drawn from, say, studies into the efficacy of new drugs, treatment programs or vaccines.

Planned Impact

The emphasis of this proposal is on understanding, specifically scientific understanding of the data-generating mechanisms. This goal is distinct from much of machine learning and other types of data science, whose focus is typically prediction. Important though prediction is, in scientific research the objectives are almost always understanding, from which causal predictions are an incidental product.

If the objectives of this proposal are achieved, it would be a significant theoretical and methodological breakthrough, putting UK statistics at the forefront and providing trustworthy methods for the discovery of new truths in scientific and other domains. The impact of scientific discovery on society and the economy cannot be overemphasized.

Publications

Battey H (2023) On inference in high-dimensional regression in Journal of the Royal Statistical Society Series B: Statistical Methodology

Battey H (2023) Inducement of population sparsity in Canadian Journal of Statistics

Battey H (2024) Heather Battey's contribution to the Discussion of 'Parameterizing and simulating from causal models' by Evans and Didelez in Journal of the Royal Statistical Society Series B: Statistical Methodology

Battey H (2024) Maximal co-ancillarity and maximal co-sufficiency in Information Geometry

Battey H (2022) Some aspects of non-standard multivariate analysis in Journal of Multivariate Analysis

Battey H (2022) Some Perspectives on Inference in High Dimensions in Statistical Science

Battey H (2022) Heather Battey's Contribution to the Discussion of 'Assumption-Lean Inference for Generalised Linear Model Parameters' by Vansteelandt and Dukes in Journal of the Royal Statistical Society Series B: Statistical Methodology

 
Description I have developed a framework for inference in sparse high-dimensional regression models, of the kind that arise particularly in genomics. This avoids unnatural assumptions that have been necessary for alternative approaches and respects established principles of inference from low-dimensional settings. The work is embedded within, and provides refinement to, the broader inferential framework of confidence sets of models, advocated in earlier work (detailed under EP/P002757/1). I have proposed confidence "distributions" of models with theoretical guarantees.
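The following sketch illustrates the basic idea of a confidence set of models. It rests on a simplified reading of the construction in Cox and Battey (2017) and is not the refined procedure or the confidence "distributions" described above. The assumptions are:

- a Gaussian linear model;
- a small comprehensive model containing all variables retained after reduction;
- an F test of each low-dimensional submodel against that comprehensive model.

Every submodel not rejected at level alpha is kept. The function names are hypothetical, and SciPy is assumed to be available.

```python
import itertools
import numpy as np
from scipy import stats

def rss(D, y):
    """Residual sum of squares from a least-squares fit of y on design matrix D."""
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ beta
    return float(r @ r)

def confidence_set_of_models(X, y, max_size=2, alpha=0.05):
    """Return every subset of at most `max_size` columns of X whose linear model is
    not rejected at level alpha by an F test against the comprehensive model that
    contains all columns of X (assumed few relative to n, e.g. after reduction)."""
    n, p = X.shape
    ones = np.ones((n, 1))
    rss_full = rss(np.hstack([ones, X]), y)
    df_full = n - p - 1
    accepted = []
    for k in range(1, min(max_size, p - 1) + 1):
        for S in itertools.combinations(range(p), k):
            q = p - k                                   # number of excluded columns
            rss_sub = rss(np.hstack([ones, X[:, list(S)]]), y)
            F = ((rss_sub - rss_full) / q) / (rss_full / df_full)
            if stats.f.sf(F, q, df_full) > alpha:       # not rejected: keep the model
                accepted.append(S)
    return accepted

# Hypothetical data: only columns 0 and 3 affect the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
print(confidence_set_of_models(X, y, max_size=3))
```

Unlike a single selected model, the output is a set of models compatible with the data.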

I have elucidated theoretical properties of confidence sets of models in hypothetical repeated use, as well as a variable reduction procedure (now called Cox Reduction) based on partially balanced incomplete block designs, first proposed by Cox and Battey (2017).
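The following is a heavily simplified sketch of the idea behind this kind of variable reduction. The p = side² candidate variables are arranged in a square array, a simple partially balanced incomplete block arrangement in which any two variables appear together in at most one block. The response is regressed separately on the variables of each row and each column. A variable is kept only if its absolute t-statistic ranks among the `keep` largest in both of its regressions.

The square (rather than cubic) arrangement, this retention rule and all names are assumptions for illustration. The published Cox reduction involves further choices that are not reproduced here.

```python
import numpy as np

def t_statistics(X, y):
    """Absolute t-statistics for the slopes in a least-squares fit of y on [1, X]."""
    n, k = X.shape
    D = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (n - k - 1)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(D.T @ D)))
    return np.abs(beta[1:] / se[1:])

def square_lattice_reduction(X, y, side, keep=2):
    """Keep variables ranked in the top `keep` of both their row and column regressions."""
    n, p = X.shape
    if p != side * side:
        raise ValueError("expects exactly side * side candidate variables")
    grid = np.arange(p).reshape(side, side)        # variable indices in a square array
    hits = np.zeros(p, dtype=int)
    for block in list(grid) + list(grid.T):        # each row, then each column
        t = t_statistics(X[:, block], y)
        top = block[np.argsort(t)[::-1][:keep]]    # variables with the largest |t|
        hits[top] += 1
    return np.flatnonzero(hits == 2)               # selected in both of its blocks

# Hypothetical data: 25 candidates, of which only 7 and 18 matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 25))
y = 3.0 * X[:, 7] - 3.0 * X[:, 18] + rng.normal(size=200)
selected = square_lattice_reduction(X, y, side=5)
print(selected)
```

Each regression involves only `side` variables, so every fit is low-dimensional even when p is large.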

I have proposed an approach for assessing the effect of missing covariate data in a regression setting, as an alternative to imputation procedures in common use. Such imputation procedures produce a single answer without warning, regardless of how sensitive the conclusions may be to aspects that were not observed.

I have detailed a systematic approach to finding transformations of the data, when they exist, that make the associated likelihood function factorise in such a way as to eliminate nuisance parameters. An open challenge is to generalise the approach to situations when the ideal inferential separations cannot be achieved exactly. This would make the ideas more generally applicable.
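The classical Neyman-Scott problem is a textbook illustration, not taken from the project's papers, of two points:

- why large numbers of nuisance parameters matter;
- how a transformation of the data can eliminate them.

Take pairs y_i1, y_i2 ~ N(mu_i, sigma^2), with one nuisance mean per pair. The joint maximum-likelihood estimator of sigma^2 converges to sigma^2 / 2, however many pairs are observed. The within-pair differences d_i = y_i1 - y_i2 ~ N(0, 2 sigma^2), by contrast, have a distribution free of every mu_i, and an estimator based on them alone is consistent. The simulation settings below are arbitrary.

```python
import numpy as np

def neyman_scott(n_pairs=100_000, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 10.0, size=n_pairs)          # one nuisance mean per pair
    y = mu[:, None] + rng.normal(0.0, sigma, size=(n_pairs, 2))
    d = y[:, 0] - y[:, 1]                              # differences: free of mu_i
    # Joint MLE: mean squared deviation from pair means, which equals sum(d^2) / (4n).
    mle = np.sum(d ** 2) / (4 * n_pairs)
    # Estimator using only d_i ~ N(0, 2 sigma^2).
    from_differences = np.sum(d ** 2) / (2 * n_pairs)
    return mle, from_differences

mle, diff_est = neyman_scott()
print(mle, diff_est)  # roughly sigma^2 / 2 = 2 and sigma^2 = 4
```

More data do not rescue the maximum-likelihood estimator here, because each new pair brings a new nuisance parameter.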

I have unified several of the ideas above, and ones covered in earlier work (see EP/P002757/1) under the notion of inducement of population-level sparsity, with a view to achieving reconciliation between low-dimensional Fisherian foundations and modern high-dimensional problems.

I have developed theory and methodology for inference on the intensity function of a spatial point process on a Riemannian manifold, a problem of direct relevance in many scientific applications, particularly cell biology.

I have elucidated inferential anomalies arising in the analysis of processes with more than one source of variability, a situation that arises commonly in experimental contexts involving elaborate forms of blocking, or in models for longitudinal or clustered data.

I have proposed a procedure for inference on the coefficients of high-dimensional logistic regression models when the data are linearly separable. The theoretical discussion associated with this work provides considerable insight into the role of sparsity and the merits and limitations of conditional inference in the presence of linearly separable data.
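A small example shows why linear separability is a problem for standard inference. The example is assumed here purely for illustration and is not the procedure described above. If some direction w separates the two outcome classes, the logistic log-likelihood along beta = c * w keeps increasing towards 0 as c grows. No finite maximum-likelihood estimate therefore exists. The sketch evaluates the log-likelihood along such a ray for a toy data set with one covariate and an intercept.

```python
import numpy as np

def logistic_loglik(beta, X, y):
    """Log-likelihood sum_i [y_i * eta_i - log(1 + exp(eta_i))], computed stably."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])  # intercept, covariate
y = np.array([0.0, 0.0, 1.0, 1.0])                          # classes separated at x = 0
w = np.array([0.0, 1.0])                                     # a separating direction
values = [logistic_loglik(c * w, X, y) for c in (1, 2, 4, 8, 16)]
print(values)  # strictly increasing towards 0: the supremum is never attained
```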

I have provided insight into the structure of models and their parametrisations under which standard approaches produce reliable inference for an interest parameter (e.g. one quantifying a treatment effect) in spite of arbitrary misspecification of the nuisance component of the model.
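A familiar special case of this phenomenon is offered here as an assumed illustration, not as the characterisation developed in the work. Suppose a binary treatment T is randomised independently of a covariate x. The least-squares coefficient of T still estimates the treatment effect consistently, even when the model is badly wrong about how x affects the outcome. In the sketch, the true nuisance function g(x) = sin(3x) + x^2 is fitted as if it were linear in x.

```python
import numpy as np

def treatment_coefficient(n=200_000, tau=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    T = rng.integers(0, 2, size=n).astype(float)   # randomised, independent of x
    g = np.sin(3 * x) + x ** 2                      # true (nonlinear) nuisance component
    y = tau * T + g + rng.normal(size=n)
    D = np.column_stack([np.ones(n), T, x])         # misspecified: linear in x
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta[1]                                  # coefficient of T

print(treatment_coefficient())  # close to tau = 1.5 despite the wrong model for g
```

Because T is independent of x, the misspecified part of the model affects only the precision of the treatment estimate, not its limit.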

I have provided new perspectives on the foundations of statistical inference, which may aid conceptual understanding.

I have provided research supervision to three PhD students and two postdoctoral researchers who are working on projects related to this award (EP/T01864X/1).
Exploitation Route Software accompanies all methodological research associated with the award, and can be downloaded from the journal website or other publicly accessible webpages for general use. The work has also identified numerous avenues for future work, some of which are presented as lists of open problems at the end of the relevant papers.
Sectors Chemicals

Environment

Healthcare

Pharmaceuticals and Medical Biotechnology

 
Description The techniques developed in a paper with E. A. K. Cohen and S. Ward have been used by biochemists and cell biologists to understand how the outer membrane in bacteria assembles. This is an important step towards understanding antibiotic resistance. See: S. Kumar, P. Inns, S. Ward, V. Lagage, J. Wang, R. Kaminska, S. Uphoff, E. A. K. Cohen, G. Mamou and C. Kleanthous. Immobile lipopolysaccharides and outer membrane proteins differentially segregate in growing Escherichia coli. Proceedings of the National Academy of Sciences, In Press, 2025.
First Year Of Impact 2025
Sector Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Societal

Economic

 
Description Delivery of lectures to PhD students across several London universities via the London Taught Course Centre (2020, 2022, 2023)
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Influenced training of practitioners or researchers
Impact The content of these lectures is not readily available elsewhere and is not drawn from any single source. I am not aware of similar courses taught elsewhere in the UK. Cumulatively over the years in which they ran, approximately 80 students have attended the lectures and submitted coursework demonstrating their understanding of the new material.
 
Description Wrote and delivered 30 hours of undergraduate/MSc lectures at Imperial College London
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
Impact Students who have followed this material have received unique and extensive training, enabling them to contribute directly to scientific and societal understanding in their future careers. Because statistics has a mathematical basis, improved training of students in this field has the potential for impact across diverse fields. The student cohort is international, with a large proportion of British students.
 
Title An approach to estimation of the intensity function of a spatial point process on a Riemannian manifold. 
Description The paper: Ward, S., Battey, H. S. and Cohen, E. A. K. (2023). Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold. Biometrika, 110, 1009-1021, together with the associated software implementation, provides a means of estimating a key parameter of a spatial point process constrained to the surface of a manifold. Such processes arise particularly in cellular biology and microbiology, where super-resolution microscopy techniques can record the spatial arrangement of proteins and other molecules of interest on the membranes of cells, bacteria and other microorganisms.
Type Of Material Data analysis technique 
Year Produced 2023 
Provided To Others? Yes  
Impact In the microbiological examples that motivated the work, knowledge of the intensity functions of, say, two different molecular processes can aid scientific understanding by suggesting possible dependencies between the processes, which might then be probed more formally. Alternatively, the estimates might be used as outcomes, blocking factors or concomitant variables in an experimental setting concerned with assessing the efficacy of one or more treatments. Dr. Edward Cohen has presented the work at multidisciplinary scientific venues, and our impression is that the work is being used to aid scientific discovery in cell biology and 3D bio-imaging applications.
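The sketch below gives a rough idea of what estimating an intensity function on a manifold involves, using a von Mises-Fisher kernel on the unit sphere S^2, the simplest curved case. This kernel is a swapped-in technique chosen for brevity. It is not the estimator of Ward, Battey and Cohen (2023). The concentration parameter kappa acts as an inverse bandwidth and is assumed known rather than selected from the data. Each kernel integrates to one over the sphere, so the estimated intensity integrates to the number of observed points.

```python
import numpy as np

def vmf_intensity(points, x, kappa):
    """Kernel intensity estimate at unit vectors x (m x 3) from observed unit vectors
    points (n x 3), summing von Mises-Fisher densities on S^2,
    kappa / (4 pi sinh kappa) * exp(kappa * <x, p>), rewritten to avoid overflow."""
    const = kappa / (2 * np.pi * -np.expm1(-2 * kappa))
    return const * np.exp(kappa * (x @ points.T - 1.0)).sum(axis=1)

def uniform_sphere(m, rng):
    """m points drawn uniformly on the unit sphere (normalised Gaussian vectors)."""
    v = rng.normal(size=(m, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
pts = uniform_sphere(50, rng)            # hypothetical observed locations
grid = uniform_sphere(100_000, rng)      # Monte Carlo evaluation points
lam = vmf_intensity(pts, grid, kappa=5.0)
total = 4 * np.pi * lam.mean()           # Monte Carlo integral over the sphere
print(total)                             # approximately 50, the number of points
```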
 
Title Calibrated confidence intervals for coefficient parameters and refined confidence sets of models in high-dimensional regression 
Description The paper: Battey, H. S. and Reid, N. (2023). On inference in high-dimensional regression. J. R. Statist. Soc. Ser. B, 85, 149-175, along with the associated Matlab implementation, provides a way of constructing calibrated confidence intervals for regression parameters in high-dimensional linear models. These intervals were used in the same paper to refine the confidence sets of models proposed in earlier work (Cox and Battey, 2017; Battey and Cox, 2018). 
Type Of Material Data analysis technique 
Year Produced 2022 
Provided To Others? Yes  
Impact Too early to say. 
 
Title Fractional factorial assessment of the effects of missing covariate data in regression 
Description The paper: Battey, H. S. and Cox, D. R. (2023). Missing observations in regression: a conditional approach. Royal Society Open Science, 10, article number 220267, together with the associated Matlab implementation, provides a means of assessing the effect of missing covariate observations on each of the regression coefficients. It is an alternative to multiple imputation, a widely used procedure for dealing with missing data.
Type Of Material Data analysis technique 
Year Produced 2023 
Provided To Others? Yes  
Impact Too early to say. 
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.2rbnzs7rw
 
Description Collaboration with Prof. Karthik Bharath (University of Nottingham) on sparsity-inducing reparametrisations 
Organisation University of Nottingham
Department School of Mathematics Nottingham
Country United Kingdom 
Sector Academic/University 
PI Contribution Both parties provided expertise relevant for deducing parametrisations under which the relevant models are sparse. One of my PhD students worked on the project.
Collaborator Contribution Karthik Bharath provided expertise in differential geometry relevant to the problem.
Impact A preprint, pending peer review, can be found on arXiv: .
Start Year 2022
 
Description Collaboration with Prof. Nancy Reid (University of Toronto) on the structure of inference in misspecified models 
Organisation University of Toronto
Country Canada 
Sector Academic/University 
PI Contribution Both partners provided expertise relevant to elucidating the structure of models that allows consistent estimation of interest parameters when the nuisance component is misspecified.
Collaborator Contribution Both partners provided expertise relevant to elucidating the structure of models that allows consistent estimation of interest parameters when the nuisance component is misspecified.
Impact A preprint, pending review at the time of writing (March 2024), can be found on arXiv .
Start Year 2023
 
Description Collaboration with Prof. Peter McCullagh (University of Chicago) on inferential anomalies associated with the use of the Wald statistic 
Organisation University of Chicago
Country United States 
Sector Academic/University 
PI Contribution Both parties provided expertise relevant for understanding the inferential anomalies associated with the use of the Wald statistic in processes involving more than one source of variability.
Collaborator Contribution Both parties provided expertise relevant for understanding the inferential anomalies associated with the use of the Wald statistic in processes involving more than one source of variability.
Impact A paper has been accepted for publication in Biometrika.
Start Year 2022
 
Description Collaboration with Prof. Sir David Cox (University of Oxford) on the foundations of inference in the presence of nuisance parameters 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Both parties provided expertise on the theoretical foundations of inference.
Collaborator Contribution Both parties provided expertise on the theoretical foundations of inference.
Impact Two publications that emerged from this aspect of the collaboration are Battey, H. S., Cox, D. R. and Lee, S. (2024) On partial likelihood and the construction of factorisable transformations. Information Geometry, 7, 9-28. Battey, H. S. and Cox, D. R. (2022) Some perspectives on inference in high dimensions. Statistical Science, 37, 110-122. A tangentially related paper is Battey, H. S. and Cox, D. R. (2023) Missing observations in regression: a conditional approach. Royal Society Open Science, 10, article number 220267.
Start Year 2020
 
Description Collaboration with Sciteb Ltd on a mathematical characterisation of the dynamics and instabilities of markets (2020-2024) 
Organisation Sciteb
Country United Kingdom 
Sector Private 
PI Contribution I contributed an appendix on statistical properties of an estimator needed to make the research operational.
Collaborator Contribution Nicholas Beale of Sciteb Ltd. proposed the problem, motivated by real-world experience. Mr. Kutlwano Bashe and Prof. Robert Mackay of the University of Warwick provided an analysis of the dynamics.
Impact This collaboration was multi-disciplinary across statistics, dynamical systems and economics. A research paper was published in the Journal of Dynamics and Games.
Start Year 2020
 
Description Mathematics Department representative for the Centre for High-Throughput Digital Electronics and Machine Learning 
Organisation Imperial College London
Department Department of Physics
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise; intellectual input.
Collaborator Contribution Expertise; intellectual input; access to data, equipment and facilities.
Impact Multidisciplinary (mathematics, statistics, physics, chemistry, medicine)
Start Year 2021
 
Description Member of the Research Board of DigiFab: Institute of Digital Molecular Design and Fabrication 
Organisation Imperial College London
Department Department of Chemistry
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise; intellectual input; research supervision.
Collaborator Contribution Expertise; intellectual input; access to data, equipment and facilities.
Impact Multidisciplinary (mathematics, statistics, chemistry). No tangible outputs yet.
Start Year 2021
 
Title Fractional factorial assessment of the effects of missingness in regression 
Description The Dryad data and source code repository contains the data (in .csv and .mat format) and source code (a .m file compatible with MATLAB) to reproduce the analysis of the bone marrow data in Section 4 of Battey and Cox (2023). The source code replaces missing entries by combinations of high and low values according to a fractional factorial structure for the estimation of the main effects of missingness, and fits a logistic regression model to each of the resulting sets of factorially completed covariate data. Estimated regression coefficients and their standard errors are stored, and the effects of missingness are presented as in Tables 2 and 3 of the paper.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Too early to say 
URL https://royalsocietypublishing.org/doi/full/10.1098/rsos.220267
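The fractional factorial procedure described in this record can be sketched in Python. This is not the published MATLAB code. The following are all assumptions made for illustration:

- a 2^(3-1) design with generator C = AB, for three covariates with missing entries;
- the lower and upper quartiles of the observed values as the "low" and "high" fill-in values;
- a Newton-Raphson logistic fit.

The main effect of missingness in covariate j on each coefficient is the average estimate over runs in which covariate j was filled "high", minus the average over runs in which it was filled "low". In this resolution III design, main effects are aliased with two-factor interactions.

```python
import numpy as np

# 2^(3-1) fractional factorial, columns A, B, C with generator C = AB (+1 = high, -1 = low).
DESIGN = np.array([[-1, -1,  1],
                   [ 1, -1, -1],
                   [-1,  1, -1],
                   [ 1,  1,  1]])

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (X includes intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def missingness_effects(Z, y, design=DESIGN, q=(25, 75)):
    """Z: n x 3 covariates with np.nan for missing entries; y: binary responses.
    Returns per-run coefficients and main effects of missingness (covariate x coefficient)."""
    n, m = Z.shape
    lo = np.nanpercentile(Z, q[0], axis=0)     # assumed 'low' fill-in values
    hi = np.nanpercentile(Z, q[1], axis=0)     # assumed 'high' fill-in values
    missing = np.isnan(Z)
    coefs = []
    for run in design:
        Zc = Z.copy()
        for j in range(m):
            Zc[missing[:, j], j] = hi[j] if run[j] > 0 else lo[j]
        coefs.append(fit_logistic(np.column_stack([np.ones(n), Zc]), y))
    coefs = np.array(coefs)                     # runs x (1 + m) coefficients
    effects = np.array([coefs[design[:, j] > 0].mean(axis=0)
                        - coefs[design[:, j] < 0].mean(axis=0) for j in range(m)])
    return coefs, effects

# Hypothetical data with about 10% of covariate values missing.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 3))
eta = 0.5 + Z @ np.array([1.0, -1.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
Z_missing = Z.copy()
Z_missing[rng.random(Z.shape) < 0.1] = np.nan
coefs, effects = missingness_effects(Z_missing, y)
print(np.round(effects, 3))  # rows: covariate with missing data; columns: intercept, beta_1..beta_3
```

Large entries in `effects` flag coefficients whose estimates are sensitive to the unobserved values. A single imputed answer would not reveal this.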
 
Title Source code to construct confidence sets of models in high-dimensional regression 
Description The software (produced by R. M. Lewis, a doctoral student at Imperial College London) extends original code written by Heather Battey to produce confidence sets of models in sparse high-dimensional regression settings. It implements extensions of Cox and Battey (2017) and Battey and Cox (2018) discussed in Lewis and Battey (2023).
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Too early to say. 
 
Description "Inspirational Lecture" for undergraduate students at Imperial College London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact The aim was to introduce undergraduate students to a rewarding and intellectually challenging area of research and to encourage further study.
Year(s) Of Engagement Activity 2020
 
Description Appointed to the Research Board of the Institute of Digital Molecular Design and Fabrication, Department of Chemistry, Imperial College London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact My involvement has led to discussions and plans for joint research supervision on the use of experimental design and statistical analyses in the development and discovery of medicines, agrochemicals and polymers.
Year(s) Of Engagement Activity 2021,2022
URL https://www.imperial.ac.uk/digital-molecular-design-and-fabrication/
 
Description Lecture for the Piscopia Initiative 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact From the Piscopia Initiative website: "Despite the fact that 40% of UK graduates in the mathematical sciences are female, only 6% of them go on to be professors [LMS report, 2013]. In October 2019, we (PhD students at the University of Edinburgh) founded the Piscopia Initiative to tackle the participation crisis of women and non-binary people in mathematics research in the UK. We aim to encourage women and non-binary students to pursue a PhD in mathematics. We offer both UK-wide and university-specific events at 13 UK universities through our local Piscopia committees. These are all aimed at undergraduate/MSc students in mathematics and related disciplines".
Year(s) Of Engagement Activity 2021
URL https://piscopia.co.uk/past-events/
 
Description Lecture to the Warwick Mathematics Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact The aim was to introduce undergraduate students to a rewarding and intellectually challenging area of research and to encourage further study.
Year(s) Of Engagement Activity 2021
 
Description School visit (Finchley, London) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact The talk was for prospective sixth-form pupils of Woodhouse College. The sixth form specialises in mathematics and sciences and the visit was specifically targeted at female students. My talk highlighted the diverse areas to which statistical ideas apply, and the role of mathematics in statistical training.
Year(s) Of Engagement Activity 2023