Theoretical foundations of inference in the presence of large numbers of nuisance parameters
Lead Research Organisation: Imperial College London
Department Name: Mathematics
Abstract
Any method of measurement should be calibrated or at least not highly miscalibrated. Statistical theory ensures such calibration for methods of inference, crucial tools for applied statisticians and scientists, thus making such procedures suitable for purpose. Specifically, in hypothetical repeated application, the proposed methods should give an answer within a small window of the truth.
The present research is about calibrated inference for key quantities of interest, like the effect of a drug or treatment, in the presence of so-called nuisance parameters. These are aspects of no direct concern or scientific relevance that are nevertheless needed to complete the idealized representation of the physical, biological or sociological system. Large numbers of them arise naturally when one wishes to limit the strength of modelling assumptions in the equations describing the data-generating process.
On a national level, improved understanding of the scientific or societal truths underpinning the data we observe allows significant long-term economic benefits. For instance, it allows costly medical screening or government regulation to be better targeted, and allows secure conclusions to be obtained from, say, studies into the efficacy of new drugs, treatment programs or vaccines.
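As a toy illustration of what calibration in hypothetical repeated application means, the following Python sketch estimates by simulation how often a nominal 95% interval for a normal mean contains the true value. The setting, the function name `coverage` and all numerical choices are assumptions made for illustration and are not taken from the project. The standard deviation plays the role of a nuisance parameter: plugging in its estimate while keeping the normal quantile gives an interval that covers the truth less often than claimed in small samples.

```python
import random
import statistics


def coverage(n, reps, known_sigma, seed=0, mu=1.0, sigma=2.0, level=0.95):
    """Fraction of simulated samples whose interval contains the true mean mu.

    known_sigma=True  : interval uses the true standard deviation.
    known_sigma=False : interval plugs in the sample standard deviation
                        but keeps the normal quantile.
    """
    rng = random.Random(seed)
    # Two-sided normal quantile, about 1.96 for level = 0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        centre = statistics.fmean(sample)
        scale = sigma if known_sigma else statistics.stdev(sample)
        half_width = z * scale / n ** 0.5
        hits += abs(centre - mu) <= half_width
    return hits / reps


if __name__ == "__main__":
    print("true sigma:   ", coverage(n=5, reps=20000, known_sigma=True))
    print("plug-in sigma:", coverage(n=5, reps=20000, known_sigma=False))
```

With five observations per sample, the interval based on the true standard deviation should cover close to 95% of the time, whereas the plug-in version falls noticeably short. In this simple case the classical remedy is to replace the normal quantile by a Student t quantile.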
Planned Impact
The emphasis of this proposal is on understanding, specifically scientific understanding of data-generating mechanisms. This goal is distinct from that of much of machine learning and other types of data science, whose focus is typically prediction. Important though prediction is, the objective in scientific research is almost always understanding, from which causal predictions are an incidental product.
Achieving the objectives of this proposal would be a significant theoretical and methodological breakthrough, putting UK statistics at the forefront and providing trustworthy methods for the discovery of new truths in the scientific and other domains. The impact of scientific discovery on society and the economy cannot be overemphasized.
People |
ORCID iD |
| Heather Battey (Principal Investigator / Fellow) |
Publications
Battey H (2023) On inference in high-dimensional regression, in Journal of the Royal Statistical Society Series B: Statistical Methodology
Battey H (2023) Inducement of population sparsity, in Canadian Journal of Statistics
Battey H (2024) Heather Battey's contribution to the Discussion of 'Parameterizing and simulating from causal models' by Evans and Didelez, in Journal of the Royal Statistical Society Series B: Statistical Methodology
Battey H (2024) Maximal co-ancillarity and maximal co-sufficiency, in Information Geometry
Battey H (2024) An anomaly arising in the analysis of processes with more than one source of variability, in Biometrika
Battey H (2021) A note on the analytic approximation of exceedance probabilities in heterogeneous populations, in Statistics & Probability Letters
Battey H (2022) Some aspects of non-standard multivariate analysis, in Journal of Multivariate Analysis
Battey H (2022) Some Perspectives on Inference in High Dimensions, in Statistical Science
Battey H (2022) Heather Battey's Contribution to the Discussion of 'Assumption-Lean Inference for Generalised Linear Model Parameters' by Vansteelandt and Dukes, in Journal of the Royal Statistical Society Series B: Statistical Methodology
| Description | I have developed a framework for inference in sparse high-dimensional regression models, of the kind that arise particularly in genomics. This avoids unnatural assumptions that have been necessary for alternative approaches and respects established principles of inference from low-dimensional settings. The work is embedded within, and provides refinement to, the broader inferential framework of confidence sets of models, advocated in earlier work (detailed under EP/P002757/1). I have proposed confidence "distributions" of models with theoretical guarantees. I have elucidated theoretical properties of confidence sets of models in hypothetical repeated use, as well as a variable reduction procedure (now called Cox Reduction) based on partially balanced incomplete block designs, first proposed by Cox and Battey (2017). I have proposed an approach for assessing the effect of missing covariate data in a regression setting, as an alternative to imputation procedures in common use. The latter approach produces a single answer without warning, regardless of how sensitive the conclusions may be to aspects that were not observed. I have detailed a systematic approach to finding transformations of the data, when they exist, that make the associated likelihood function factorise in such a way as to eliminate nuisance parameters. An open challenge is to generalise the approach to situations when the ideal inferential separations cannot be achieved exactly. This would make the ideas more generally applicable. I have unified several of the ideas above, and ones covered in earlier work (see EP/P002757/1) under the notion of inducement of population-level sparsity, with a view to achieving reconciliation between low-dimensional Fisherian foundations and modern high-dimensional problems. 
I have developed theory and methodology for inference on the intensity function of a spatial point process on a Riemannian manifold, a problem of direct relevance in many scientific applications, particularly cell biology. I have elucidated inferential anomalies arising in the analysis of processes with more than one source of variability, a situation that arises commonly in experimental contexts involving elaborate forms of blocking, or in models for longitudinal or clustered data. I have proposed a procedure for inference on the coefficients of high-dimensional logistic regression models when the data are linearly separable. The theoretical discussion associated with this work provides considerable insight into the role of sparsity and the merits and limitations of conditional inference in the presence of linearly separable data. I have provided insight into the structure of models and their parametrisations under which standard approaches produce reliable inference for an interest parameter (e.g. one quantifying a treatment effect) in spite of arbitrary misspecification of the nuisance component of the model. I have provided new perspectives on the foundations of statistical inference, which may serve to provide conceptual understanding. I have provided research supervision to three PhD students and two postdoctoral researchers who are working on projects related to EP/T01864X/1. |
| Exploitation Route | Software accompanies all methodological research associated with the award, and can be downloaded from the journal website or other publicly accessible webpages for general use. The work has also identified numerous avenues for future work, some of which are presented as lists of open problems at the end of the relevant papers. |
| Sectors | Chemicals; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
| Description | The techniques developed in a paper with E. A. K. Cohen and S. Ward have been used by biochemists and cell biologists to understand how the outer membrane in bacteria assembles. This is an important step towards understanding antibiotic resistance. See: S. Kumar, P. Inns, S. Ward, V. Lagage, J. Wang, R. Kaminska, S. Uphoff, E. A. K. Cohen, G. Mamou and C. Kleanthous. Immobile lipopolysaccharides and outer membrane proteins differentially segregate in growing Escherichia coli. Proceedings of the National Academy of Sciences, In Press, 2025. |
| First Year Of Impact | 2025 |
| Sector | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
| Impact Types | Societal; Economic |
| Description | Delivery of lectures to PhD students across several London universities via the London Taught Course Centre (2020, 2022, 2023) |
| Geographic Reach | Local/Municipal/Regional |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | The content of these lectures is not readily available from any single source. I am not aware of similar courses taught elsewhere in the UK. Cumulatively over the years in which they ran, approximately 80 students have attended the lectures and submitted coursework demonstrating their understanding of the new material. |
| Description | Wrote and delivered 30 hours of undergraduate/MSc lectures at Imperial College London |
| Geographic Reach | Multiple continents/international |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | Students who have followed this material have received unique and extensive training, enabling them to directly contribute to scientific and societal understanding in their future careers. Because statistics is mathematical in basis, improved training of students in this field has the potential for impact across diverse fields. The student cohort is international, with a large proportion of British students. |
| Title | An approach to estimation of the intensity function of a spatial point process on a Riemannian manifold. |
| Description | The paper: Ward, S., Battey, H. S. and Cohen, E.A.K. (2023). Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold. Biometrika, 110, 1009-1021, and the associated software implementation, provide a means of estimating a key parameter of a spatial point process constrained to the surface of a manifold. Such processes arise particularly in cellular biology and microbiology, where super-resolution microscopy techniques can record the spatial arrangement of proteins and other molecules of interest on the membranes of cells, bacteria and other microorganisms. |
| Type Of Material | Data analysis technique |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | In the microbiological examples that motivated the work, knowledge of the intensity functions of, say, two different molecular processes, can aid scientific understanding by suggesting possible dependencies between the processes, perhaps to be probed more formally. Alternatively the estimates might be used as outcomes, blocking factors or concomitant variables in an experimental setting concerned with assessing the efficacy of one or more treatments. Dr. Edward Cohen has presented the work at multidisciplinary scientific venues, and our impression is that the work is being used to aid scientific discovery in cell biology and 3D bio-imaging applications. |
| Title | Calibrated confidence intervals for coefficient parameters and refined confidence sets of models in high-dimensional regression |
| Description | The paper: Battey, H. S. and Reid, N. (2023). On inference in high-dimensional regression. J. R. Statist. Soc. Ser. B, 85, 149-175, along with the associated Matlab implementation, provides a way of constructing calibrated confidence intervals for regression parameters in high-dimensional linear models. These intervals were used in the same paper to refine the confidence sets of models proposed in earlier work (Cox and Battey, 2017; Battey and Cox, 2018). |
| Type Of Material | Data analysis technique |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | Too early to say. |
| Title | Fractional factorial assessment of the effects of missing covariate data in regression |
| Description | The paper: Battey, H. S. and Cox, D. R. (2023). Missing observations in regression: a conditional approach. Royal Society Open Science, 10, article number 220267, and the associated MATLAB implementation, provide a means of assessing the effect of missing covariate observations on each of the regression coefficients. It is an alternative to multiple imputation, a widely used procedure for dealing with missing data. |
| Type Of Material | Data analysis technique |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | Too early to say. |
| URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.2rbnzs7rw |
| Description | Collaboration with Prof. Karthik Bharath (University of Nottingham) on sparsity-inducing reparametrisations |
| Organisation | University of Nottingham |
| Department | School of Mathematics Nottingham |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Both parties provided expertise relevant for deducing parametrisations under which the relevant models are sparse. One of my PhD students worked on the project. |
| Collaborator Contribution | Karthik Bharath provided expertise in differential geometry relevant to the problem. |
| Impact | A preprint, pending peer review, can be found on arXiv: . |
| Start Year | 2022 |
| Description | Collaboration with Prof. Nancy Reid (University of Toronto) on the structure of inference in misspecified models |
| Organisation | University of Toronto |
| Country | Canada |
| Sector | Academic/University |
| PI Contribution | Both partners provided expertise relevant to elucidating the structure of models that allows consistent estimation of interest parameters when the nuisance component is misspecified. |
| Collaborator Contribution | Both partners provided expertise relevant to elucidating the structure of models that allows consistent estimation of interest parameters when the nuisance component is misspecified. |
| Impact | A preprint, pending review at the time of writing (March 2024), can be found on arXiv . |
| Start Year | 2023 |
| Description | Collaboration with Prof. Peter McCullagh (University of Chicago) on inferential anomalies associated with the use of the Wald statistic |
| Organisation | University of Chicago |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Both parties provided expertise relevant for understanding the inferential anomalies associated with the use of the Wald statistic in processes involving more than one source of variability. |
| Collaborator Contribution | Both parties provided expertise relevant for understanding the inferential anomalies associated with the use of the Wald statistic in processes involving more than one source of variability. |
| Impact | A paper has been accepted for publication in Biometrika. |
| Start Year | 2022 |
| Description | Collaboration with Prof. Sir David Cox (University of Oxford) on the foundations of inference in the presence of nuisance parameters |
| Organisation | University of Oxford |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Both parties provided expertise on the theoretical foundations of inference. |
| Collaborator Contribution | Both parties provided expertise on the theoretical foundations of inference. |
| Impact | Two publications that emerged from this aspect of the collaboration are Battey, H. S., Cox, D. R. and Lee, S. (2024) On partial likelihood and the construction of factorisable transformations. Information Geometry, 7, 9-28. Battey, H. S. and Cox, D. R. (2022) Some perspectives on inference in high dimensions. Statistical Science, 37, 110-122. A tangentially related paper is Battey, H. S. and Cox, D. R. (2023) Missing observations in regression: a conditional approach. Royal Society Open Science, 10, article number 220267. |
| Start Year | 2020 |
| Description | Collaboration with Sciteb Ltd on a mathematical characterisation of the dynamics and instabilities of markets (2020-2024) |
| Organisation | Sciteb |
| Country | United Kingdom |
| Sector | Private |
| PI Contribution | I contributed an appendix on statistical properties of an estimator needed to make the research operational. |
| Collaborator Contribution | Nicholas Beale of Sciteb Ltd. proposed the problem, motivated by real-world experience. Mr. Kutlwano Bashe and Prof. Robert Mackay of the University of Warwick provided an analysis of the dynamics. |
| Impact | This collaboration was multi-disciplinary across statistics, dynamical systems and economics. A research paper was published in the Journal of Dynamics and Games. |
| Start Year | 2020 |
| Description | Mathematics Department representative for the Centre for High-Throughput Digital Electronics and Machine Learning |
| Organisation | Imperial College London |
| Department | Department of Physics |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Expertise; intellectual input. |
| Collaborator Contribution | Expertise; intellectual input; access to data, equipment and facilities. |
| Impact | Multidisciplinary (mathematics, statistics, physics, chemistry, medicine) |
| Start Year | 2021 |
| Description | Member of the Research Board of DigiFab: Institute of Digital Molecular Design and Fabrication |
| Organisation | Imperial College London |
| Department | Department of Chemistry |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Expertise; intellectual input; research supervision. |
| Collaborator Contribution | Expertise; intellectual input; access to data, equipment and facilities. |
| Impact | Multidisciplinary (mathematics, statistics, chemistry). No tangible outputs yet. |
| Start Year | 2021 |
| Title | Fractional factorial assessment of the effects of missingness in regression |
| Description | The Dryad data and source code repository contains the data (in .csv and .mat format) and source code (.m file compatible with MATLAB) to reproduce the analysis of the bone marrow data in Section 4 of Battey and Cox (2023). The source code replaces missing entries by combinations of high and low values according to a fractional factorial structure for the estimation of the main effects of missingness, and fits a logistic regression model using each of the resulting sets of factorially completed covariate data. Estimated regression coefficients and their standard errors are stored, and the effects of missingness are presented as in Tables 2 and 3 of the paper. |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Open Source License? | Yes |
| Impact | Too early to say |
| URL | https://royalsocietypublishing.org/doi/full/10.1098/rsos.220267 |
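The factorial completion described in the software entry above can be sketched in a few lines of Python. This is a hypothetical illustration, not the MATLAB code in the Dryad repository: it assumes three covariates, a half fraction of the 2^3 design with generator C = AB, and user-chosen low and high fill-in values, and it replaces the logistic regression fit by a stand-in summary (the mean row total) so that the example needs no external libraries. All function names are invented for the example.

```python
from itertools import product


def half_fraction():
    """2^(3-1) design with generator C = A*B: four runs, levels coded -1/+1."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


def complete(rows, run, low, high):
    """Replace each None in column j by high[j] if run[j] > 0, else low[j]."""
    return [
        [(high[j] if run[j] > 0 else low[j]) if x is None else x
         for j, x in enumerate(row)]
        for row in rows
    ]


def main_effects(design, responses):
    """Mean response at the high level minus mean response at the low level."""
    k = len(design[0])
    half = len(design) / 2
    return [
        sum(r * run[j] for run, r in zip(design, responses)) / half
        for j in range(k)
    ]


def mean_row_total(rows):
    """Stand-in for a model fit: any scalar summary of a completed data set."""
    return sum(sum(row) for row in rows) / len(rows)


if __name__ == "__main__":
    # Hypothetical covariate data; None marks a missing entry.
    rows = [[1.0, None, 0.5], [None, 2.0, None], [0.0, 1.0, 1.5]]
    low, high = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0]
    design = half_fraction()
    responses = [mean_row_total(complete(rows, run, low, high)) for run in design]
    print(main_effects(design, responses))
```

In an analysis of the kind described, the stand-in summary would be replaced by an estimated regression coefficient from a model fitted to each completed data set, giving one main effect of missingness per covariate and coefficient.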
| Title | Source code to construct confidence sets of models in high-dimensional regression |
| Description | The software (produced by R. M. Lewis, a doctoral student at Imperial College London) extends original code written by Heather Battey to produce confidence sets of models in sparse high-dimensional regression settings. It implements extensions of Cox and Battey (2017) and Battey and Cox (2018) discussed in Lewis and Battey (2023). |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Open Source License? | Yes |
| Impact | Too early to say. |
| Description | "Inspirational Lecture" for undergraduate students at Imperial College London |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Undergraduate students |
| Results and Impact | The aim was to introduce undergraduate students to a rewarding and intellectually challenging area of research and to encourage further study. |
| Year(s) Of Engagement Activity | 2020 |
| Description | Appointed to the Research Board of the Institute of Digital Molecular Design and Fabrication, Department of Chemistry, Imperial College London |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Professional Practitioners |
| Results and Impact | My involvement has led to discussions and plans for joint research supervision on the use of experimental design and statistical analyses in the development and discovery of medicines, agrochemicals and polymers. |
| Year(s) Of Engagement Activity | 2021, 2022 |
| URL | https://www.imperial.ac.uk/digital-molecular-design-and-fabrication/ |
| Description | Lecture for the Piscopia Initiative |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Undergraduate students |
| Results and Impact | From the Piscopia Initiative website: "Despite the fact that 40% of UK graduates in the mathematical sciences are female, only 6% of them go on to be professors [LMS report, 2013]. In October 2019, we (PhD students at the University of Edinburgh) founded the Piscopia Initiative to tackle the participation crisis of women and non-binary people in mathematics research in the UK. We aim to encourage women and non-binary students to pursue a PhD in mathematics. We offer both UK-wide and university-specific events at 13 UK universities through our local Piscopia committees. These are all aimed at undergraduate/MSc students in mathematics and related disciplines". |
| Year(s) Of Engagement Activity | 2021 |
| URL | https://piscopia.co.uk/past-events/ |
| Description | Lecture to the Warwick Mathematics Society |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Undergraduate students |
| Results and Impact | The aim was to introduce undergraduate students to a rewarding and intellectually challenging area of research and to encourage further study. |
| Year(s) Of Engagement Activity | 2021 |
| Description | School visit (Finchley, London) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Schools |
| Results and Impact | The talk was for prospective sixth-form pupils of Woodhouse College. The sixth form specialises in mathematics and sciences and the visit was specifically targeted at female students. My talk highlighted the diverse areas to which statistical ideas apply, and the role of mathematics in statistical training. |
| Year(s) Of Engagement Activity | 2023 |