Principled inference for functionals of large structured covariance matrices.

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Statistics plays a fundamental role in daily life, allowing costly medical screening, drug development, marketing campaigns or government regulation to be better targeted through improved understanding of the scientific or societal truths underpinning the data we observe. More and more frequently, the scientific truths we wish to learn correspond to a high dimensional parameter. This project considers covariance matrices and related quantities such as inverse covariance matrices, which are particularly important types of high dimensional parameter, arising in numerous statistical applications. When the dimensionality of the covariance matrix is larger than the number of available data points, structure (sparsity in some domain) must be assumed in order to obtain estimates that are well behaved statistically. This project explores new types of structure for covariance and inverse covariance matrix estimation. Some of these structures facilitate uncertainty statements about the true high dimensional parameter rather than simply providing a point estimate. They also allow different estimates to be aggregated without losing statistical accuracy.
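
The need for structure when the dimension exceeds the sample size can be seen in a small simulation. The sketch below is illustrative only: the Gaussian data, the function names and the use of hard thresholding (a standard sparsity-based estimator, not one of the new structures explored in this project) are all assumptions. With n observations in p > n dimensions, the rank of the sample covariance matrix cannot exceed n - 1, so the matrix is singular and has no inverse.

```python
import numpy as np


def sample_covariance(n, p, seed=0):
    """Sample covariance of n independent N(0, I_p) draws (illustrative data)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    return np.cov(X, rowvar=False)


def hard_threshold(S, tau):
    """Set to zero the off-diagonal entries of S with absolute value below tau."""
    T = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))  # variances on the diagonal are always kept
    return T


if __name__ == "__main__":
    n, p = 20, 50  # fewer observations than dimensions
    S = sample_covariance(n, p)
    # The rank is at most n - 1, so S is singular whenever p > n - 1.
    print("rank of sample covariance:", np.linalg.matrix_rank(S), "of", p)
    # tau is deliberately very large here (an assumption for illustration),
    # so every off-diagonal entry is removed and only the variances remain.
    T = hard_threshold(S, tau=10.0)
    print("rank after thresholding:", np.linalg.matrix_rank(T), "of", p)
```

In practice the threshold would be chosen from the data, and a thresholded matrix is not guaranteed to be positive definite; the example is meant only to show why an unstructured estimate is unusable when p exceeds n.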

Planned Impact

The research of this proposal aims to institute a fundamental shift in the cognitive framework underpinning the area of large covariance estimation, which until now has considered only point estimators and not set estimators. Large covariance and inverse covariance estimation is an indispensable ingredient of almost every modern statistical procedure, yet the current methodological toolkit is insufficiently rich to respond to the demands of practitioners (see Section 2 of the Case for Support document). The research also contributes to the area of distributed inference for large-scale data. Although widely employed by practitioners, distributed inference is poorly understood theoretically. The research thus contributes to the advancement of statistics as a discipline.

On a national level, improved understanding of the scientific or societal truths underpinning the data we observe yields significant long-term economic benefits. For instance, it allows costly medical screening, drug development, marketing campaigns or government regulation to be better targeted.

A direct beneficiary of the research is the Imperial College funded PhD student, who will have the opportunity to work on internationally researched projects at the heart of modern statistics. The organisation that later employs them and the individuals who later work with them will benefit from their expertise in this important area. More generally, the Statistics Section at Imperial College will benefit from exposure to seminar and conference speakers who would not otherwise have attended (see Justification of Resources document).

Publications

Battey HS (2018) Large numbers of explanatory variables: a probabilistic assessment. in Proceedings. Mathematical, physical, and engineering sciences

Battey HS (2019) On the linear in probability model for binary data. in Royal Society open science

Beale N (2020) An unethical optimization principle. in Royal Society open science

Cox DR (2017) Large numbers of explanatory variables, a semi-descriptive analysis. in Proceedings of the National Academy of Sciences of the United States of America

 
Description I have proposed and provided theoretical justification for confidence sets of models (as distinct from confidence regions for the parameters of a given model) in sparse high-dimensional regression problems. This is particularly relevant for the statistical analysis of genomics data. The R package implementing these ideas was downloaded nearly 15000 times between July 2018 and February 2021.

I have sought relevant embeddings for covariance matrices where the interesting models are sparse and have explored statistical aspects associated with exploiting sparsity in these new domains. This work has opened up new research directions.

Motivated by sociological applications in which linear representations of probabilities are used to model binary responses, I elucidated some of the properties of this model, together with approaches to performing inference on the associated regression coefficients. This work also proposed an approach for assessing sensitivity to missing observations.
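
To make the model class concrete: a linear in probability model with a single explanatory variable takes P(Y = 1 | x) = a + b x. The sketch below fits such a model to simulated binary data by ordinary least squares. It illustrates the model only, not the inferential or sensitivity methods developed in the paper; the simulated data, the parameter values and the function names are assumptions made for the example.

```python
import numpy as np


def simulate_binary(n, a, b, seed=1):
    """Simulate x ~ Uniform(0, 1) and binary y with P(y = 1 | x) = a + b * x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    prob = a + b * x  # must lie in [0, 1] for every x in [0, 1]
    y = rng.binomial(1, prob).astype(float)
    return x, y


def fit_linear_probability(x, y):
    """Least-squares estimates (intercept, slope) for the model E[y | x] = a + b * x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    x, y = simulate_binary(n=20000, a=0.3, b=0.4)
    a_hat, b_hat = fit_linear_probability(x, y)
    print("estimated intercept and slope:", a_hat, b_hat)
    # Fitted values are not constrained to [0, 1]; extrapolating the fitted
    # line far enough beyond the observed x gives an invalid probability.
    print("fitted value at x = 5:", a_hat + b_hat * 5.0)
```

The slope has a direct interpretation as a change in probability per unit change in x, which is one reason such models are used in applied work; the unconstrained fitted values are the usual caveat.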

I have provided insights and approaches for reliable inference on interest parameters, such as treatment or exposure effects, when a large number of nuisance parameters are present in a probabilistic model.
Exploitation Route Sparse regression models are widely used to analyse genomics data. The previous state-of-the-art methods of analysis, while effective for prediction, are usually misleading if understanding is the goal. Confidence sets of models (Cox and Battey, 2017; Battey and Cox, 2018) specify the reasonable scientific explanations that are compatible with the data and are thus more suitable when understanding is sought. The software and its detailed usage guide facilitate their use.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

 
Description Work from two published papers produced under this award (Cox and Battey, 2017; Battey and Cox, 2018) has been used by practitioners, including at the National Institutes of Health, Radiation Epidemiology Branch, and at the Nuffield Department of Population Health, University of Oxford. The associated R package 'HCmodelSets', a translation of the original source code, was downloaded 24350 times between July 2018 and March 2023. The number of downloads far exceeds the number of citations in academic journals, perhaps indicative of impact beyond academia.

At the time of writing (March 2023), citations to the two papers appear in several journals outside the statistics and applied probability domain in which the grant was awarded, namely:
- International Journal of Epidemiology
- The American Journal of Clinical Nutrition
- Behaviormetrika
- Gene

Citations to another published paper under this award (Battey, Cox and Jackson, 2019) also appear in journals from outside my field. Some of the citing articles relate to the Covid-19 pandemic. At the time of writing (March 2023) the journal names are:
- Applied Research in Quality of Life
- Arts Education Policy Review
- Environmental Health
- Frontiers in Psychiatry
- Global Public Health
- International Journal of Environmental Research and Public Health
- Journal of Affective Disorders Reports
- Journal of Economic Behaviour and Organization
- Journal of Environmental Research and Public Health
- Journal of Family and Economic Issues
- Journal of Occupational Health
- Journal of Transport Geography
- Preventative Medicine
- Psychological Methods
- Reproductive Medicine and Biology
- Society of Labor Economists -- Annual Conference
- Social Science Research
- Sociology

A third paper from the award (Beale, Battey, Davison, MacKay, 2020) is acquiring citations in the areas of artificial intelligence and ethics.
The work was covered in several scientific media outlets:
- Eurekalert.org
- Alpha Galileo
- Sciencenewsnet.in
- Phys.org
- IFL science
- News wise
- True Viral News
- Electronics weekly
- NCYT
- SYFY
- MedIndia Network for Health
- Analytics India Magazine
- Biometric Update
- Eurasia Review
- Science Bulletin
- Good News Network
- Singularity Hub
First Year Of Impact 2018
Sector Creative Economy, Digital/Communication/Information Technologies (including Software), Environment, Healthcare, Pharmaceuticals and Medical Biotechnology, Transport, Other
Impact Types Societal

 
Description London Taught Course Centre lecture course
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Influenced training of practitioners or researchers
Impact Any tangible impacts are not known.
 
Description Theoretical foundations of inference in the presence of large numbers of nuisance parameters
Amount £792,229 (GBP)
Funding ID EP/T01864X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2020 
End 09/2025
 
Description Associate Member of Nuffield College, University of Oxford 
Organisation University of Oxford
Department Nuffield College
Country United Kingdom 
Sector Academic/University 
PI Contribution Research collaboration with Prof. Sir David Cox.
Collaborator Contribution Expertise; intellectual input.
Impact Multiple research papers detailed elsewhere in the form. Dissemination of research by both parties.
Start Year 2017
 
Description Collaboration with Sciteb Ltd on the ethical implications of artificial intelligence. 
Organisation Sciteb
Country United Kingdom 
Sector Private 
PI Contribution Expertise; intellectual input.
Collaborator Contribution Expertise; intellectual input; broad dissemination beyond academic circles.
Impact Publication in a Royal Society journal. Details elsewhere in this form.
Start Year 2019
 
Title HCmodelSets 
Description An R package for constructing well-fitting models in regression with a large number of potential explanatory variables. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The software was downloaded almost 15000 times between July 2018 and February 2021. 
URL https://cran.r-project.org/web/packages/HCmodelSets/index.html
 
Title Large numbers of explanatory variables, a semi-descriptive analysis. 
Description MATLAB code for the paper 'Large numbers of explanatory variables, a semi-descriptive analysis'. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Allows the implementation of the method described by Cox, DR and Battey, HS, Proceedings of the National Academy of Sciences, 114 (32), 8592-8595. 
 
Description A talk at the research and careers showcase for Imperial College undergraduate students 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact Approximately 40 undergraduate students attended and asked questions about pursuing a research career in mathematics.
Year(s) Of Engagement Activity 2017
 
Description Article for Imperial College Tech Digest 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact My article highlighted possible pitfalls of automated decision making, big data and machine learning.
Year(s) Of Engagement Activity 2020
 
Description Mathematics for Medicine mini symposium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact The intended purpose was to make medical scientists aware of relevant research developments in the mathematics department and vice versa, with an emphasis on open mathematical or statistical problems in medical research.
Year(s) Of Engagement Activity 2018
URL https://lms.mrc.ac.uk/mixing-maths-and-medical-science/
 
Description Presentation at the National Institutes of Health 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact Work was presented at the National Institutes of Health, Washington DC.
Year(s) Of Engagement Activity 2018
URL https://dceg.cancer.gov/news-events/events/2018/cox-seminar-statistics
 
Description Presentation to experimental physicists 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation explaining inferential strategies suitable for problems with large numbers of nuisance parameters, such as those routinely arising in physics research.
Year(s) Of Engagement Activity 2019
URL https://indico.cern.ch/event/769726/
 
Description Research supervision of a Mary Lister McCammon Scholar 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact The Mary Lister McCammon Summer Research Fellowship is a funded opportunity for a female undergraduate student to spend the summer before their final year at university working in research with a mathematician or statistician at Imperial College London. The purpose is to encourage women to do a PhD in mathematics.
Year(s) Of Engagement Activity 2019
URL https://www.imperial.ac.uk/mathematics/postgraduate/the-mary-lister-mccammon-summer-research-fellows...
 
Description Women in mathematics undergraduate research and careers event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact This was a broad research talk to female undergraduate mathematics students, followed by a question and answer session about pursuing a research career in mathematics.
Year(s) Of Engagement Activity 2019