Principled inference for functionals of large structured covariance matrices.
Lead Research Organisation:
Imperial College London
Department Name: Mathematics
Abstract
Statistics plays a fundamental role in daily life, allowing costly medical screening, drug development, marketing campaigns or government regulation to be better targeted through improved understanding of the scientific or societal truths underpinning the data we observe. More and more frequently, the scientific truths we wish to learn correspond to a high dimensional parameter. This project considers covariance matrices and related quantities such as inverse covariance matrices, which are particularly important types of high dimensional parameter, arising in numerous statistical applications. When the dimensionality of the covariance matrix is larger than the number of available data points, structure (sparsity in some domain) must be assumed in order to obtain estimates that are well behaved statistically. This project explores new types of structure for covariance and inverse covariance matrix estimation. Some of these structures facilitate uncertainty statements about the true high dimensional parameter rather than simply providing a point estimate. They also allow different estimates to be aggregated without losing statistical accuracy.
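The abstract makes two technical claims. First, when there are more variables than observations, structure has to be assumed. Second, sparsity may hold in one domain but not in another. The Python sketch below illustrates both under stated assumptions. It is a toy example, not the project's method. It assumes NumPy, and the helper `tridiagonal_precision` and the values n = 20, p = 50 and rho = 0.4 are hypothetical choices made only for illustration. The sketch shows that the sample covariance matrix is singular when p > n. It also shows that a sparse, tridiagonal inverse covariance matrix has a dense inverse.

```python
import numpy as np

def tridiagonal_precision(p, rho=0.4):
    """Hypothetical sparse precision (inverse covariance) matrix: ones on the
    diagonal and rho on the first off-diagonals; positive definite for |rho| < 0.5."""
    omega = np.eye(p)
    idx = np.arange(p - 1)
    omega[idx, idx + 1] = rho
    omega[idx + 1, idx] = rho
    return omega

rng = np.random.default_rng(0)
n, p = 20, 50  # fewer observations (n) than variables (p)

omega = tridiagonal_precision(p)
sigma = np.linalg.inv(omega)
sigma = (sigma + sigma.T) / 2  # remove floating-point asymmetry

# Draw n Gaussian observations with covariance sigma via its Cholesky factor.
L = np.linalg.cholesky(sigma)
X = rng.standard_normal((n, p)) @ L.T

# After centring, the sample covariance has rank at most n - 1 < p,
# so it is singular and cannot be inverted to estimate omega.
S = np.cov(X, rowvar=False)
print("rank of S:", np.linalg.matrix_rank(S), "of", p)

# Sparsity depends on the domain: omega has 3p - 2 nonzeros, sigma is dense.
print("nonzeros in omega:", np.count_nonzero(omega))
print("entries of sigma above 1e-8:", np.count_nonzero(np.abs(sigma) > 1e-8))
```

In this toy setting, an estimator that exploits sparsity would have to target omega rather than sigma. Identifying the transformation that exposes sparsity is the kind of question the project studies.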
Planned Impact
The research of this proposal aims to institute a fundamental shift in the cognitive framework underpinning the area of large covariance estimation, which until now has considered only point estimators and not set estimators. Large covariance and inverse covariance estimation is an indispensable ingredient of almost every modern statistical procedure, yet the current methodological toolkit is not rich enough to meet the demands of practitioners (see Section 2 of the Case for Support document). The research also contributes to the area of distributed inference for large-scale data. Although distributed inference is widely employed by practitioners, it is poorly understood theoretically. The research thus contributes to the advancement of statistics as a discipline.
On a national level, improved understanding of the scientific or societal truths underpinning the data we observe allows significant long term economic benefits. For instance, it allows costly medical screening, drug development, marketing campaigns or government regulation to be better targeted.
A direct beneficiary of the research is the Imperial College funded PhD student, who will have the opportunity to work on problems at the heart of modern statistics that are being researched internationally. The organisation that later employs them and the individuals who later work with them will benefit from their expertise in this important area. More generally, the Statistics Section at Imperial College will benefit from exposure to seminar and conference speakers who would not otherwise have attended (see Justification of Resources document).
People |
Heather Battey (Principal Investigator / Fellow) |
Publications
Battey H (2020). High dimensional nuisance parameters: an example from parametric survival analysis. In Information Geometry.
Battey H (2019). On sparsity scales and covariance matrix transformations. In Biometrika.
Battey H (2021). A note on the analytic approximation of exceedance probabilities in heterogeneous populations. In Statistics & Probability Letters.
Avella-Medina M (2018). Robust estimation of high-dimensional covariance and precision matrices. In Biometrika.
Description | I have proposed, and provided theoretical justification for, confidence sets of models (as distinct from confidence regions for the parameters of a given model) in sparse high-dimensional regression problems. This is particularly relevant for the statistical analysis of genomics data. The R package implementing these ideas was downloaded nearly 15000 times between July 2018 and February 2021. I have sought relevant embeddings for covariance matrices in which the interesting models are sparse, and have explored the statistical aspects of exploiting sparsity in these new domains. This work has opened up new research directions. Motivated by sociological applications in which linear representations of probabilities are used to model binary responses, I elucidated some of the properties of this model, together with approaches to performing inference on the associated regression coefficients. This work also proposed an approach for assessing sensitivity to missing observations. I have provided insights and approaches for reliable inference on interest parameters, such as treatment or exposure effects, when a large number of nuisance parameters are present in a probabilistic model. |
Exploitation Route | Sparse regression models are widely used to analyse genomics data. The previous state-of-the-art methods of analysis, while effective for prediction, are usually misleading if understanding is the goal. Confidence sets of models (Cox and Battey, 2017; Battey and Cox, 2018) specify the reasonable scientific explanations that are compatible with the data, and are thus more suitable when understanding is sought. The software and its detailed usage guide facilitate this. |
Sectors | Healthcare, Pharmaceuticals and Medical Biotechnology |
Description | Work from two published papers produced under this award (Cox and Battey, 2017; Battey and Cox, 2018) has been used by practitioners, including at the Radiation Epidemiology Branch of the National Institutes of Health and at the Nuffield Department of Population Health, University of Oxford. The associated R package 'HCmodelSets', a translation of the original source code, was downloaded 24350 times between July 2018 and March 2023. The number of downloads far exceeds the number of citations in academic journals, which may indicate impact beyond academia. At the time of writing (March 2023), citations to the two papers appear in several journals outside the statistics and applied probability domain in which the grant was awarded, namely:
- International Journal of Epidemiology
- The American Journal of Clinical Nutrition
- Behaviormetrika
- Gene
Citations to another paper published under this award (Battey, Cox and Jackson, 2019) also appear in journals outside my field. Some of the citing articles relate to the Covid-19 pandemic. At the time of writing (March 2023), the journal names are:
- Applied Research in Quality of Life
- Arts Education Policy Review
- Environmental Health
- Frontiers in Psychiatry
- Global Public Health
- International Journal of Environmental Research and Public Health
- Journal of Affective Disorders Reports
- Journal of Economic Behavior and Organization
- Journal of Environmental Research and Public Health
- Journal of Family and Economic Issues
- Journal of Occupational Health
- Journal of Transport Geography
- Preventive Medicine
- Psychological Methods
- Reproductive Medicine and Biology
- Society of Labor Economists -- Annual Conference
- Social Science Research
- Sociology
A third paper from the award (Beale, Battey, Davison, MacKay, 2020) is acquiring citations in the areas of artificial intelligence and ethics.
The work was covered in several scientific media outlets:
- Eurekalert.org
- Alpha Galileo
- Sciencenewsnet.in
- Phys.org
- IFL science
- News wise
- True Viral News
- Electronics weekly
- NCYT
- SYFY
- MedIndia Network for Health
- Analytics India Magazine
- Biometric Update
- Eurasia Review
- Science Bulletin
- Good News Network
- Singularity Hub |
First Year Of Impact | 2018 |
Sector | Creative Economy, Digital/Communication/Information Technologies (including Software), Environment, Healthcare, Pharmaceuticals and Medical Biotechnology, Transport, Other |
Impact Types | Societal |
Description | London Taught Course Centre lecture course |
Geographic Reach | Local/Municipal/Regional |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | No tangible impacts are known. |
Description | Theoretical foundations of inference in the presence of large numbers of nuisance parameters |
Amount | £792,229 (GBP) |
Funding ID | EP/T01864X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 10/2020 |
End | 09/2025 |
Description | Associate Member of Nuffield College, University of Oxford |
Organisation | University of Oxford |
Department | Nuffield College |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Research collaboration with Prof. Sir David Cox. |
Collaborator Contribution | Expertise; intellectual input. |
Impact | Multiple research papers detailed elsewhere in the form. Dissemination of research by both parties. |
Start Year | 2017 |
Description | Collaboration with Sciteb Ltd on the ethical implications of artificial intelligence. |
Organisation | Sciteb |
Country | United Kingdom |
Sector | Private |
PI Contribution | Expertise; intellectual input. |
Collaborator Contribution | Expertise; intellectual input; broad dissemination beyond academic circles. |
Impact | Publication in a Royal Society journal. Details elsewhere in this form. |
Start Year | 2019 |
Title | HCmodelSets |
Description | An R package for constructing well-fitting models in regression with a large number of potential explanatory variables. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | The software has been downloaded almost 15000 times between July 2018 and February 2021. |
URL | https://cran.r-project.org/web/packages/HCmodelSets/index.html |
Title | Large numbers of explanatory variables, a semi-descriptive analysis. |
Description | MATLAB code for 'Large numbers of explanatory variables, a semi-descriptive analysis'. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | Allows the implementation of the method described by Cox, DR and Battey, HS, Proceedings of the National Academy of Sciences, 114 (32), 8592-8595. |
Description | A talk at the research and careers showcase for Imperial College undergraduate students |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Undergraduate students |
Results and Impact | Approximately 40 undergraduate students attended and asked questions about pursuing a research career in mathematics. |
Year(s) Of Engagement Activity | 2017 |
Description | Article for Imperial College Tech Digest |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | My article highlighted possible pitfalls of automated decision-making, big data and machine learning. |
Year(s) Of Engagement Activity | 2020 |
Description | Mathematics for Medicine mini symposium |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | The intended purpose was to make medical scientists aware of relevant research developments in the mathematics department and vice versa, with an emphasis on open mathematical or statistical problems in medical research. |
Year(s) Of Engagement Activity | 2018 |
URL | https://lms.mrc.ac.uk/mixing-maths-and-medical-science/ |
Description | Presentation at the National Institutes of Health |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Third sector organisations |
Results and Impact | Work was presented at the National Institutes of Health, Washington DC. |
Year(s) Of Engagement Activity | 2018 |
URL | https://dceg.cancer.gov/news-events/events/2018/cox-seminar-statistics |
Description | Presentation to experimental physicists |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation explaining inferential strategies suitable for problems with large numbers of nuisance parameters, such as those routinely arising in physics research. |
Year(s) Of Engagement Activity | 2019 |
URL | https://indico.cern.ch/event/769726/ |
Description | Research supervision of a Mary Lister McCammon Scholar |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Undergraduate students |
Results and Impact | The Mary Lister McCammon Summer Research Fellowship is a funded opportunity for a female undergraduate student to spend the summer before their final year at university working in research with a mathematician or statistician at Imperial College London. The purpose is to encourage women to do a PhD in mathematics. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.imperial.ac.uk/mathematics/postgraduate/the-mary-lister-mccammon-summer-research-fellows... |
Description | Women in mathematics undergraduate research and careers event |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Undergraduate students |
Results and Impact | This was a broad research talk to female undergraduate mathematics students, followed by a question and answer session about pursuing a research career in mathematics. |
Year(s) Of Engagement Activity | 2019 |