Spatiotemporal statistical machine learning (ST-SML): theory, methods, and applications
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
Machine learning (ML) is the computational beating heart of the modern Artificial Intelligence (AI) renaissance. A number of fields, from computer vision to speech recognition, have been completely transformed by the successes of machine learning. But practitioners and policymakers struggle when it comes to translating the successes of ML from narrowly defined prediction problems (e.g. "is this a picture of a cat?") to the broader and messier world of public health and public policy. This fellowship will fund research on new ML methods to enable us to better ask and answer questions concerning change over space and time, such as:
1) How does disease risk, poverty, or housing quality vary within a country and over time?
2) Can satellite data enable us to answer policy questions in a more timely and spatially localised manner?
3) Do the dynamics of violent crime differ in different cities?
4) Did the world achieve the Millennium Development Goals? Will the world achieve the Sustainable Development Goals?
Bespoke answers to these questions are not enough, because practitioners in the public sector face new challenges in real time. They need reproducible, well-documented applied workflows that they can follow to tackle important public policy problems as they arise.
Planned Impact
Who might benefit from this research and how might they benefit?
The partners who have supported this fellowship, NASA, the World Food Programme, and the UNAIDS Reference Group, will all directly benefit from the proposed development of statistical machine learning methods for spatiotemporal data. These methods will be developed to directly tackle challenges faced by these organisations in understanding and improving the health and well-being of humans.
* The World Food Programme assists 86.7 million people in around 83 countries each year, and this fellowship will develop survey design, analysis methods, and data science workflows to better target food aid, improve food security in these countries, and respond quickly to unfolding humanitarian emergencies.
* NASA is using satellite data to map air pollution in low-income countries, and this fellowship will focus on new methods to make timely and fine-grained estimates, and on theory to support the rigorous evaluation of these methods.
* The UNAIDS Reference Group on Estimates, Modelling, and Projections is responsible for advising UNAIDS and country governments on spatiotemporal statistics and forecasts related to the estimated 37 million people worldwide living with HIV. The fellowship will develop new methods for analysis and data collection to enable the UNAIDS Reference Group to provide estimates at the right spatial and temporal scale to be policy-relevant.
Through the support of the Stan Development Team, methods and applied workflows will be disseminated as widely as possible. The Stan software, which we will extend to handle larger and more flexible spatiotemporal statistical models, is downloaded almost a million times per year. This means that there is a very large audience with the expertise to use and adopt the methods we are developing in a range of application areas, from public health and public policy to natural science, healthcare, and business analytics.
People |
ORCID iD |
Seth Flaxman (Principal Investigator / Fellow) |
Publications
Ball J
(2022)
Using deep convolutional neural networks to forecast spatial patterns of Amazonian deforestation
in Methods in Ecology and Evolution
Bennett JE
(2023)
Changes in life expectancy and house prices in London from 2002 to 2019: hyper-resolution spatiotemporal analysis of death registration and real estate data.
in The Lancet regional health. Europe
Boland MA
(2021)
Improving axial resolution in Structured Illumination Microscopy using deep learning.
in Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
Bradley VC
(2021)
Unrepresentative big surveys significantly overestimated US vaccine uptake.
in Nature
Brizzi A
(2022)
Spatial and temporal fluctuations in COVID-19 fatality rates in Brazilian hospitals.
in Nature medicine
Charles G
(2023)
Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference
in Proceedings of the AAAI Conference on Artificial Intelligence
Cluver L
(2023)
Reauthorise PEPFAR to prevent death, orphanhood, and suffering for millions of children
in The Lancet
Dhar MS
(2021)
Genomic characterization and epidemiology of an emerging SARS-CoV-2 variant in Delhi, India.
in Science (New York, N.Y.)
Faria N
(2021)
Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil
in Science
Flaxman S
(2023)
List child dependents on death certificates.
in Science (New York, N.Y.)
Flaxman S
(2023)
Assessment of COVID-19 as the Underlying Cause of Death Among Children and Young People Aged 0 to 19 Years in the US.
in JAMA network open
Gurdasani D
(2021)
Vaccinating adolescents against SARS-CoV-2 in England: a risk-benefit analysis.
in Journal of the Royal Society of Medicine
Hawryluk I
(2023)
Application of referenced thermodynamic integration to Bayesian model selection.
in PloS one
Hillis SD
(2021)
COVID-19-Associated Orphanhood and Caregiver Death in the United States.
in Pediatrics
Hillis SD
(2021)
Global minimum estimates of children affected by COVID-19-associated orphanhood and deaths of caregivers: a modelling study.
in Lancet (London, England)
Holbrook AJ
(2021)
Scalable Bayesian inference for self-excitatory stochastic processes applied to big American gunfire data.
in Statistics and computing
Howes A
(2023)
Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa
in PLOS Global Public Health
Krawczyk K
(2021)
Quantifying Online News Media Coverage of the COVID-19 Pandemic: Text Mining Study and Resource.
in Journal of medical Internet research
Krawczyk K
(2021)
Correction: Quantifying Online News Media Coverage of the COVID-19 Pandemic: Text Mining Study and Resource
in Journal of Medical Internet Research
Lamprinakou S
(2023)
BART-based inference for Poisson processes
in Computational Statistics & Data Analysis
Lightley J
(2022)
Robust deep learning optical autofocus system applied to automated multiwell plate single molecule localization microscopy.
in Journal of microscopy
Meyerowitz-Katz G
(2021)
Is the cure really worse than the disease? The health impacts of lockdowns during COVID-19
in BMJ Global Health
Mishra S
(2021)
Changing composition of SARS-CoV-2 lineages and rise of Delta variant in England.
in EClinicalMedicine
Mishra S
(2021)
Comparing the responses of the UK, Sweden and Denmark to COVID-19 using counterfactual modelling
in Scientific Reports
Mishra S
(2022)
πVAE: a stochastic process prior for Bayesian deep learning with MCMC
in Statistics and Computing
Mlcochova P
(2021)
SARS-CoV-2 B.1.617.2 Delta variant replication and immune evasion.
in Nature
Mohler G
(2021)
A modified two-process Knox test for investigating the relationship between law enforcement opioid seizures and overdoses
in Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Monod M
(2021)
Age groups that sustain resurging COVID-19 epidemics in the United States.
in Science (New York, N.Y.)
Monod M
(2023)
Regularised B-splines Projected Gaussian Process Priors to Estimate Time-trends in Age-specific COVID-19 Deaths
in Bayesian Analysis
Nyberg T
(2022)
Comparative analysis of the risks of hospitalisation and death associated with SARS-CoV-2 omicron (B.1.1.529) and delta (B.1.617.2) variants in England: a cohort study.
in Lancet (London, England)
Rashid T
(2021)
Life expectancy and risk of death in 6791 communities in England from 2002 to 2019: high-resolution spatiotemporal analysis of civil registration data.
in The Lancet. Public health
Scott L
(2021)
Track Omicron's spread with molecular data
in Science
Semenova E
(2022)
PriorVAE: encoding spatial priors with variational autoencoders for small-area estimation.
in Journal of the Royal Society, Interface
Sharma M
(2021)
Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe.
in Nature communications
Smith TP
(2021)
Temperature and population density influence SARS-CoV-2 transmission in the absence of nonpharmaceutical interventions.
in Proceedings of the National Academy of Sciences of the United States of America
Suel E
(2021)
Multimodal deep learning from satellite and street-level imagery for measuring income, overcrowding, and environmental deprivation in urban areas.
in Remote sensing of environment
Unwin HJT
(2021)
Using Hawkes Processes to model imported and local malaria cases in near-elimination settings.
in PLoS computational biology
Unwin HJT
(2022)
Global, regional, and national minimum estimates of children affected by COVID-19-associated orphanhood and caregiver death, by age and family circumstance up to Oct 31, 2021: an updated modelling study.
in The Lancet. Child & adolescent health
Vollmer MAC
(2021)
The impact of the COVID-19 pandemic on patterns of attendance at emergency departments in two large London hospitals: an observational study.
in BMC health services research
Vollmer MAC
(2021)
A unified machine learning approach to time series forecasting applied to demand at emergency departments.
in BMC emergency medicine
Volz E
(2021)
Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England.
in Nature
Wolock TM
(2021)
Evaluating distributional regression strategies for modelling self-reported sexual age-mixing.
in eLife
Description | Computational statistical machine learning methods can have a large impact on public policy. In the context of the Covid-19 pandemic, the applied work focused on epidemiological, statistical, and demographic modelling to understand the characteristics and spread of novel variants, to assess the impact of non-pharmaceutical interventions, and to quantify pandemic-associated orphanhood. Underlying the applied studies, which had major policy impacts, were novel computational tools that we developed to enable flexible and scalable spatiotemporal statistical modelling. These tools combined Bayesian inference, a statistical approach to quantifying uncertainty, with the most exciting area of Artificial Intelligence (which underlies ChatGPT and DALL-E): deep generative modelling. |
Exploitation Route | The methods that we developed form the basis for further funding proposals to charities and research councils. The applied work underpins very large-scale funding requests to support orphans and vulnerable children in low- and middle-income countries, which are currently being considered by international partners and country governments. |
Sectors | Healthcare; Government, Democracy and Justice |
URL | https://www.cdc.gov/globalhealth/covid-19/orphanhood/index.html |
Description | Our Lancet publication, "Global minimum estimates of children affected by COVID-19-associated orphanhood and deaths of caregivers: a modelling study" (Hillis et al, 2021), for which I was the senior author, is a landmark paper in guiding the public policy response towards children during the Covid-19 pandemic and the recovery from it. Hundreds of pieces covering our work appeared in the media. It was discussed within the Biden administration in Washington, at the Vatican, and by the World Health Organization and the World Bank. When the paper was published, we also produced a policy report, "Children: The Hidden Pandemic 2021", with the US Centers for Disease Control and Prevention, which was widely disseminated to country governments, charities, and international organizations. We also produced a web tool, the Imperial College Orphanhood Calculator (https://imperialcollegelondon.github.io/orphanhood_calculator), to provide up-to-date estimates. Subsequent studies building on this foundational work have appeared in Lancet Child & Adolescent Health, Pediatrics, and JAMA Pediatrics. |
First Year Of Impact | 2021 |
Sector | Healthcare; Government, Democracy and Justice |
Impact Types | Societal; Policy & public services |
Description | Citation in FDA presentation |
Geographic Reach | North America |
Policy Influence Type | Citation in other policy documents |
Impact | VRBPAC voted to approve new childhood vaccines for COVID-19 in 2021. Our work was cited in the presentation on epidemiology. |
URL | https://www.fda.gov/media/159222/download |
Description | The Global Reference Group on Children Affected by COVID-19 |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | The Global Reference Group on Children Affected by COVID-19 advises governments and charities on providing care and support for Covid-19 orphans and vulnerable children and their surviving family members. |
URL | https://www.spi.ox.ac.uk/the-global-reference-group-on-children-affected-by-covid-19 |
Description | Copenhagen/Oxford |
Organisation | University of Copenhagen |
Department | Department of Public Health |
Country | Denmark |
Sector | Academic/University |
PI Contribution | I collaborate closely with two members of this department on a number of ongoing and completed research projects. |
Collaborator Contribution | Two members of this department collaborate closely with me and my two postdocs on ongoing and completed research projects. |
Impact | Major publications: 1) Volz et al, "Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England," Nature 2021; 2) Faria et al, "Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil," Science 2021; 3) Mishra et al, "Changing composition of SARS-CoV-2 lineages and rise of Delta variant in England," EClinicalMedicine 2021. Multi-disciplinary: epidemiology, biostatistics, and computational statistics / machine learning. |
Start Year | 2021 |
Description | US CDC/Oxford |
Organisation | Centers for Disease Control and Prevention (CDC) |
Country | United States |
Sector | Public |
PI Contribution | We have worked closely with the US Centers for Disease Control and Prevention on estimating the global burden of pandemic-associated orphanhood. |
Collaborator Contribution | CDC colleagues have been an integral part of our research collaboration and have provided data for our US analyses. |
Impact | See Hillis et al, Lancet 2021 and Hillis et al, JAMA Pediatrics 2022 |
Start Year | 2020 |
Description | "Covid-19 is leaving millions of orphaned children behind" |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | I was a guest on the STAT News podcast _First Opinion_: "Covid-19 is leaving millions of orphaned children behind". |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.statnews.com/2022/06/01/millions-orphans-covid-is-leaving-behind/ |