Intractable Likelihood: New Challenges from Modern Applications (ILike)
Lead Research Organisation:
University of Warwick
Department Name: Statistics
Abstract
In most statistical contexts, inference methods based on the likelihood function are usually recognised as the methods of choice. However, such methods are not always easy to implement. For instance, in complex problems, often with massive data sets, it can be impossible even to evaluate the likelihood function. The computational statistics revolution of the 1990s provided powerful methodology for carrying out likelihood-based inference, including Markov chain Monte Carlo methods, the EM algorithm, many associated optimisation techniques for likelihoods, and Sequential Monte Carlo methods.
Although these methods have been and are highly successful in making likelihood-based inference accessible to a wide range of problems from virtually every area of science and technology, we now have a far better understanding of their limitations, for example in high-dimensional problems and for massive data sets. Thus many challenging statistical inference problems of the 21st century cannot be addressed using existing likelihood-based methods. Examples which motivate the current project come from genetics, genomics, infectious disease epidemiology, ecology, commerce, and bibliometrics.
However, there have been various recent breakthroughs in computational and statistical approaches to intractable likelihood problems, including pseudo-marginal and particle MCMC, likelihood-free methods such as Approximate Bayesian Computation, composite and pseudo-likelihoods, new simulation methods for hitherto intractable stochastic models, and adaptive Monte Carlo methods. These advances, coupled with developments in multi-core computational technologies such as GPUs, have enormous potential for extending likelihood methods to meet the most difficult challenges of modern scientific questions.
All these new areas have demonstrated considerable promise. However, a unified approach is required to deliver a step change in our capability to implement likelihood methods. The ILike workplan will initially involve the investigation of 10 projects, all combining two or more of the above breakthrough areas.
The second half of the project will involve a similar workplan informed by the outcomes of the initial projects.
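As a minimal illustration of one of the likelihood-free methods named above, the following sketch shows Approximate Bayesian Computation by rejection sampling on a toy problem. It is not taken from the ILike project: the model (a normal mean with known unit variance), the Uniform(-10, 10) prior, the sample mean as summary statistic, the tolerance, and the function name `abc_rejection` are all assumptions chosen for illustration. The likelihood is never evaluated; parameter draws are kept only when data simulated from them resemble the observed data.

```python
import random
import statistics


def abc_rejection(observed, n_draws, tolerance, seed=0):
    """ABC rejection sampling for the mean of a unit-variance normal model.

    Hypothetical toy setup: the prior is Uniform(-10, 10) and the summary
    statistic is the sample mean. The likelihood is never evaluated.
    """
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate parameter from the (assumed) prior.
        theta = rng.uniform(-10.0, 10.0)
        # 2. Simulate a data set of the same size from the model.
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]
        # 3. Keep the candidate if the simulated summary is close enough
        #    to the observed summary.
        if abs(statistics.fmean(simulated) - obs_summary) <= tolerance:
            accepted.append(theta)
    return accepted


# Synthetic "observed" data generated with true mean 2.0 (illustrative only).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(20)]

posterior_draws = abc_rejection(data, n_draws=20000, tolerance=0.1)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

The accepted draws approximate the posterior distribution of the mean. A smaller tolerance gives a better approximation at the cost of fewer accepted draws, which is the basic trade-off that more advanced ABC methods aim to improve.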
Planned Impact
The potential impact of the ILike research is substantial due to the fact that the statistical methods are, and can be, used within a range of scientific and industrial applications. We identify four specific pathways to impact:
Dissemination and Publicity
Medium and long term impact of the grant will be achieved by continued and sustained publication in top statistics and application journals, and presentation at appropriate statistics and application area conferences. Co-ordination of dissemination activities will be key throughout the project. In order to catalyse impact to the research community at large, conference attendance by either investigators or appointed RAs will carry with it a responsibility of reporting the research accomplishments of the grant as a whole, as far as this is possible within the constraints of the particular meeting.
The above will be supplemented by specific activities organised by ILike, e.g. the EPSRC Science and Innovation awards in Statistics at Warwick and Bristol (CRiSM and SuSTaIn) and their respective strategies of high-profile activities. We will organise annual open workshops on New Directions for Likelihood, showcasing our research and acquiring up-to-date information on the latest global developments.
We will also run two ILike Winter schools, aimed at early career statisticians, and especially 2nd and 3rd year PhD students. These will help make the use of methods developed in ILike more routine for the next generation of statistical researchers and end-users (including students who will pursue industrial career paths).
Impact of our research will be further amplified through an intensive four week programme around three years into the grant, to which we will invite many world experts in computational likelihood methods. We intend to apply to the Isaac Newton Institute to host this meeting.
We will also publicise our research through a dedicated web-site, making our work readily accessible to all, and giving a focus on highlighting key research ideas, and their application, in a way that is accessible to non-statistical researchers.
Supply of People
The programme grant will train at least 5 new researchers who will be expert in modern statistical methods and their application, and help address the current, widely acknowledged, skills-shortage in this area. Many more will benefit from knowledge gained from our workshops and conferences.
Software and Accessibility of Research
In order to maximise the impact of new computational statistical methods, we will make software implementing new methods freely available via the ILike website. The form this software takes will vary, as appropriate. For example, methods which allow inference for specific models or applications will be developed as R packages. For more generic algorithmic methods we will make available demo software: implementations of the algorithm for different exemplar problems. This will supplement our making details of all research freely available through publishing of pre-prints and technical reports on the website.
Interaction with end-users
Our targeted example application areas (statistics of large-scale directed networks in ecology, commerce and bibliometrics; genetics and genomics; and infectious disease epidemiology) are chosen because of our collective track record of applied work in those areas, which has made a substantial scientific impact. To catalyse dissemination to these areas, we will feed developments from our methodological work directly through to our current projects in those areas (for example through our various current grants in these areas from BBSRC, MRC, ESRC, EU FP7, NIHR, Wellcome Trust).
We will also use the ILike grant to launch major application-focussed proposals (such as in Statistical Genetics to the MRC, or in Infectious Disease Epidemiology to the ERC), thus providing secondary benefits to user communities.
Publications
Filippi S (2017) A Bayesian Nonparametric Approach to Testing for Dependence Between Random Variables, in Bayesian Analysis
Ogden H (2016) A caveat on the robustness of composite likelihood estimators: the case of mis-specified random effect distribution, in Statistica Sinica
Bissiri PG (2016) A general framework for updating belief distributions, in Journal of the Royal Statistical Society. Series B, Statistical methodology
Andrieu C (2020) A general perspective on the Metropolis-Hastings kernel, in arXiv e-prints
Andrieu C (2015) A note on one of the Markov chain Monte Carlo novice's questions, in arXiv e-prints
Nicholson G (2017) A note on statistical repeatability and study design for high-throughput assays, in Statistics in medicine
Bierkens J (2017) A piecewise deterministic scaling limit of lifted Metropolis-Hastings in the Curie-Weiss model, in The Annals of Applied Probability
Aslett LJM (2015) A review of homomorphic encryption and software tools for encrypted statistical machine learning, in arXiv e-prints
Ogden H (2015) A sequential reduction method for inference in generalized linear mixed models, in Electronic Journal of Statistics
Description | We have developed new generic statistical methodology for intractable likelihood problems, with many applications. In particular, we have developed approximate likelihood methods, adaptive Monte Carlo methods, retrospective simulation methods, composite likelihood and ABC methods, advanced and scalable Monte Carlo methods for intractable likelihood problems, and methods for carrying out inference under encryption. The application areas we consider are diverse, though we have made particular progress in applications in genetics, genomics and infectious disease epidemiology. |
Exploitation Route | The potential application areas of our work are wide-ranging. However, it is challenging to enable scientists who are not expert in the statistical methodology we develop to use our work. As far as possible, we provide publicly available software to facilitate this. |
Sectors | Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Energy; Financial Services, and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy |
URL | http://www.i-like.org.uk |
Description | ILike was a foundational project which accomplished breakthroughs in computational statistics, especially in problems where statistical likelihoods are intractable. Advances have been made in Markov chain Monte Carlo and its variants, in Sequential Monte Carlo methods, in Approximate Bayesian Computation, and in rigorous statistical analysis under privacy constraints. The influence of ILike has been substantial in shaping the future direction of the subject and in scientific applications, particularly in epidemics, genetics and genomics. Some specific applications of this work which have emerged since the grant finished include the following examples. During ILike and beyond, we worked on modelling and inference for the severity of influenza, work which relied heavily on the computational statistics tools we were developing. This work has been used directly by Public Health England in its influenza epidemic management. Secondly, the body of work on epidemics within ILike has led to techniques which were ready to be used by the UK government in inference, prediction and control of the Covid epidemic. Our work was used directly by the SPI-M committee in informing decision making on a weekly basis for around 2 years during the epidemic. As another example, David Firth's work on tournaments and ranking has provided the state-of-the-art methodology for the statistical modelling of citation exchange between statistics journals and the subsequent use of journal rankings, and also in the development of Alt-3, a popular online alternative to football league tables. One major focus of the ILike grant was on a class of algorithms known as Retrospective Simulation. These techniques were instrumental in recent work to design mechanisms for fair and practical draws for major sporting competitions. More broadly, the breakthroughs achieved in ABC techniques are routinely used in scientific applications in many areas, including epidemiology, ecology, chemistry, etc. |
Sector | Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Other |
Impact Types | Societal; Policy & public services |
Description | Bean Review of UK Economic Statistics |
Geographic Reach | National |
Policy Influence Type | Contribution to a national consultation/review |
URL | https://www.gov.uk/government/publications/independent-review-of-uk-economic-statistics-final-report |
Description | Government Statistical Service, Methodology Advisory Committee |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Title | BradleyTerryScalable (R package) |
Description | An add-on package for R. Facilitates the use of Bradley-Terry models for ranking in large, directed networks. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | None known as yet. |
URL | https://cran.r-project.org/web/packages/BradleyTerryScalable/index.html |
Title | EncryptedStats - Statistical methods amenable to encrypted computation |
Description | This package provides implementations of existing or novel statistical machine learning techniques, written in a manner which is amenable to encrypted computation under a homomorphic encryption scheme supporting both addition and multiplication. The implementations can be run on both encrypted and unencrypted content, making use of the full operator overloading supported by the HomomorphicEncryption R package. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | The public release was within the past 6 months, so impact is only starting to feed through. A startup company, Numerai, have been in contact and are exploring its use in financial forecasting. |
URL | http://www.louisaslett.com/EncryptedStats/ |
Title | HomomorphicEncryption - An R package for fully homomorphic encryption |
Description | This package enables use of optimised implementations of homomorphic encryption schemes from the user friendly interactive high-level language R and offers completely transparent use of multi-core CPU architectures during computations. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | The public release was within the past 6 months, so impact is only starting to feed through. I have been contacted by academics using the software to explore homomorphic encryption for the first time, becoming interested in the potential for its use in statistics and machine learning. A startup company, Numerai, have also been in contact and are exploring its use in sharing sensitive data to designers of proprietary algorithms, where both parties require privacy. |
URL | http://www.louisaslett.com/HomomorphicEncryption/ |
Title | Measuring influence in scientific fields |
Description | A prototype web-based visualisation tool for methods developed to rank research fields, and journals within fields, taking account of both in-field and wider influence based upon the global citation network. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Impact | None known as yet. |
URL | http://selbydavid.com/influence/ |
Title | PlackettLuce (R package) |
Description | An add-on package for R. Facilitates the use of Plackett-Luce models for ranking of multiple objects by multiple observers. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | None known as yet. |
URL | https://cran.r-project.org/web/packages/PlackettLuce/index.html |
Title | R package RZigZag: ZigZag Sampler |
Description | Implements the Zig-Zag algorithm with subsampling and control variates (ZZ-CV) of (Bierkens, Fearnhead, Roberts, 2016) as applied to Bayesian logistic regression, as well as basic Zig-Zag for a Gaussian target distribution. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | 887 downloads from CRAN since the package was released. |
URL | https://CRAN.R-project.org/package=RZigZag |
Title | mlmc |
Description | The first publicly available implementation of a full Multi-level Monte Carlo driver in the R language and the first driver in any language with built-in multi-core parallelism. The software includes classic MLMC example level samplers in both R and C++. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | The software was made available at the end of 2016 and impacts are currently unknown. |
URL | https://github.com/louisaslett/mlmc |
Description | Alternative football league table |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | A spin-off from my methodological work on ranking: this website/blog presents a novel, mathematically principled Premier League table that is more informative than the official league table. The website has been publicised via Twitter, and attracts hundreds of visitors already on a regular basis. A primary aim of this work is to promote the value of principled mathematics, through the medium of the world's most popular sport. |
Year(s) Of Engagement Activity | 2017 |
URL | http://alt-3.uk |
Description | General Election 2017, BBC TV and radio appearances |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | In June 2017 I was approached by the BBC to make an online video and also to do a series of radio and TV interviews, about my work on election-day exit poll methodology (work that had taken place between 2004 and 2006). The video is globally visible at the URL shown. The TV interviews were nationally syndicated to BBC news programmes. This media exposure led to my being called upon by the House of Lords enquiry into electoral polling, to help frame their work during summer 2017. I regard it also as an important piece of public advocacy for the value of principled statistical methods in everyday life. |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.bbc.co.uk/news/av/uk-england-coventry-warwickshire-40175188/general-election-2017-the-sta... |
Description | Popular science alumni magazine article |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | A popular science article written for the Corpus Christi College, Oxford alumni magazine as a research showcase. There is an expression of interest from the Royal Statistical Society for this article to be expanded to a full article for the society's official magazine, Significance, which is intended as an outreach publication. |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.louisaslett.com/dl/Aslett_LJM_DoingScienceBlindfold.pdf |