Intractable Likelihood: New Challenges from Modern Applications (ILike)

Lead Research Organisation: University of Warwick
Department Name: Statistics

Abstract

In most statistical contexts, it is recognised that inference methods based on the likelihood function are usually the methods of choice. However, such methods are not always easy to implement. For instance, in complex problems, often with massive data sets, it can be impossible even to evaluate the likelihood function. The computational statistics revolution of the 1990s provided powerful methodology for carrying out likelihood-based inference, including Markov chain Monte Carlo methods, the EM algorithm, many associated optimisation techniques for likelihoods, and Sequential Monte Carlo methods.

Although these methods have been and are highly successful in making likelihood-based inference accessible to a wide range of problems from virtually every area of science and technology, we now have a far better understanding of their limitations, for example in high-dimensional problems and for massive data sets. Thus many challenging statistical inference problems of the 21st century cannot be addressed using existing likelihood-based methods. Examples which motivate the current project come from genetics, genomics, infectious disease epidemiology, ecology, commerce, and bibliometrics.

However, there have been various recent breakthroughs in computational and statistical approaches to intractable likelihood problems, including pseudo-marginal and particle MCMC, likelihood-free methods such as Approximate Bayesian Computation, composite and pseudo-likelihoods, new simulation methods for hitherto intractable stochastic models, and adaptive Monte Carlo methods. These advances, coupled with developments in multi-core computational technologies such as GPUs, have enormous potential for extending likelihood methods to meet the most difficult challenges of modern scientific questions.

All these new areas have demonstrated considerable promise. However, a unified approach is required to deliver a step change in our capability to implement likelihood-based methods. The ILike workplan will initially involve the investigation of 10 projects, each combining two or more of the above breakthrough areas. The second half of the project will involve a similar workplan informed by the outcomes of the initial projects.

Planned Impact

The potential impact of the ILike research is substantial because the statistical methods are, and can be, used within a range of scientific and industrial applications. We identify four specific pathways to impact:

Dissemination and Publicity

Medium- and long-term impact of the grant will be achieved by continued and sustained publication in top statistics and application journals, and presentation at appropriate statistics and application area conferences. Co-ordination of dissemination activities will be key throughout the project. In order to catalyse impact to the research community at large, conference attendance by either investigators or appointed RAs will carry with it a responsibility of reporting the research accomplishments of the grant as a whole, as far as this is possible within the constraints of the particular meeting.

The above will be supplemented by specific activities organised by ILike, e.g. the EPSRC Science and Innovation awards in Statistics at Warwick and Bristol (CRiSM and SuSTaIn) and their respective strategies of high-profile activities. We will organise annual open workshops on New Directions for Likelihood, showcasing our research and acquiring up-to-date information on the latest global developments.

We will also run two Winter schools in ILike, aimed at early career statisticians, and especially 2nd and 3rd year PhD students. These will help make the use of methods developed in ILike more routine for the next generation of statistical researchers and end-users (including students who will pursue industrial career paths).

Impact of our research will be further amplified through an intensive four-week programme around three years into the grant, to which we will invite many world experts in computational likelihood methods. We intend to apply to the Isaac Newton Institute to host this meeting.

We will also publicise our research through a dedicated website, making our work readily accessible to all and highlighting key research ideas and their applications in a way that is accessible to non-statistical researchers.

Supply of People

The programme grant will train at least 5 new researchers who will be expert in modern statistical methods and their application, and will help address the current, widely acknowledged skills shortage in this area. Many more will benefit from knowledge gained from our workshops and conferences.

Software and Accessibility of Research

In order to maximise the impact of new computational statistical methods, we will make software implementing new methods freely available via the ILike website. The form this software takes will vary, as appropriate. For example, methods which allow inference for specific models or applications will be developed as R packages. For more generic algorithmic methods we will make available demo software: implementations of the algorithm for different exemplar problems. This will supplement our making details of all research freely available through publishing of pre-prints and technical reports on the website.

Interaction with end-users

Our targeted example application areas (statistics of large-scale directed networks in ecology, commerce and bibliometrics; genetics and genomics; and infectious disease epidemiology) are chosen because of our collective track record of applied work in those areas, which has made a substantial scientific impact. To catalyse dissemination to these areas, we will feed developments from our methodological work directly through to our current projects in those areas (for example through our various current grants in these areas from BBSRC, MRC, ESRC, EU FP7, NIHR, Wellcome Trust).

We will also use the ILike grant to launch major application-focussed proposals (such as in Statistical Genetics to the MRC, or in Infectious Disease Epidemiology to the ERC), thus providing secondary benefits to user communities.

 
Description We have developed new generic statistical methodology for intractable likelihood problems which has many applications. In particular we have developed approximate likelihood methods, adaptive Monte Carlo methods, retrospective simulation methods, composite likelihood and ABC methods, advanced and scalable Monte Carlo methods for intractable likelihood problems, and methods for carrying out inference under encryption. The application areas we consider are diverse, though we have made particular progress in applications in genetics, genomics and infectious disease epidemiology.
Exploitation Route The potential application areas of our work are wide-ranging. However, enabling scientists who are not expert in the statistical methodology we develop to use our work is challenging. As far as possible, we provide publicly available software to facilitate this.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Energy; Financial Services, and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy

URL http://www.i-like.org.uk
 
Description Bean Review of UK Economic Statistics
Geographic Reach National 
Policy Influence Type Gave evidence to a government review
URL https://www.gov.uk/government/publications/independent-review-of-uk-economic-statistics-final-report
 
Description Government Statistical Service, Methodology Advisory Committee
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Title BradleyTerryScalable (R package) 
Description An add-on package for R. Facilitates the use of Bradley-Terry models for ranking in large, directed networks. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact None known as yet. 
URL https://cran.r-project.org/web/packages/BradleyTerryScalable/index.html
 
Title EncryptedStats - Statistical methods amenable to encrypted computation 
Description This package provides implementations of existing or novel statistical machine learning techniques, written in a manner which is amenable to encrypted computation under a homomorphic encryption scheme supporting both addition and multiplication. The implementations are such that they can be run on both encrypted and unencrypted content, making use of the full operator overloading supported by the HomomorphicEncryption R package. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The public release was within the past 6 months, so impact is only starting to feed through. A startup company, Numerai, have been in contact and are exploring its use in financial forecasting. 
URL http://www.louisaslett.com/EncryptedStats/
 
Title HomomorphicEncryption - An R package for fully homomorphic encryption 
Description This package enables use of optimised implementations of homomorphic encryption schemes from the user-friendly, interactive, high-level language R and offers completely transparent use of multi-core CPU architectures during computations. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The public release was within the past 6 months, so impact is only starting to feed through. I have been contacted by academics using the software to explore homomorphic encryption for the first time, becoming interested in the potential for its use in statistics and machine learning. A startup company, Numerai, have also been in contact and are exploring its use in sharing sensitive data to designers of proprietary algorithms, where both parties require privacy. 
URL http://www.louisaslett.com/HomomorphicEncryption/
 
Title Measuring influence in scientific fields 
Description A prototype web-based visualisation tool for methods developed to rank research fields, and journals within fields, taking account of both in-field and wider influence based upon the global citation network. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact None known as yet. 
URL http://selbydavid.com/influence/
 
Title PlackettLuce (R package) 
Description An add-on package for R. Facilitates the use of Plackett-Luce models for ranking of multiple objects by multiple observers. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact None known as yet. 
URL https://cran.r-project.org/web/packages/PlackettLuce/index.html
 
Title R package RZigZag: ZigZag Sampler 
Description Implements the Zig-Zag algorithm with subsampling and control variates (ZZ-CV) of Bierkens, Fearnhead and Roberts (2016) as applied to Bayesian logistic regression, as well as basic Zig-Zag for a Gaussian target distribution. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact 887 downloads from CRAN since the package was released. 
URL https://CRAN.R-project.org/package=RZigZag
 
Title mlmc 
Description The first publicly available implementation of a full Multi-level Monte Carlo driver in the R language and the first driver in any language with built-in multi-core parallelism. The software includes classic MLMC example level samplers in both R and C++. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The software was made available at the end of 2016 and impacts are currently unknown. 
URL https://github.com/louisaslett/mlmc
 
Description Alternative football league table 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A spin-off from my methodological work on ranking: this website/blog presents a novel, mathematically principled Premier League table that is more informative than the official league table. The website has been publicised via Twitter, and already attracts hundreds of visitors on a regular basis. A primary aim of this work is to promote the value of principled mathematics, through the medium of the world's most popular sport.
Year(s) Of Engagement Activity 2017
URL http://alt-3.uk
 
Description General Election 2017, BBC TV and radio appearances 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact In June 2017 I was approached by the BBC to make an online video and also to do a series of radio and TV interviews, about my work on election-day exit poll methodology (work that had taken place between 2004 and 2006). The video is globally visible at the URL shown. The TV interviews were nationally syndicated to BBC news programmes. This media exposure led to my being called upon by the House of Lords enquiry into electoral polling, to help frame their work during summer 2017. I regard it also as an important piece of public advocacy for the value of principled statistical methods in everyday life.
Year(s) Of Engagement Activity 2017
URL http://www.bbc.co.uk/news/av/uk-england-coventry-warwickshire-40175188/general-election-2017-the-sta...
 
Description Popular science alumni magazine article 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A popular science article written for the Corpus Christi College, Oxford alumni magazine as a research showcase. There is an expression of interest from the Royal Statistical Society for this article to be expanded to a full article for the society's official magazine, Significance, which is intended as an outreach publication.
Year(s) Of Engagement Activity 2015
URL http://www.louisaslett.com/dl/Aslett_LJM_DoingScienceBlindfold.pdf