Statistical Design of Experiments for Complex Nonparametric and Mechanistic Models

Lead Research Organisation: University of Southampton
Department Name: Statistical Sciences Research Institute

Abstract

Experiments are used to investigate the impact on an observed response of a set of controllable features (called factors or variables) of the system under study, and provide the basis of much important research in many areas of the physical sciences, engineering and industry. Design of experiments is concerned with selecting the combinations of factor values, or treatments, to be run to meet the aims of the experiment with best use of resource. These aims will usually involve learning about the unknown relationship between the response and the factors, and then building a statistical model to approximate this relationship. Such a model describes how changing the factor values affects the response and the nature of the uncertainty in the relationship arising from sources such as measurement error. Importantly, the model allows us to predict the response from the system for an unobserved treatment, and to quantify our uncertainty about the prediction.
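As a minimal sketch of what "selecting the combinations of factor values" can mean in practice, the following Python example compares two candidate four-run designs for two factors. The linear model with an interaction term, the D-criterion (log-determinant of the information matrix) and all function names are assumptions made for illustration; none of them is taken from the research programme itself.

```python
import numpy as np

def model_matrix(design):
    """Assumed model: intercept, two main effects and their interaction."""
    x1, x2 = design[:, 0], design[:, 1]
    return np.column_stack([np.ones(len(design)), x1, x2, x1 * x2])

def log_det_information(design):
    """D-criterion: log-determinant of X'X (larger means more informative)."""
    X = model_matrix(design)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf

def prediction_variance(design, x_new, sigma2=1.0):
    """Variance of the predicted mean response at an unobserved treatment."""
    X = model_matrix(design)
    f = model_matrix(np.atleast_2d(np.asarray(x_new, dtype=float)))
    return float(sigma2 * (f @ np.linalg.inv(X.T @ X) @ f.T)[0, 0])

# Two hypothetical designs on the region [-1, 1] x [-1, 1].
factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
clustered = 0.5 * factorial  # same pattern, but runs bunched near the centre

for name, design in [("factorial", factorial), ("clustered", clustered)]:
    print(name,
          "log det =", round(log_det_information(design), 3),
          "prediction variance at (1, 1) =",
          round(prediction_variance(design, [1.0, 1.0]), 3))
```

Under these assumptions, the design that spreads its runs to the corners of the region has the larger information determinant and the smaller prediction variance at the corner treatment, which is the sense in which one design makes better use of the same four runs.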

This research programme aims to develop new methods of finding good designs under a variety of different assumptions about the type of statistical model to be estimated from the experiment data in the presence of complicated structures in the data collection process. There are three research themes and, for each, new designs will be found that allow us to learn efficiently and effectively about different types of statistical models. In the first theme, we have little prior scientific knowledge about the system, and hence we cannot specify a form of statistical model in advance of the experiment. In the second theme, designs will be found for experiments where, for each treatment, we observe a curve or surface, representing a function, rather than a single number. We then need to learn about the form of this function for each treatment, and how the functions vary from treatment to treatment. The third theme will consider systems where one or more scientific theories may provide a mathematical approximation to the responses of interest. Here, a design is needed to generate data that enables discrimination between the competing theories and understanding of the difference, or discrepancy, between scientific theory and the real system.

Methodological research in the design of experiments has been traditionally motivated by problems from science, medicine and engineering. The research on the three themes is motivated by experimental programmes from pharmaceutical development, engineering and dispersion science. The new designs found will be test-bedded in prototype experiments in these fields through interactions with project partners and scientific collaborators. These experiments will provide a valuable evaluation of the methods and demonstrate their effectiveness to user communities. The methods will also have wider impact across a range of sciences and industry where such experiments are required and where there are currently no designs available tailored to both the aims of the experiment and the methods to be used in the data analysis.

Planned Impact

The statistical design of experiments is crucial to the discovery of new knowledge in many fields of science and engineering. Those working in fields involving advanced technology and experimental techniques will potentially benefit from the optimal designs and methods developed in the research programme. They will be able to obtain data tailored to sophisticated modelling techniques, including understanding of uncertainties, and appropriate to their scientific aims. Importantly, the designs will make best use of resources, leading to efficient experiments.

Many industries are concerned with using experimental programmes to develop and manufacture products and processes with consistently high performance. They are potential beneficiaries of the research through the availability of the new efficient tailored designs. This benefit leads to reduced cost of materials and faster time-to-market, offering the long-term prospect of improving UK economic competitiveness. Project partner GlaxoSmithKline and other pharmaceutical companies may make such gains in the development and manufacturing of new drugs. Collaborations with the Optoelectronics Research Centre and the National Centre for Advanced Tribology will lead to potential impact in other industries including communications, where technologies in fibre optics are leading the broadband revolution, and energy and transport through the generation of new lubricant technologies.

The research has potential societal impact by underpinning scientific advances that improve the quality of life and healthcare in the UK through the faster development of medicines. Societal impact would also come from application by Dstl of the new techniques to develop improved methods for sequential virtual and physical experiments to understand chemical and biological dispersion, thereby decreasing the time taken to react to a terrorist attack.

Interaction with project partners and collaborators provides the opportunity to transfer new statistical methods and techniques to other scientists, leading to an increase in the knowledge and skill-base amongst scientists and statisticians working in the collaborating organisations and centres.

In all the above fields, experiments are typically complex and involve the measurement of responses which require advanced statistical modelling techniques. There are currently no generic optimal or highly efficient designs available that are tailored to these types of experiments. The proposed research will provide the first bespoke designs and methods for these fast moving application areas, and others with similar types of experimentation, and hence enable greater scientific understanding than currently possible.

Publications

 
Description This project generated new statistical methods for the design and analysis of complex experiments, implemented them in computer code, and applied the methods to substantive applications in key areas of science and technology. In particular, we developed new methods for the optimal design of experiments under (i) functional data, where either the response or one or more independent factors vary continuously, e.g. via continuous response measurement or as a function of time; and (ii) nonparametric regression, e.g. the Gaussian process model, where minimal assumptions are made about the relationship linking the mean response to the explanatory variables. In both these cases, our methods have been applied in the pharmaceutical industry. Another major output from the project is new methods and software for optimal Bayesian design of experiments. We provided the first general-purpose methods for designing high-dimensional experiments under the Bayesian paradigm, overcoming some substantial and long-standing computational hurdles. This research has been implemented in a freely available and open source R package.
Exploitation Route The new methods have already been applied in a number of active projects by GlaxoSmithKline; discussions are ongoing on how to promote and sustain their use within the company, and the wider pharmaceutical industry. New collaborations with organisations such as AWE and PHE will extend the range of application areas where these methods will have impact. To date, the major academic impact has been through our work on Bayesian optimal design, which has inspired research by a number of other international groups, for example in Canada and Australia. Our methods have quickly become the "gold standard" in this area with which new developments are compared.
Sectors Aerospace, Defence and Marine; Chemicals; Electronics; Energy; Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
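
To give a flavour of why Bayesian optimal design is computationally demanding, the sketch below estimates the expected Shannon information gain of a design by nested Monte Carlo and checks it against the closed form available for a linear model with normal prior and normal errors. This is a generic textbook illustration in Python, not the methodology or the R package developed in the project; the model, the prior and error variances, the sample sizes and all function names are assumptions.

```python
import numpy as np

def exact_eig(X, tau2=1.0, sigma2=1.0):
    """Closed-form expected information gain for y = X theta + error,
    theta ~ N(0, tau2 I), error ~ N(0, sigma2 I) (assumed model)."""
    p = X.shape[1]
    _, logdet = np.linalg.slogdet(np.eye(p) + (tau2 / sigma2) * X.T @ X)
    return 0.5 * logdet

def nested_mc_eig(X, n_outer=500, n_inner=500, tau2=1.0, sigma2=1.0, seed=0):
    """Nested Monte Carlo estimate of the same quantity."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta_outer = rng.normal(0.0, np.sqrt(tau2), size=(n_outer, p))
    theta_inner = rng.normal(0.0, np.sqrt(tau2), size=(n_inner, p))
    y = theta_outer @ X.T + rng.normal(0.0, np.sqrt(sigma2), size=(n_outer, n))

    # Log-likelihoods up to an additive constant that cancels in the difference.
    loglik_own = -0.5 * np.sum((y - theta_outer @ X.T) ** 2, axis=1) / sigma2
    diff = y[:, None, :] - (theta_inner @ X.T)[None, :, :]
    loglik_inner = -0.5 * np.sum(diff ** 2, axis=2) / sigma2

    # Log of the inner average (the evidence), computed stably.
    shift = loglik_inner.max(axis=1, keepdims=True)
    log_evidence = shift[:, 0] + np.log(np.mean(np.exp(loglik_inner - shift), axis=1))
    return float(np.mean(loglik_own - log_evidence))

def straight_line_matrix(x):
    """Model matrix for an assumed intercept-plus-slope model in one factor."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(len(x)), x])

# Two hypothetical four-run designs for a single factor on [-1, 1].
X_wide = straight_line_matrix([-1.0, -1.0, 1.0, 1.0])
X_narrow = straight_line_matrix([-0.2, -0.2, 0.2, 0.2])

for name, X in [("wide", X_wide), ("narrow", X_narrow)]:
    print(name, "exact:", exact_eig(X), "nested Monte Carlo:", nested_mc_eig(X))
```

Even in this two-parameter example each utility evaluation needs an inner loop of likelihood evaluations for every outer sample, so the cost of searching over designs grows quickly with the number of runs and factors.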

 
Description Collaborators in the pharmaceutical industry have used methodology from this project to design and analyse experiments to advance (i) formulated medicines, (ii) inhaled medicine products, and (iii) biopharmaceutical development. Through active collaboration with GlaxoSmithKline, my group has directly provided statistical support for the design and analysis of experiments using our new methodology for, e.g., optimal design for nonparametric regression and optimal design for functional data. We have also provided tools, code, and reports to ensure sustainable technology transfer to the organisation. Research from this Fellowship has also directly impacted subsequent work with the Defence Science and Technology Laboratory and Public Health England. Funded by a contract from the US government, we have extended and applied our methods for the design and modelling of computer experiments to complex epidemiological applications. These include building statistical approximations to computer models for anthrax, flu and Covid-19. Our methodology is being implemented in a user-friendly software system for use in the US and UK, and we were able to assist in the quantitative analysis of Covid-19 modelling being undertaken by Dstl.
First Year Of Impact 2014
Sector Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic

 
Description Active Learning for Computational Polymorph Landscape Analysis
Amount £251,033 (GBP)
Funding ID EP/S015418/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2018 
End 04/2020
 
Description Chemobots: Digital-Chemical-Robotics to Convert Code to Molecules and Complex Systems
Amount £5,034,016 (GBP)
Funding ID EP/S019472/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2019 
End 01/2024
 
Description Closed loop optimisation for sustainable chemical manufacture
Amount £973,523 (GBP)
Funding ID EP/L003309/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2013 
End 12/2015
 
Description Combining Chemical Robotics and Statistical Methods to Discover Complex Functional Products
Amount £1,227,510 (GBP)
Funding ID EP/R009902/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2018 
End 04/2021
 
Description Defense Threat Reduction Agency Basic Research Grant
Amount $3,400,000 (USD)
Organisation Defense Threat Reduction Agency 
Sector Public
Country United States
Start 11/2017 
End 11/2020
 
Description Design, modelling and analysis for longitudinal population studies involving high-dimensional molecular measurements
Amount £192,246 (GBP)
Funding ID 217068/Z/19/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2019 
End 10/2022
 
Description EPSRC Institutional Sponsorship for research collaboration
Amount £31,447 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2016 
End 03/2017
 
Description EPSRC responsive mode
Amount £1,220,904 (GBP)
Funding ID EP/R009902/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2018 
End 01/2021
 
Description GlaxoSmithKline Product Development Collaboration funding
Amount £66,000 (GBP)
Organisation GlaxoSmithKline (GSK) 
Sector Private
Country Global
Start 12/2015 
End 03/2017
 
Description Knowledge Transfer Secondment
Amount £12,000 (GBP)
Organisation GlaxoSmithKline (GSK) 
Sector Private
Country Global
Start 10/2014 
End 09/2015
 
Description Royal Society International Exchange Scheme
Amount £8,760 (GBP)
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2016 
End 03/2018
 
Description AWE 
Organisation Atomic Weapons Establishment
Country United Kingdom 
Sector Private 
PI Contribution Provided access to new methods for (i) modelling multivariate (e.g. spatial) outputs from complex computer simulators, and (ii) designing experiments to calibrate computationally expensive simulators.
Collaborator Contribution Problem definition and access to computer models/data
Impact None as yet. One PhD project expected in 2018, with papers to follow.
Start Year 2017
 
Description Dstl 
Organisation Defence Science & Technology Laboratory (DSTL)
Country United Kingdom 
Sector Public 
PI Contribution Collaborations with Dstl have focused on computer experiments and uncertainty quantification. Methods have been developed and applied for both statistical design and modelling for atmospheric dispersion computer models.
Collaborator Contribution Dstl provide the scientific challenge and subject area expertise, and collaborate on the statistical research.
Impact Bowman & Woods (2013). Weighted space-filling designs. Journal of Simulation, 7, 249-263.
 
Description GSK 
Organisation GlaxoSmithKline (GSK)
Country Global 
Sector Private 
PI Contribution A variety of research projects and impact activities have taken place with GSK, including the development and application of (i) sequential design of experiments methods for nonparametric regression, (ii) statistical design and modelling methods for screening experiments, and (iii) Bayesian design and modelling methods for multi-stage split-plot experiments.
Collaborator Contribution GSK have provided industrial problems, pharmaceutical chemical science expertise and feedback on implemented methods.
Impact No published outputs to date.
Start Year 2011
 
Description PHE 
Organisation Public Health England
Country United Kingdom 
Sector Public 
PI Contribution Partners on a new funded project with Dstl, supported by the US government. My group will provide methodology for the rapid statistical emulation and calibration of disease models.
Collaborator Contribution Provision of models and related expertise
Impact This project has just started, and there are no outcomes to report as yet.
Start Year 2017
 
Title R package acebayes 
Description An R package to find optimal Bayesian designs, implementing the methods in Overstall and Woods (2017, Technometrics) 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Provides the first implementation of general methods for finding Bayesian optimal designs for multi-factor experiments. 
URL https://cran.r-project.org/web/packages/acebayes/index.html