Statistical Design of Experiments for Complex Nonparametric and Mechanistic Models
Lead Research Organisation:
University of Southampton
Department Name: Statistical Sciences Research Institute
Abstract
Experiments are used to investigate the impact on an observed response of a set of controllable features (called factors or variables) of the system under study, and provide the basis of much important research in many areas of the physical sciences, engineering and industry. Design of experiments is concerned with selecting the combinations of factor values, or treatments, to be run to meet the aims of the experiment with best use of resource. These aims will usually involve learning about the unknown relationship between the response and the factors, and then building a statistical model to approximate this relationship. Such a model describes how changing the factor values affects the response and the nature of the uncertainty in the relationship arising from sources such as measurement error. Importantly, the model allows us to predict the response from the system for an unobserved treatment, and to quantify our uncertainty about the prediction.
This research programme aims to develop new methods of finding good designs under a variety of different assumptions about the type of statistical model to be estimated from the experiment data in the presence of complicated structures in the data collection process. There are three research themes and, for each, new designs will be found that allow us to learn efficiently and effectively about different types of statistical models. In the first theme, we have little prior scientific knowledge about the system, and hence we cannot specify a form of statistical model in advance of the experiment. In the second theme, designs will be found for experiments where, for each treatment, we observe a curve or surface, representing a function, rather than a single number. We then need to learn about the form of this function for each treatment, and how the functions vary from treatment to treatment. The third theme will consider systems where one or more scientific theories may provide a mathematical approximation to the responses of interest. Here, a design is needed to generate data that enables discrimination between the competing theories and understanding of the difference, or discrepancy, between scientific theory and the real system.
Methodological research in the design of experiments has been traditionally motivated by problems from science, medicine and engineering. The research on the three themes is motivated by experimental programmes from pharmaceutical development, engineering and dispersion science. The new designs found will be test-bedded in prototype experiments in these fields through interactions with project partners and scientific collaborators. These experiments will provide a valuable evaluation of the methods and demonstrate their effectiveness to user communities. The methods will also have wider impact across a range of sciences and industry where such experiments are required and where there are currently no designs available tailored to both the aims of the experiment and the methods to be used in the data analysis.
Planned Impact
The statistical design of experiments is crucial to the discovery of new knowledge in many fields of science and engineering. Those working in fields involving advanced technology and experimental techniques will potentially benefit from the optimal designs and methods developed in the research programme. They will be able to obtain data tailored to sophisticated modelling techniques, including understanding of uncertainties, and appropriate to their scientific aims. Importantly, the designs will make best use of resources, leading to efficient experiments.
Many industries are concerned with using experimental programmes to develop and manufacture products and processes with consistently high performance. They are potential beneficiaries of the research through the availability of the new efficient tailored designs. This benefit leads to reduced cost of materials and faster time-to-market, offering the long-term prospect of improving UK economic competitiveness. Project partner GlaxoSmithKline and other pharmaceutical companies may make such gains in the development and manufacturing of new drugs. Collaborations with the Optoelectronics Research Centre and the National Centre for Advanced Tribology will lead to potential impact in other industries including communications, where technologies in fibre optics are leading the broadband revolution, and energy and transport through the generation of new lubricant technologies.
The research has potential societal impact by underpinning scientific advances that improve the quality of life and healthcare in the UK through the faster development of medicines. Societal impact would also come from application by Dstl of the new techniques to develop improved methods for sequential virtual and physical experiments to understand chemical and biological dispersion, thereby decreasing the time taken to react to a terrorist attack.
Interaction with project partners and collaborators provides the opportunity to transfer new statistical methods and techniques to other scientists, leading to an increase in the knowledge and skill-base amongst scientists and statisticians working in the collaborating organisations and centres.
In all the above fields, experiments are typically complex and involve the measurement of responses which require advanced statistical modelling techniques. There are currently no generic optimal or highly efficient designs available that are tailored to these types of experiments. The proposed research will provide the first bespoke designs and methods for these fast-moving application areas, and others with similar types of experimentation, and hence enable greater scientific understanding than currently possible.
Organisations
- University of Southampton (Fellow, Lead Research Organisation, Project Partner)
- Defence Science & Technology Laboratory (DSTL) (Collaboration)
- PUBLIC HEALTH ENGLAND (Collaboration)
- Atomic Weapons Establishment (Collaboration)
- GlaxoSmithKline (GSK) (Collaboration)
- Defence Science and Technology Laboratory (Project Partner)
- GlaxoSmithKline (United Kingdom) (Project Partner)
People | ORCID iD |
David Woods (Principal Investigator / Fellow) | |
Publications

- Atkinson AC (2015). Designs for Generalized Linear Models. arXiv e-prints.
- Bowman V (2017). Weighted space-filling designs. Journal of Simulation.
- Bowman V (2016). Emulation of Multivariate Simulators Using Thin-Plate Splines with Application to Atmospheric Dispersion. SIAM/ASA Journal on Uncertainty Quantification.
- Draguljic D (2014). Screening Strategies in the Presence of Interactions. Technometrics.
- Englezou Y (2022). Approximate Laplace importance sampling for the estimation of expected Shannon information gain in high-dimensional Bayesian design for nonlinear models. Statistics and Computing.
- Fisher VA (2013). Optimal design for prediction using local linear regression and the DSI-criterion. Statistics and Applications.
- Lendrem DW (2015). Lost in space: design of experiments and scientific exploration in a Hogarth Universe. Drug Discovery Today.
- Overstall A (2019). Bayesian Optimal Design for Ordinary Differential Equation Models With Application in Biological Science. Journal of the American Statistical Association.
- Overstall A (2017). Bayesian Design of Experiments Using Approximate Coordinate Exchange. Technometrics.
- Overstall A (2019). Bayesian prediction for physical models with application to the optimization of the synthesis of pharmaceutical products using chemical kinetics. Computational Statistics & Data Analysis.
Description | This project generated new statistical methods for the design and analysis of complex experiments, implemented them in computer code, and applied the methods to substantive applications in key areas of science and technology. In particular, we developed new methods for the optimal design of experiments under (i) functional data, where either the response or one or more independent factors vary continuously, e.g. via continuous response measurement or as a function of time; and (ii) nonparametric regression, e.g. the Gaussian process model, where minimal assumptions are made about the relationship linking the mean response to the explanatory variables. In both these cases, our methods have been applied in the pharmaceutical industry. Another major output from the project is new methods and software for optimal Bayesian design of experiments. We provided the first general-purpose methods for designing high-dimensional experiments under the Bayesian paradigm, overcoming some substantial and long-standing computational hurdles. This research has been implemented in a freely available and open-source R package. |
Exploitation Route | The new methods have already been applied in a number of active projects by GlaxoSmithKline; discussions are ongoing on how to promote and sustain their use within the company, and the wider pharmaceutical industry. New collaborations with organisations such as AWE and PHE will extend the range of application areas where these methods will have impact. To date, the major academic impact has been through our work on Bayesian optimal design, which has inspired research by a number of other international groups, for example in Canada and Australia. Our methods have quickly become the "gold standard" in this area with which new developments are compared. |
Sectors | Aerospace, Defence and Marine,Chemicals,Electronics,Energy,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology |
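To make the idea of Bayesian optimal design concrete, the following is a minimal sketch of one common criterion, the expected Shannon information gain, in the one setting where it has a closed form. The linear model with Gaussian prior and Gaussian noise, the function names and the two example designs are all illustrative assumptions and are not taken from the project; for the nonlinear and high-dimensional models targeted by the project no such closed form exists and the criterion must be approximated numerically.

```python
import numpy as np


def expected_information_gain(X, prior_cov, noise_var=1.0):
    """Expected Shannon information gain about theta for the assumed model
    y = X @ theta + noise, theta ~ N(0, prior_cov), noise ~ N(0, noise_var * I).

    Closed form: 0.5 * log det(I + prior_cov @ X.T @ X / noise_var).
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    m = np.eye(p) + prior_cov @ X.T @ X / noise_var
    _, logdet = np.linalg.slogdet(m)
    return 0.5 * logdet


def model_matrix(x):
    """Hypothetical one-factor model: intercept plus slope."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])


prior_cov = np.eye(2)  # assumed unit-variance, independent prior

# Two candidate four-run designs on the interval [-1, 1].
spread = model_matrix([-1.0, -1.0, 1.0, 1.0])      # runs at the extremes
clustered = model_matrix([-0.1, 0.0, 0.0, 0.1])    # runs near the centre

eig_spread = expected_information_gain(spread, prior_cov)
eig_clustered = expected_information_gain(clustered, prior_cov)

print(f"spread design:    {eig_spread:.3f}")
print(f"clustered design: {eig_clustered:.3f}")
```

Under these assumptions the design with runs at the ends of the interval gives the larger expected gain, because it is far more informative about the slope for the same number of runs.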
Description | Collaborators in the pharmaceutical industry have used methodology from this project to design and analyse experiments to advance (i) formulated medicines, (ii) inhaled medicine products, and (iii) biopharmaceutical development. Through active collaboration with GlaxoSmithKline, my group has directly provided statistical support for the design and analysis of experiments using our new methodology for, e.g., optimal design for nonparametric regression and optimal design for functional data. We have also provided tools, code, and reports to ensure sustainable technology transfer to the organisation. Research from this Fellowship has also directly impacted subsequent work with the Defence Science and Technology Laboratory and Public Health England. Funded by a contract from the US government, we have extended and applied our methods for the design and modelling of computer experiments to complex epidemiological applications. These include building statistical approximations to computer models for anthrax, flu and Covid-19. Our methodology is being implemented in a user-friendly software system for use in the US and UK, and we were able to assist in the quantitative analysis of Covid-19 modelling being undertaken by Dstl. |
First Year Of Impact | 2014 |
Sector | Pharmaceuticals and Medical Biotechnology |
Impact Types | Societal,Economic |
Description | Active Learning for Computational Polymorph Landscape Analysis |
Amount | £251,033 (GBP) |
Funding ID | EP/S015418/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2018 |
End | 04/2020 |
Description | Chemobots: Digital-Chemical-Robotics to Convert Code to Molecules and Complex Systems |
Amount | £5,034,016 (GBP) |
Funding ID | EP/S019472/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2019 |
End | 01/2024 |
Description | Closed loop optimisation for sustainable chemical manufacture |
Amount | £973,523 (GBP) |
Funding ID | EP/L003309/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2013 |
End | 12/2015 |
Description | Combining Chemical Robotics and Statistical Methods to Discover Complex Functional Products |
Amount | £1,227,510 (GBP) |
Funding ID | EP/R009902/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2018 |
End | 04/2021 |
Description | Defence Threat Reduction Agency Basic Research Grant |
Amount | $3,400,000 (USD) |
Organisation | Defense Threat Reduction Agency |
Sector | Public |
Country | United States |
Start | 11/2017 |
End | 11/2020 |
Description | Design, modelling and analysis for longitudinal population studies involving high-dimensional molecular measurements |
Amount | £192,246 (GBP) |
Funding ID | 217068/Z/19/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2019 |
End | 10/2022 |
Description | EPSRC Institutional Sponsorship for research collaboration |
Amount | £31,447 (GBP) |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2016 |
End | 03/2017 |
Description | EPSRC responsive mode |
Amount | £1,220,904 (GBP) |
Funding ID | EP/R009902/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2018 |
End | 01/2021 |
Description | GlaxoSmithKline Product Development Collaboration funding |
Amount | £66,000 (GBP) |
Organisation | GlaxoSmithKline (GSK) |
Sector | Private |
Country | Global |
Start | 12/2015 |
End | 03/2017 |
Description | Knowledge Transfer Secondment |
Amount | £12,000 (GBP) |
Organisation | GlaxoSmithKline (GSK) |
Sector | Private |
Country | Global |
Start | 09/2014 |
End | 09/2015 |
Description | Royal Society International Exchange Scheme |
Amount | £8,760 (GBP) |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2016 |
End | 03/2018 |
Description | AWE |
Organisation | Atomic Weapons Establishment |
Country | United Kingdom |
Sector | Private |
PI Contribution | Provided access to new methods for (i) modelling multivariate (e.g. spatial) outputs from complex computer simulators, and (ii) designing experiments to calibrate computationally expensive simulators. |
Collaborator Contribution | Problem definition and access to computer models/data |
Impact | None as yet. One PhD project expected in 2018, with papers to follow. |
Start Year | 2017 |
Description | Dstl |
Organisation | Defence Science & Technology Laboratory (DSTL) |
Country | United Kingdom |
Sector | Public |
PI Contribution | Collaborations with Dstl have focused on computer experiments and uncertainty quantification. Methods have been developed and applied for both statistical design and modelling for atmospheric dispersion computer models. |
Collaborator Contribution | Dstl provide the scientific challenge and subject area expertise, and collaborate on the statistical research. |
Impact | Bowman & Woods (2013). Weighted space-filling designs. Journal of Simulation, 7, 249-263. |
Description | GSK |
Organisation | GlaxoSmithKline (GSK) |
Country | Global |
Sector | Private |
PI Contribution | A variety of research projects and impact activities have taken place with GSK, including the development and application of (i) sequential design of experiments methods for nonparametric regression, (ii) statistical design and modelling methods for screening experiments, and (iii) Bayesian design and modelling methods for multi-stage split-plot experiments. |
Collaborator Contribution | GSK have provided industrial problems, pharmaceutical chemical science expertise and feedback on implemented methods. |
Impact | No published outputs to date. |
Start Year | 2011 |
Description | PHE |
Organisation | Public Health England |
Country | United Kingdom |
Sector | Public |
PI Contribution | Partners on a new funded project with Dstl, supported by the US government. My group will provide methodology for the rapid statistical emulation and calibration of disease models. |
Collaborator Contribution | Provision of models and related expertise |
Impact | This project has just started, and there are no outcomes to report as yet. |
Start Year | 2017 |
Title | R package acebayes |
Description | An R package to find optimal Bayesian designs, implementing the methods in Overstall and Woods (2017, Technometrics) |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | Provides the first implementation of general methods for finding Bayesian optimal designs for multi-factor experiments. |
URL | https://cran.r-project.org/web/packages/acebayes/index.html |
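The package above is described as implementing approximate coordinate exchange. As a rough illustration of the underlying idea only, the sketch below shows plain coordinate exchange for a simple D-optimality criterion: each coordinate of the design is changed in turn to whichever candidate value most improves the criterion. It is not the package's algorithm or API; it is written in Python rather than R, and the model, the candidate grid and all function names are hypothetical.

```python
import numpy as np


def model_matrix(design):
    """Hypothetical model: intercept, two main effects and their interaction."""
    x1, x2 = design[:, 0], design[:, 1]
    return np.column_stack([np.ones(len(design)), x1, x2, x1 * x2])


def log_det_information(design):
    """D-criterion: log determinant of X'X (minus infinity if singular)."""
    X = model_matrix(design)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf


def coordinate_exchange(start, candidates, max_passes=20):
    """Plain coordinate exchange: optimise one design coordinate at a time."""
    design = np.array(start, dtype=float)
    best = log_det_information(design)
    for _ in range(max_passes):
        improved = False
        for i in range(design.shape[0]):          # each run
            for j in range(design.shape[1]):      # each factor
                current = design[i, j]
                for value in candidates:
                    design[i, j] = value
                    trial = log_det_information(design)
                    if trial > best + 1e-12:
                        best, current, improved = trial, value, True
                design[i, j] = current            # keep the best value found
        if not improved:                          # stop at a local optimum
            break
    return design, best


rng = np.random.default_rng(1)
start = rng.uniform(-1.0, 1.0, size=(4, 2))       # random four-run start
start_value = log_det_information(start)
design, best = coordinate_exchange(start, np.linspace(-1.0, 1.0, 5))

print("starting criterion:", start_value)
print("final criterion:   ", best)
print(design)
```

Coordinate exchange only guarantees a local optimum, so in practice it is usually repeated from several random starting designs and the best result kept.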