Improving Experimentation and Measurement for Online Products and Services

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

The value of making data-driven or data-informed decisions has become increasingly clear in recent years. Key to making such decisions is the ability to accurately measure the impact of a given choice and to experiment with possible alternatives. We define Experimentation & Measurement (E&M) capabilities as the knowledge and tools necessary to run experiments (controlled or otherwise) with different digital or physical products, services, or experiences, and to measure their impact. These capabilities may take the form of an online controlled experiment framework, a team of econometrics analysts, or a system capable of performing machine learning-aided causal inference---the understanding of cause and effect based on what one observes.

The research project aims to develop an array of statistical and machine learning methods to boost the E&M capabilities of online businesses. So far, we have successfully (1) estimated the value of E&M capabilities themselves and (2) built an evaluation framework for controlled experiment designs. We now seek to:

(3) Develop data-driven controlled experiment designs:
The ability to run many large-scale controlled experiments on the Web allows us to collect data generated by similar experiments in the past. Can we leverage these data to shorten the duration of an experiment whose results follow a similar trend?

(4) Quantify the measurement uncertainty of observational studies a priori:
Observational studies (studies without a randomised control group) allow one to estimate both the direction and magnitude of the impact of an action or attribute using causal inference. Unlike their controlled counterparts, the uncertainty level around an estimate is often known only after running the analysis. Can we bound the uncertainty of the estimates produced by such analyses before running them, as we already can for controlled experiments?

(5) Combine insights generated from controlled and observational experiments:
Unlike in experimental science, where data are generally collected for a specific purpose, data on the Web are often used for purposes other than that originally intended. Can we do the same with experiments, i.e. supplement observational studies with data generated from controlled experiments that were designed for other purposes (or vice versa)?

The scope of the research project is ambitious. To the best of our knowledge, many of the challenges (topics 1, 2, and 5) have received little consideration or few serious attempts, because they require drawing on the latest results from multiple related fields. Other problems require building on the state of the art in Bayesian hypothesis testing (topic 3) and extreme value theory (topics 2 and 4). For the former, we plan to combine data-driven priors (which specify how a new experiment might perform based on how previous experiments performed) with non-local priors (which specify how an experiment might perform without contradicting the null hypothesis) to make an experiment more sensitive to changes in a business metric. For the latter, we seek to reduce the uncertainty of the estimates produced in the face of heavy-tailed response distributions, i.e. an extremely wide range of online behaviour that can be characterised by the 80/20 rule.
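To make the 80/20 characterisation concrete, here is a minimal sketch (not from the project; the shape parameter is chosen purely for illustration): a Pareto distribution with shape alpha of roughly 1.16 has its top 20% of values account for roughly 80% of the total, which can be read off its Lorenz curve and checked by simulation.

```python
import numpy as np

def top_share(alpha: float, p: float) -> float:
    """Analytic share of the total held by the top fraction p of a
    Pareto(alpha) population, derived from its Lorenz curve."""
    return p ** (1.0 - 1.0 / alpha)

# alpha ~ 1.161 is the shape parameter that reproduces the classic 80/20 rule
alpha = 1.161
print(f"Top 20% share (analytic):  {top_share(alpha, 0.2):.3f}")

# Monte Carlo check; note the heavy tail keeps this noisy even at n = 10^6
rng = np.random.default_rng(0)
x = rng.pareto(alpha, size=1_000_000) + 1.0  # Pareto samples with x_min = 1
x.sort()
simulated = x[-len(x) // 5 :].sum() / x.sum()  # share held by the top 20%
print(f"Top 20% share (simulated): {simulated:.3f}")
```

The instability of the simulated share despite a million samples is itself the point: with heavy-tailed responses, a handful of extreme users dominate the total, which is what inflates the uncertainty the project seeks to reduce.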

The project falls within the EPSRC Digital Economy Theme, touching on sub-themes including Behavioural research; Data, information and knowledge; and Value creation and capture. It is carried out in collaboration with, and part-funded by, ASOS.com, one of the largest UK-based online fashion retailers. The academic-industry collaboration gives the research access to a wealth of otherwise proprietary data that informs the development of measurement methods, while the business benefits immediately from the insights and techniques developed, generating new data along the way for iterative development.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their career. There are a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023151/1 |  |  | 01/04/2019 | 30/09/2027 |
2284224 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Chak Hin Bryan Liu
 
Description Digital experimentation and measurement is the application of experimental design and causal inference techniques that lets digital organisations experiment with different products (goods), services, or experiences, and measure their impact. It has gained traction in the past two decades: the largest tech companies report running thousands of experiments at any given time, and multiple businesses now exist whose sole purpose is to manage experiments for others.

The work funded through this award has addressed challenges faced by digital organisations as they progress in experimentation maturity. These include the following questions:
* Why should one engage in digital experimentation and measurement?
* What ingredients do we require to run experiments successfully in a digital setting?
* When does one experiment design outperform another?

So far, we have looked into why an organisation should invest in digital Experimentation and Measurement (E&M) capabilities - the knowledge and tools necessary to run experiments with different digital products, services, or experiences, and measure their impact. We tackle this problem by analysing how such capabilities decrease the level of uncertainty when estimating the value of digital products and services, and thus enable better prioritisation. We quantify the benefit of better prioritisation as the expected improvement in the performance of the selected digital products and services. This allows us to provide guidance on when organisations should invest in an E&M capability.
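The mechanism described above can be sketched with a toy simulation (an illustrative sketch under assumed Gaussian values and noise, not the valuation model developed in this work): when estimates of candidate value are less noisy, picking the candidate with the best estimate yields a higher realised value, and the gap quantifies what better measurement is worth.

```python
import numpy as np

rng = np.random.default_rng(42)

def expected_value_of_selected(noise_sd: float, n_candidates: int = 20,
                               n_trials: int = 20_000) -> float:
    """Average true value of the candidate picked by its noisy estimate.

    True values are drawn from N(0, 1); measurement adds N(0, noise_sd)
    error. Lower measurement noise -> better prioritisation -> higher
    realised value of the chosen candidate.
    """
    truth = rng.normal(0.0, 1.0, size=(n_trials, n_candidates))
    estimate = truth + rng.normal(0.0, noise_sd, size=truth.shape)
    picked = estimate.argmax(axis=1)          # choose best-looking candidate
    return truth[np.arange(n_trials), picked].mean()

for sd in (2.0, 1.0, 0.1):
    print(f"measurement noise sd = {sd:>3}: "
          f"expected value of chosen candidate "
          f"~ {expected_value_of_selected(sd):.3f}")
```

Running this shows the expected payoff of the selected candidate rising steadily as measurement noise falls, which is the intuition behind valuing an E&M capability by the prioritisation improvement it buys.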

We have also collated existing statistical tests and datasets in the digital experimentation space and presented them in a systematic manner. Crucially, we map the link between these two classes of objects, enabling researchers and practitioners to quickly identify the data collection requirements for their experiment design, and conversely the statistical test options available given the data availability. In other words, it helps answer the questions "I need to run this statistical test, how should I format my dataset?" and "I have this dataset, what statistical tests can I run?" much more quickly.
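One concrete instance of such a test-to-dataset mapping (a hypothetical example with made-up numbers, not an entry from the collation itself): a dataset that records only aggregated conversion counts per variant, with no per-user responses, already supports a two-proportion z-test.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided two-proportion z-test, suitable when the dataset only
    records conversion and user counts per variant."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)      # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical aggregated A/B test data: conversions / users per variant
z, p = two_proportion_z_test(conv_a=1_000, n_a=20_000,
                             conv_b=1_100, n_b=20_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A dataset with per-user responses would additionally unlock tests such as the two-sample t-test or regression adjustment; the aggregated form above is the minimal data requirement for this particular test.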

Moreover, we have built an evaluation framework for experiments that compare personalisation strategies: complex sets of targeted customer interactions that are common in e-commerce and digital marketing. Such strategies include the scheduling, budgeting, and ordering of marketing activities directed at a user based on their purchase history. Along with a few simple rules of thumb, the framework allows experimenters to quickly identify which experiment setup gives the highest probability of showing that what the organisation is testing is indeed working under their particular experimental setting.
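The kind of design comparison described above can be sketched by simulation (an illustrative toy, not the framework itself; the covariate-adjusted design, effect size, and correlation are all assumptions): two designs with identical sample size are compared by the fraction of simulated experiments in which a fixed effect is detected.

```python
import numpy as np

rng = np.random.default_rng(7)

def detection_rate(adjust: bool, effect: float = 0.1, n: int = 500,
                   rho: float = 0.8, n_sims: int = 2_000) -> float:
    """Share of simulated A/B tests that detect `effect` at |z| > 1.96.

    If `adjust`, a pre-experiment covariate correlated (rho) with the
    response is regressed out first, shrinking variance: one way a
    design can dominate another at the same sample size.
    """
    hits = 0
    for _ in range(n_sims):
        pre = rng.normal(size=2 * n)              # pre-experiment metric
        y = rho * pre + np.sqrt(1 - rho**2) * rng.normal(size=2 * n)
        y[:n] += effect                           # first n units treated
        if adjust:
            y = y - pre * np.cov(y, pre)[0, 1] / pre.var()
        diff = y[:n].mean() - y[n:].mean()
        se = np.sqrt(y[:n].var(ddof=1) / n + y[n:].var(ddof=1) / n)
        hits += abs(diff / se) > 1.96
    return hits / n_sims

print(f"plain design:    detection rate ~ {detection_rate(False):.2f}")
print(f"adjusted design: detection rate ~ {detection_rate(True):.2f}")
```

The adjusted design detects the same effect substantially more often, illustrating the sort of "which setup gives the highest probability of a correct positive result" question the framework is built to answer.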
Exploitation Route The project is jointly funded by EPSRC and ASOS.com, a prominent online fashion retailer in the UK, and thus the problems addressed are motivated by challenges commonly faced by digital organisations across sectors.
The research outputs described have already been used:
1) to build business cases for further investment in experimentation & measurement capabilities in multiple organisations (in multiple countries and sectors),
2) to compare designs for actual experiments run online, and
3) as a resource for A/B testing datasets in various tertiary-level analytics and data science courses around the world.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description The research outputs have already been used:
1) to build business cases for further investment in experimentation & measurement capabilities in multiple organisations (in multiple countries and sectors),
2) to compare designs for actual experiments run online, and
3) as a resource for A/B testing datasets in various tertiary-level analytics and data science courses around the world.
Supporting notes & materials:
Item 1A) Presentation deck from Miotsukushi Analytics Inc. in Japan (unaffiliated) discussing how the valuation model described in the paper "What is the value of experimentation and measurement?" (Liu et al., 2020) could be applied to digital gaming organisations: https://speakerdeck.com/uvalue/what-is-the-value-of-experimentation-and-measurement
Item 1B) The general model described in the paper "What is the value of experimentation and measurement?" (Liu et al., 2020) has been adapted to fit the specific scenario at ASOS.com and presented to the CTO of the company. The materials presented are behind an NDA, so we are unable to provide a link.
Item 3) The ASOS Digital Experiments Dataset, published together with the paper "Datasets for Online Controlled Experiments" (Liu et al., 2021), has been used by a research team at Amazon to demonstrate and refine their method for predicting how many users will enter a digital experiment. The method is currently in production (i.e. in day-to-day use) within Amazon to improve customer experience and operational efficiency. See: https://assets.amazon.science/ca/02/d38daae64c5f8eb91637f2f12db5/a-bayesian-model-for-online-activity-sample-sizes.pdf
First Year Of Impact 2020
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title ASOS Digital Experiments Dataset 
Description A novel dataset that can support the end-to-end design and running of Online Controlled Experiments (OCE) with adaptive stopping. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Used by an unaffiliated research group at Amazon to demonstrate their method for predicting sample size in online experimentation, due to be presented at AISTATS 2022 - Richardson et al. (2022) A Bayesian Model for Online Activity Sample Sizes. In: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022). 
URL https://osf.io/64jsb/