# MICA: Clinical trial estimands: from definition to estimation

Lead Research Organisation:
University of Bath

Department Name: Mathematical Sciences

### Abstract

Randomised clinical trials are the gold standard approach for testing whether new treatments for diseases work better than existing treatments and for quantifying the magnitude of any benefit. In principle the analysis of such trials is simple: one compares the chosen outcome measure between patients in one group and patients in the other group. In practice, a number of complications may arise which make this comparison difficult to interpret, or even impossible to calculate. One example is trials in which, during the follow-up period, patients may switch from the treatment they were randomly assigned to receive, either to the alternative treatment or to no treatment at all, or may start taking additional treatment(s). A second example is trials which aim to compare, for example, cholesterol treatments in terms of their effects on death due to cardiovascular disease. This comparison is complicated by the fact that some patients may die of other causes, such as cancer. In a simple analysis comparing the number of patients who died of cardiovascular disease between the two groups, a new treatment could appear to reduce the chance of death due to cardiovascular disease only because it increases deaths due to cancer, since patients who die of cancer can no longer die of cardiovascular disease. A third example is cancer trials in which interest lies in comparing treatments both in terms of their ability to prevent cancer recurrence and in terms of their adverse side effects, which may affect patients' quality of life. Any comparison of the treatments' effects on quality of life measures is complicated by the fact that such measures will inevitably be unavailable for some patients in each treatment group because they have died.
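
A minimal Python sketch can make the cardiovascular example concrete. It assumes, purely for illustration, constant (exponential) cause-specific hazards with made-up rates; under that assumption the probability of dying of cardiovascular disease by time t is λ_CVD / (λ_CVD + λ_cancer) × (1 − exp(−(λ_CVD + λ_cancer) t)). The function name `cumulative_incidence` and all rates are hypothetical.

```python
import math


def cumulative_incidence(rate_event, rate_competing, t):
    """Probability that the event of interest occurs first and by time t,
    assuming constant cause-specific hazards (exponential competing risks)."""
    total = rate_event + rate_competing
    return rate_event / total * (1.0 - math.exp(-total * t))


# Hypothetical rates per patient-year: the cardiovascular (CVD) death hazard
# is identical in both arms; only the cancer death hazard differs.
cvd_rate = 0.02
control = cumulative_incidence(cvd_rate, rate_competing=0.01, t=10)
new_treatment = cumulative_incidence(cvd_rate, rate_competing=0.03, t=10)
print(round(control, 3), round(new_treatment, 3))  # 0.173 0.157
```

Fewer cardiovascular deaths are observed in the new treatment arm even though its cardiovascular hazard is unchanged, because more patients die of cancer first. This is why the estimand needs to state how the competing event is handled.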

Given these issues, drug regulatory agencies have in recent years increasingly scrutinised how clinical trials plan to handle such complications in their design and statistical analysis. Specifically, there is an increased demand for trials to specify clearly exactly what kind of treatment effect they seek to quantify (the so-called estimand) and to choose a method of statistical analysis that handles these issues in a sensible and plausible manner.

The aim of this research is to investigate how such complications can best be handled using concepts and methods developed in the field of so-called 'causal inference theory'. This theory offers a mathematical language to describe precisely what we mean by the effect of treatment in the presence of complicating factors such as those described above. Moreover, a wide range of statistical methods has been developed for estimating treatment effects defined using these concepts, under different assumptions. This research will use causal inference theory to define treatment effects (estimands) precisely in the presence of the issues described above. It will then investigate which statistical methods developed in causal inference theory are best suited to the analysis of clinical trial data.

The outputs of this research will help statisticians involved in clinical trials to use causal inference concepts and language to specify clearly the treatment effect which their trial intends to estimate. It will give them guidance and recommendations as to which statistical methods they can use to estimate such effects. The research will also produce software implementing the new statistical methods, so that trial statisticians can use the methods in their trials. Together these outputs will mean that patients can be offered more meaningful and accurate measures of expected treatment effects and that clinicians can make more informed decisions about patient care. The research will enable drug regulators and payer authorities to make fairer comparisons between treatments with regard to their efficacy, safety, and cost-effectiveness, leading to improved decisions about which treatments to license and make available to patients.

### Technical Summary

In clinical trials a number of complications may arise which mean that a range of different types of treatment effect, or estimand, could be contemplated as the target of inference. For example, in trials where patients change treatments during follow-up (either as intended by the protocol or not), scientific interest may lie in estimating the effects of a so-called treatment regime which either was not adhered to in the trial or differs (e.g. for ethical reasons) from the trial's intended treatment regime. A second example is when the trial aims to assess effects on the time to an event of interest but patients may experience a competing event first. A third example is where the trial aims to compare treatments' effects on some outcome but this outcome cannot be observed in a non-trivial proportion of patients because they have died (truncation by death). The importance of clear specification of the estimand and appropriate choice of statistical method is evidenced by the commissioning of the ICH E9 addendum on clinical trial estimands.

The aim of this research is to apply the concepts and statistical estimation methods developed in modern causal inference theory to the problem of estimand definition and estimation in randomised trials. For a range of important estimand problems, we will use counterfactual language to specify the estimand precisely in mathematical terms. We will use causal inference theory to characterize the conditions under which such estimands can be identified or estimated from the observed data. Lastly, we will critically evaluate the different statistical methods which have been developed to estimate such estimands, with regard to the plausibility of their assumptions and their statistical properties, using analytical results, simulation, and application to exemplar trial datasets. Where implementations do not currently exist, open source software in R implementing these methods will be developed, with accompanying tutorials.
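
As an illustration of what counterfactual language can look like, the block below writes the three examples above as potential-outcome contrasts. The notation is a common convention in the causal inference literature and is an assumption here, not necessarily the notation adopted by the project: A is the randomised arm, Y(a) the outcome if assigned to arm a, Y(a, r = 0) the outcome if assigned to arm a with the intercurrent event (for example treatment switching or rescue medication) prevented, T(a) and J(a) the time and cause of the first event, and S(a) an indicator of survival to the time the outcome is measured.

```latex
\begin{align*}
\text{Treatment policy:}\quad
  & E\{Y(1)\} - E\{Y(0)\} \\
\text{Hypothetical (no intercurrent event):}\quad
  & E\{Y(1, r{=}0)\} - E\{Y(0, r{=}0)\} \\
\text{Competing events (cause 1 by time } \tau\text{):}\quad
  & P\{T(1) \le \tau,\, J(1) = 1\} - P\{T(0) \le \tau,\, J(0) = 1\} \\
\text{Truncation by death (survivor average causal effect):}\quad
  & E\{Y(1) - Y(0) \mid S(1) = S(0) = 1\}
\end{align*}
```

With complete follow-up, randomisation alone identifies the treatment-policy contrast, since E{Y(a)} = E(Y | A = a). The hypothetical and survivor contrasts need further assumptions, for example no unmeasured confounding of the intercurrent event, or assumptions about how S(0) and S(1) are jointly distributed; this is the kind of identification question the project sets out to characterize.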

### Planned Impact

Statisticians working on the design and analysis of clinical trials will benefit from this research. This will include statisticians both within the pharmaceutical industry and statisticians working on publicly funded trials at universities, medical schools, and other institutions.

The peer-reviewed journal publications resulting from this research will help trial statisticians use causal inference concepts and language to choose and more clearly specify the target estimand(s) in their trials. The ICH E9 addendum on estimands is expected to lead to an increased emphasis on trial protocol documents clearly specifying the trial's estimand(s), and so the outputs from this research will influence the content of these sections of protocols.

The publications and tutorials resulting from the research will guide trial statisticians as to which statistical methods can be used to estimate their chosen estimands. The outputs should thus influence the choice of statistical analysis methods and how these are described in statistical analysis plans. The research should help make clear why a particular statistical method has been chosen given the trial's targeted estimand. Where necessary, this research project will produce R packages for methods deemed useful to trial statisticians, enabling them to implement these methods in practice and helping to ensure that the methods are actually used.

These impacts on the way trials are designed, documented, and analysed will lead to greater transparency and clarity regarding what the results of clinical trials mean for a range of stakeholders, including patients, clinicians, regulators, and payers. For patients and clinicians, it should enable trials to provide effect estimates that answer questions more relevant to them, such as 'what is the expected effect if I start and adhere to this treatment plan for the specified period?', rather than 'what is the expected effect of this treatment plan taking into account some patients will decide to withdraw from it?'. This will mean patients receive more accurate information about the expected effects of treatments, and this will help clinicians to make more informed decisions about patients' treatment and care. For regulators and payers, the improvements should allow fairer comparison of effect estimates between different trials, through the anticipated clearer specification of trials' estimands and statistical analysis methods. This will enhance their ability to make sound evidence-based decisions about new treatments' efficacy, safety and cost-effectiveness. It is anticipated that these impacts could be progressively achieved within five years of the research outputs being disseminated.

### Publications

Bartlett J (2021). *Reference-Based Multiple Imputation-What is the Right Variance and How to Estimate It* in Statistics in Biopharmaceutical Research.

Carpenter JR (2023). *Multiple Imputation and its Application* in Multiple Imputation and its Application.

Cornish R (2023). *Complete case logistic regression with a dichotomised continuous outcome led to biased estimates* in Journal of Clinical Epidemiology.

Kumar B (2024). *Weighted Hazard Ratio Estimation for Delayed and Diminishing Treatment Effect* in Statistics in Biopharmaceutical Research.

Morris TP (2024). *Comment on Oberman & Vink: Should we fix or simulate the complete data in simulation studies evaluating missing data methods?* in Biometrical Journal. Biometrische Zeitschrift.

Olarte Parra C (2023). *Hypothetical Estimands in Clinical Trials: A Unification of Causal Inference and Missing Data Methods* in Statistics in Biopharmaceutical Research.

Van Lancker K (2022). *Estimands and their Estimators for Clinical Trials Impacted by the COVID-19 Pandemic: A Report from the NISS Ingram Olkin Forum Series on Unplanned Clinical Trial Disruptions* in Statistics in Biopharmaceutical Research.

Wolbers M (2022). *Standard and reference-based conditional mean imputation* in Pharmaceutical Statistics.

| Description | AstraZeneca collaboration for MICA: Clinical trial estimands: from definition to estimation |
|---|---|
| Organisation | AstraZeneca |
| Country | United Kingdom |
| Sector | Private |
| PI Contribution | We (the PI, co-PI and research associate) have led and performed the work in this MRC grant, in which AstraZeneca is an industrial collaborator. |
| Collaborator Contribution | To date we have had three half-day meetings with David Wright, Head of Statistical Innovation at AstraZeneca, to obtain his input and insights into the work in this grant. He has commented on a publication resulting from this grant, and is a co-author on a paper which is under review at the journal Biometrics. |
| Impact | A paper analysing a diabetes trial, on which David Wright from AstraZeneca has worked and will be a co-author (pre-print here https://arxiv.org/pdf/2308.13085.pdf), has been submitted to the journal Biometrics and is being revised following reviews. |
| Start Year | 2020 |

| Title | Bias reduced imputation for missing binary covariates in the smcfcs package |
|---|---|
| Description | In this work I extended my existing R package for multiple imputation (smcfcs) to include functionality for imputing missing binary covariates using bias-reduced logistic regression, as developed by Firth in 1993 (https://doi.org/10.1093/biomet/80.1.27). This extension is useful for handling so-called perfect prediction problems, where one variable perfectly predicts a binary variable being imputed or modelled. Before this extension, the software would fail in such cases, which are quite common. |
| Type Of Technology | Software |
| Year Produced | 2022 |
| Open Source License? | Yes |
| Impact | The smcfcs methodology and R package continue to be used extensively in applications. For example, for the three years 2020-2022, a Google search for "R package" "smcfcs" returns 46 citations. A prominent example is the use of the package for missing data sensitivity analyses in a trial examining the effectiveness of face masks for preventing Covid-19 (https://doi.org/10.7326/M20-6817). |
| URL | https://thestatsgeek.com/2022/05/24/perfect-prediction-handling-in-smcfcs-for-r/ |
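
To illustrate what bias-reduced logistic regression does under perfect prediction, here is a minimal Python sketch; it is not the smcfcs code, which is written in R. It maximises the log-likelihood plus Firth's Jeffreys-prior penalty, 0.5 log|X'WX|, by Newton-Raphson on the Firth-modified score with step-halving. The function name `firth_logistic` and the four-observation toy dataset, in which the covariate perfectly separates the outcome, are hypothetical.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Bias-reduced (Firth, 1993) logistic regression: maximise the
    log-likelihood plus 0.5 * log|X'WX| (Jeffreys prior penalty)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])

    def fisher_info(b):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        w = p * (1.0 - p)
        return p, w, X.T @ (X * w[:, None])  # X'WX

    def penalised_loglik(b):
        eta = X @ b
        loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # numerically stable
        _, logdet = np.linalg.slogdet(fisher_info(b)[2])
        return loglik + 0.5 * logdet

    for _ in range(max_iter):
        p, w, info = fisher_info(beta)
        info_inv = np.linalg.inv(info)
        # Leverages h_i: diagonal of W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: ordinary score plus the bias-correcting term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        # Step-halving: shrink the step until the penalised likelihood does not drop
        current = penalised_loglik(beta)
        while penalised_loglik(beta + step) < current and np.max(np.abs(step)) > tol:
            step = step / 2.0
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data with perfect prediction: x > 0 exactly when y = 1
X = np.column_stack([np.ones(4), [-2.0, -1.0, 1.0, 2.0]])
y = np.array([0, 0, 1, 1])
beta_firth = firth_logistic(X, y)
print(beta_firth)  # finite intercept and slope
```

With ordinary maximum likelihood on the same data, each Newton step would increase the slope without bound, which is the failure that the perfect prediction handling is designed to avoid.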

| Title | gFormulaMI - an R package for performing G-formula for causal inference via multiple imputation |
|---|---|
| Description | This is an open source R package implementing a statistical method we have developed. The approach implements the G-formula, a method for performing causal inference in longitudinal datasets, by exploiting existing methods and software for handling missing data, namely multiple imputation. |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Open Source License? | Yes |
| Impact | None yet, as the package has only just been released. |
| URL | https://jwb133.github.io/gFormulaMI/ |
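
The idea behind the G-formula can be shown with a minimal Python sketch under stated assumptions. It uses plain Monte Carlo simulation from fitted regression models, not the multiple imputation approach that gFormulaMI implements. The two-visit data-generating model (randomised arm `a0`, a post-baseline biomarker `l1`, rescue medication `a1` that depends on `l1`, and a continuous outcome `y`) and the function names are hypothetical. The target is a hypothetical estimand: the effect of randomised treatment if rescue medication were never taken. Because `l1` is affected by treatment and also drives rescue, the sketch simulates `l1` under each arm and then sets rescue to zero.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_trial(n, rng):
    """Hypothetical two-visit trial used only for illustration."""
    a0 = rng.binomial(1, 0.5, n)                  # randomised arm
    l1 = 1.0 * a0 + rng.normal(0.0, 1.0, n)       # post-baseline biomarker
    a1 = rng.binomial(1, expit(l1 - 0.5))         # rescue medication, depends on l1
    y = 2.0 * a0 + 1.0 * l1 - 1.5 * a1 + rng.normal(0.0, 1.0, n)
    return a0, l1, a1, y


def ols(design, outcome):
    """Least-squares coefficients and residual standard deviation."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    resid_sd = np.std(outcome - design @ coef, ddof=design.shape[1])
    return coef, resid_sd


def g_formula_no_rescue(a0, l1, a1, y, rng, n_mc=100_000):
    """Monte Carlo parametric G-formula for 'assigned arm, rescue never taken'."""
    ones = np.ones(len(y))
    gamma, sd_l1 = ols(np.column_stack([ones, a0]), l1)       # model for l1 | a0
    beta, _ = ols(np.column_stack([ones, a0, l1, a1]), y)     # model for y | a0, l1, a1
    arm_means = []
    for arm in (0, 1):
        # Step 1: simulate the biomarker as it would evolve under this arm
        l1_sim = gamma[0] + gamma[1] * arm + rng.normal(0.0, sd_l1, n_mc)
        # Step 2: predict the mean outcome with rescue set to 0 for everyone
        y_mean = beta[0] + beta[1] * arm + beta[2] * l1_sim + beta[3] * 0.0
        arm_means.append(y_mean.mean())
    return arm_means[1] - arm_means[0]


rng = np.random.default_rng(2023)
a0, l1, a1, y = simulate_trial(20_000, rng)
hypothetical = g_formula_no_rescue(a0, l1, a1, y, rng)
treatment_policy = y[a0 == 1].mean() - y[a0 == 0].mean()
print(f"hypothetical (no rescue): {hypothetical:.2f}")
print(f"treatment policy:         {treatment_policy:.2f}")
```

Under this made-up model the true no-rescue effect is 2 + 1 × 1 = 3 (the direct effect of `a0` plus its effect through `l1`). The treatment-policy contrast is smaller because rescue, which lowers `y`, is more common in the active arm. The two numbers answer different questions, which is the point of specifying the estimand first.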

| Description | The Stats Geek blog |
|---|---|
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | I wrote a post on my popular biostatistics blog The Stats Geek about the first paper to emerge from this grant (which is under review at a journal). The reach was estimated from unique page views recorded in Google Analytics. The post gave a high-level summary of our paper, and I believe it helped to disseminate our work to target users, namely statisticians working in clinical trials. |
| Year(s) Of Engagement Activity | 2021 |
| URL | https://thestatsgeek.com/2021/07/12/hypothetical-estimands-a-unification-of-causal-inference-and-mis... |