ROBEST: Ensuring robustness of evidence in public health research for increased policy impact: widened use of advanced causal inference techniques

Lead Research Organisation: London School of Hygiene and Tropical Medicine
Department Name: Epidemiology and Population Health

Abstract

Coherent and effective public health policies rest on reliable evidence that allows researchers to identify, demonstrate, and raise awareness of a need for change, as well as to measure the causal effect of proposed changes. Such evidence can be built upon the rich electronic health records now available in many research fields, including public health, health economics, epidemiology and clinical science. The potential of these data is enormous, as they offer a valuable source of real-world evidence to inform public health policies. Nonetheless, reliable evidence can only be obtained through widespread use of robust statistical methodology among applied researchers with interests in evaluative research.
The large number of potential confounders and their possibly complex relationships with the outcome make the use of standard regression methods challenging or even impossible in some instances. Furthermore, the observational nature of such data makes any causal interpretation of findings from conventional analytic approaches hazardous. These caveats call for specific causal inference methodology, aimed at approaching observational data with a randomised trial mindset.
Alongside the growing availability of data, there has been a rapid development of statistical tools designed to further the use of observational data to answer causal questions. One of the recently developed algorithms, blending machine learning techniques with causal inference methodology, is targeted maximum likelihood estimation (TMLE). This cutting-edge approach combines double-robust estimation with good statistical properties, enabling causal inference.
Nonetheless, there is some discrepancy between the speed of methodological development and the adoption of these innovative methods among applied researchers. We identified three reasons for this misalignment: a gap in the understanding of the new methods, a lack of ready-to-use software, and the scarcity of publications showcasing the superiority of TMLE. We aim to address these shortcomings in this proposal.
We will provide applied researchers with tutorials designed to demystify the complex mathematical and statistical concepts used in the latest developments of targeted machine learning estimation. Furthermore, we propose to implement the latest TMLE developments in Stata, a statistical software package favoured by most applied researchers in public health, health economics, epidemiology and clinical science. We will extend the eltmle (https://github.com/migariane/eltmle) Stata command we developed, together with an extensive help file, by adding new functionalities to allow robust statistical inference. We also plan to publish a simple yet detailed article in the Stata Journal, along with online tutorials and empirical applications illustrating the use of eltmle. Lastly, we will provide demonstrations of the good properties of TMLE in simulated scenarios.
We will apply the eltmle command to estimate how the working environment causally affects cancer incidence and mortality, and to evaluate the causal effect of the type of colon cancer surgery (laparoscopy vs. open) on 30-day mortality.
Our dissemination strategy will target both applied researchers and stakeholders. It includes several channels, from classical publications and conference presentations, to dissemination through online open-source tutorials and technical support using open-source tools such as GitHub, as well as early engagement with stakeholders to develop the applied studies. Furthermore, we will run a two-day workshop hosted at the London School of Hygiene and Tropical Medicine, aiming to foster a network of eltmle users.

Technical Summary

Often, questions that motivate studies in the health, social and behavioral sciences are causal but tend to be answered using classical statistical methods. However, causal inference methods are needed when causality cannot be guaranteed by design (i.e., in observational studies) or when randomisation fails to provide the required balance in trials. Over the years, rapid ongoing advances in the field of causal inference for observational data have resulted in several algorithms to estimate the causal effects of a treatment on an outcome. Recently, data-adaptive estimation using machine learning techniques has been incorporated in the development of causal inference estimators. One of these algorithms is targeted maximum likelihood estimation (TMLE). TMLE is a semiparametric, double-robust, efficient substitution estimator allowing for data-adaptive estimation while obtaining valid statistical inference. In addition to being double-robust, TMLE allows the inclusion of machine learning algorithms that minimise the risk of model misspecification, a problem that persists for competing estimators. Nonetheless, TMLE rests on relatively complex statistical and mathematical concepts that need to be demystified for wider adoption. Furthermore, some questions remain for statistical inference in non-parametric settings (e.g., the nominal coverage of confidence intervals). This is an area of ongoing work where cross-validation is used to overcome TMLE issues in non-parametric settings (e.g., the Donsker class condition). The Donsker class condition refers to the smoothness needed in finite samples to assume asymptotic linearity and implement statistical inference based on the influence function. We plan to extend a previous Stata implementation of TMLE we developed to implement the most recent theoretical advances: i) to produce robust statistical inference in finite samples, and ii) to include other functionalities making the package readily accessible for applied researchers.
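
The TMLE steps summarised above (initial outcome model, propensity score, targeting step, substitution estimate, influence-function inference) can be illustrated with a minimal Python sketch. This is not the eltmle implementation: the simulated data, the single binary confounder, and the function names simulate and tmle_ate are all assumptions made for illustration, and no machine learning or cross-validation is used. The initial outcome model deliberately ignores the confounder so that the effect of the targeting step is visible.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate(n, seed=0):
    """Hypothetical data: binary W, A, Y with a true risk difference of 0.1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)                      # binary confounder
        a = int(rng.random() < (0.7 if w else 0.3))      # treatment depends on W
        y = int(rng.random() < 0.2 + 0.1 * a + 0.4 * w)  # outcome depends on A, W
        rows.append((w, a, y))
    return rows


def tmle_ate(rows):
    """Return (naive difference, TMLE estimate, influence-function SE)."""
    n = len(rows)

    # Step 1: initial outcome model Q(A), misspecified on purpose (ignores W).
    q = {}
    for arm in (0, 1):
        ys = [y for _, a, y in rows if a == arm]
        q[arm] = sum(ys) / len(ys)
    naive = q[1] - q[0]

    # Step 2: propensity score g(W) = P(A = 1 | W), estimated within strata of W.
    g = {}
    for level in (0, 1):
        treated = [a for w, a, _ in rows if w == level]
        g[level] = sum(treated) / len(treated)

    def clever(a, w):
        # "Clever covariate" H(A, W) = A / g(W) - (1 - A) / (1 - g(W)).
        return a / g[w] - (1 - a) / (1 - g[w])

    # Step 3: targeting. Fit eps in logit Q*(A, W) = logit Q(A) + eps * H(A, W)
    # by Newton-Raphson on the logistic log-likelihood with a fixed offset.
    eps = 0.0
    for _ in range(100):
        score = info = 0.0
        for w, a, y in rows:
            h = clever(a, w)
            p = expit(logit(q[a]) + eps * h)
            score += h * (y - p)
            info += h * h * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break

    def q_star(a, w):
        return expit(logit(q[a]) + eps * clever(a, w))

    # Step 4: substitution estimator, averaging updated predictions under A=1 and A=0.
    diffs = [q_star(1, w) - q_star(0, w) for w, _, _ in rows]
    ate = sum(diffs) / n

    # Step 5: standard error from the estimated efficient influence function.
    ic = [clever(a, w) * (y - q_star(a, w)) + d - ate
          for (w, a, y), d in zip(rows, diffs)]
    se = math.sqrt(sum(v * v for v in ic) / n) / math.sqrt(n)
    return naive, ate, se


if __name__ == "__main__":
    naive, ate, se = tmle_ate(simulate(20000, seed=1))
    print(f"naive difference: {naive:.3f}")
    print(f"TMLE estimate:    {ate:.3f} (SE {se:.3f}); simulated truth is 0.100")
```

In this simulated setting the naive difference is confounded by W, while the targeted estimate should land close to the simulated truth because the propensity score is correctly specified, which is the double-robustness property in action. A realistic analysis would replace the two stratified means with flexible learners and would need the finite-sample refinements discussed above.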

 
Description MS UCL PhD supervision 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution MS has been invited to contribute to the supervision of PhD Student Aasiyah Rashan on the topic of "Determining the transferability of treatment effects between international critical care populations". As such MS was given an Honorary Research Fellow position at UCL.
Collaborator Contribution MS is providing support to Aasiyah for specific aims and objectives of their PhD project. MS participates in regular calls with the supervisory team, and contributes to developing Aasiyah's skills and expertise in their research topic.
Impact No outputs so far.
Start Year 2023
 
Description ACIC - poster Matthew Smith 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact Matthew Smith presented a poster at the American Causal Inference Conference. He attended the conference and met with members of the team at the University of California, Berkeley, who develop TMLE.
Year(s) Of Engagement Activity 2024
URL https://sci-info.org/wp-content/uploads/2024/05/event_202312_agenda_pdf_aoggo.pdf
 
Description Miguel's ROBEST presentation (Granada) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Miguel-Angel Luque Fernandez was invited to present the work conducted as part of the research funded through ROBEST to the Institute of Mathematics at the University of Granada, Spain.
Year(s) Of Engagement Activity 2023
URL https://wpd.ugr.es/~imag/events/event/ensemble-learning-targeted-maximum-likelihood-estimation-for-s...
 
Description Pacific Causal Inference Conference MS presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact MS was invited to present ongoing developments of our work on causal inference for the relative survival setting at the Pacific Causal Inference Conference, in a session dedicated to survival outcomes.
Year(s) Of Engagement Activity 2024
URL https://www.spco.cc/pcic/
 
Description REDICO advisory board 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Camille Maringe is a member of the advisory board of the REDICO programme (University of Luxembourg).
Year(s) Of Engagement Activity 2023,2024,2025
URL https://researchportal.lih.lu/en/projects/reducing-disparities-in-cancer-outcomes
 
Description Yorkshire Cancer Research - research advisory board 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Camille Maringe joined as a member of the Yorkshire Cancer Research Advisory Panel. Panel members assist with the assessment of funding applications by reviewing a few applications remotely (typically 2 or 3 each year) or attending the annual Research Advisory Meeting in Harrogate to consider shortlisted proposals. In addition, panel members may be asked for advice about future direction and planning of new activities, on an ad hoc basis.
Year(s) Of Engagement Activity 2025
URL https://www.yorkshirecancerresearch.org.uk/