The Causal Continuum - Transforming Modelling and Computation in Causal Inference

Lead Research Organisation: UNIVERSITY COLLEGE LONDON

Department Name: Statistical Science

Abstract

A central aspect of science and engineering is to be able to answer "what if" questions. What will happen if this gene suffers a mutation? What are the public health consequences of having this social benefit cut? What can we do to mitigate disparities among social groups? To what extent are lockdowns useful to mitigate a pandemic? What ramifications will follow if failures occur at these points of a major logistical operation such as a food supply chain?

These are cause-effect questions. Answering them is hard because it involves change. Historical data may fail to capture the implications of change, placing causal questions outside the comfort zone in which data is used to inform decisions. It is one thing to predict the life expectancy of a smoker, as done by public health officials or insurance companies. It is much harder to understand what will happen if we convince someone to stop smoking, as historical data may contain a substantial number of cases where people stopped smoking, due to discomfort, shortly before dying of respiratory disease. A statistical or machine learning method oblivious to these causal explanations may actually say that stopping smoking is bad for one's health.

Ideally, we would like to perform randomised controlled trials where the choice of action to be taken is decided by the flip of a coin, so that confounding factors between cause and effect are overridden. This removal of confounding is necessary to show convincingly, for instance, that a COVID-19 vaccination works due to biological processes as opposed to sociological confounding factors linking those who choose to be vaccinated and their health outcomes. However, in many cases such trials can be very expensive (understanding genetic networks involves a large experimental space) or unethical (we cannot force someone to smoke or not), and even when they take place, a controlled trial may not fully control the factor of interest (we can randomly assign a drug or placebo to a patient, but we may not have the means to make the patient comply with the treatment if they stay at home).
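The contrast between observational and randomised data can be illustrated with a small simulation. The sketch below is purely illustrative and is not taken from the project: the function name `difference_in_means`, the true effect of 1.0, the hidden factor `u` and all numeric settings are assumptions chosen for the example. When the hidden factor drives both the choice of treatment and the outcome, the naive comparison of treated and untreated units overstates the effect; when treatment is assigned by a coin flip, the same comparison recovers it.

```python
import random


def difference_in_means(n, randomised, seed=0):
    """Mean outcome of treated units minus mean outcome of untreated units.

    Hypothetical data-generating process (an assumption for illustration):
    a hidden factor u raises the outcome and, in the observational setting,
    also makes treatment more likely. The true treatment effect is 1.0.
    """
    rng = random.Random(seed)
    true_effect = 1.0
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # hidden common cause, never seen by the analyst
        if randomised:
            t = rng.random() < 0.5  # coin flip: independent of u
        else:
            # self-selection: units with high u tend to choose treatment
            t = rng.random() < (0.8 if u > 0 else 0.2)
        y = true_effect * t + 2.0 * u + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = difference_in_means(20000, randomised=False)
trial = difference_in_means(20000, randomised=True)
print(f"observational estimate: {observational:.2f}")  # well above 1.0
print(f"randomised estimate:    {trial:.2f}")          # close to 1.0
```

In this toy setting the observational estimate is inflated because treated units have, on average, a higher value of the hidden factor, which is exactly the dependence that randomisation breaks.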

Data scientists have not ignored these problems, and we can thank the hard work of epidemiologists, for instance, for presenting a convincing case establishing the harmful link between smoking and lung cancer. But without randomised trials, the answer to a "what if" question requires assumptions; otherwise it is unknowable. This means that causal inference progresses slowly and is prone to mistakes. Part of the reason is that, traditionally, methods for causal inference largely rely on pre-defined families of assumptions, chosen by statisticians to design methods that provide unambiguous answers. Applied scientists then adopt whichever method is a good enough approximation to their understanding of the world (one simple case: assume we have no common causes that are not measured in the data!). Although there are tools for sensitivity analysis (what if assumptions are violated in some particular ways?), they don't address the main issue directly: a domain expert should be given the chance to specify assumptions upfront in the way they see appropriate, and should be told not a single, artificially convenient answer but what can indeed be disentangled from the observational data given the information provided. One of the reasons this workflow is not popular is the need for computationally intensive algorithms to deduce the consequences of such assumptions.
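To make concrete the idea of reporting what the data can and cannot determine, the sketch below computes the classical no-assumption (Manski-style) bounds on an average treatment effect for a binary treatment and a binary outcome. It is a textbook illustration, not one of the project's algorithms; the function name `no_assumption_ate_bounds` and the input probabilities are assumptions made for the example. The outcome under treatment is observed only for treated units, so for untreated units it could be anywhere between 0 and 1, and symmetrically for the outcome without treatment.

```python
def no_assumption_ate_bounds(p_y1_given_t1, p_y1_given_t0, p_t1):
    """Bounds on E[Y(1)] - E[Y(0)] for binary Y and T, with no causal assumptions.

    Inputs are observational quantities (hypothetical values in the example):
      p_y1_given_t1 = P(Y=1 | T=1), p_y1_given_t0 = P(Y=1 | T=0), p_t1 = P(T=1).
    """
    p_t0 = 1.0 - p_t1
    # E[Y(1)]: known for the treated; unknown (between 0 and 1) for the untreated
    ey1_low = p_y1_given_t1 * p_t1
    ey1_high = p_y1_given_t1 * p_t1 + p_t0
    # E[Y(0)]: known for the untreated; unknown (between 0 and 1) for the treated
    ey0_low = p_y1_given_t0 * p_t0
    ey0_high = p_y1_given_t0 * p_t0 + p_t1
    return ey1_low - ey0_high, ey1_high - ey0_low


low, high = no_assumption_ate_bounds(0.7, 0.4, 0.5)
print(f"naive difference: {0.7 - 0.4:.2f}")
print(f"bounds on the effect: [{low:.2f}, {high:.2f}]")
```

Without further assumptions the interval always has width 1 and therefore always contains zero; narrowing it requires extra assumptions, which is where upfront input from a domain expert would come in.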

This project has the ambition of changing the common practice of causal inference, increasing transparency and the speed with which we understand the limits of our knowledge and where to look in order to progress. It will rely on cutting-edge algorithms to provide a flexible sandbox in which domain experts can express their knowledge in a very flexible way, while also offering the backend support for the sophisticated computational methods needed.

Funded Value:

£1,343,618

Funded Period:

Sep 22 - Sep 27

Funder:

EPSRC

Project Status:

Active

Project Category:

Fellowship

Project Reference:

EP/W024330/1

Principal Investigator:

Ricardo Silva

Research Subject:

Info. & commun. Technol. (60%)

Mathematical sciences (40%)

Research Topic:

Artificial Intelligence (60%)

Statistics & Appl. Probability (40%)

Organisations

People	ORCID iD
Ricardo Silva (Principal Investigator / Fellow)

Publications

Author Name Title Publication Date Published

Bravo-Hermsdorff G (2023) Intervention Generalization: A View from Factor Graph Models

Padh K (2023) Stochastic Causal Programming for Bounding Treatment Effects

Yu J (2024) Structured Learning of Compositional Sequential Interventions

Silva R (2024) Counterfactual Fairness Is Not Demographic Parity, and Other Observations

Watson D (2024) Bounding causal effects with leaky instruments

Silva R (2024) Seconder of the vote of thanks to Evans and Didelez and contribution to the Discussion of 'Parameterizing and simulating from causal models' in Journal of the Royal Statistical Society Series B: Statistical Methodology

Li K (2024) Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Gultchin L (2024) Pragmatic Fairness: Developing Policies with Outcome Disparity Control

Lee J (2025) BudgetIV: Optimal Partial Identification of Causal Effects with Mostly Invalid Instruments

Kornai D (2025) AGM-TE: Approximate Generative Model Estimator of Transfer Entropy for Causal Discovery and Causal Modelling

Research Databases and Models
Collaboration
Engagement Activities


Title	Dual Risk Minimization
Description	A method for learning and performing automated classification across data sources of varying relationships, while trading off robustness to major unanticipated variabilities against making the most of the available data.
Type Of Material	Computer model/algorithm
Year Produced	2024
Provided To Others?	Yes
Impact	Nothing to report yet.
URL	https://github.com/vaynexie/DRM


Title	LeakyIV: bounds from imperfect instrumental variable models
Description	This tool provides bounds on causal effects obtained via imperfect instruments which may confound a treatment and outcome of interest, exploiting information on the degree of violation of classical instrumental variables assumptions.
Type Of Material	Computer model/algorithm
Year Produced	2024
Provided To Others?	Yes
Impact	Nothing to report yet
URL	https://github.com/dswatson/leakyIV


Title	Models for intervention generalisation
Description	A methodology for composing multiple interventional datasets to predict the impact of unseen combinations, with applications to cell biology.
Type Of Material	Computer model/algorithm
Year Produced	2023
Provided To Others?	Yes
Impact	Nothing yet to report
URL	https://github.com/rbas-ucl/intgen


Title	Stochastic causal programming
Description	A collection of algorithms for computing bounds on causal effects in models where hidden common causes render such effects unidentifiable.
Type Of Material	Computer model/algorithm
Year Produced	2023
Provided To Others?	Yes
Impact	Nothing to report yet
URL	https://github.com/kirtanp/SCP_bounds


Title	Structured Learning of Compositional Sequential Interventions
Description	Code for a predictive model of effects of combinations of interventions over time in sequential data.
Type Of Material	Computer model/algorithm
Year Produced	2024
Provided To Others?	Yes
Impact	Many organisations track the progress of units over time, and expose them to interventions intended to guide their behaviour. We have described the ideas and outcomes of this project to collaborators at Spotify, who were acknowledged in the companion paper.
URL	https://github.com/jialin-yu/CSI-VAE


Description	Causal predictive models in combinatorial spaces of exposures, with applications to recommender systems
Organisation	Spotify
Country	Sweden
Sector	Private
PI Contribution	We contributed code, ideas and the final paper resulting from our discussions. The PI attended regular meetings with Spotify representatives.
Collaborator Contribution	In-kind contributions via staff time in regular meetings ranging from late 2023 to 2024. Ideas and feedback were provided.
Impact	Paper "Structured Learning of Compositional Sequential Interventions", presented at the 2024 conference Neural Information Processing Systems. This paper can be obtained from https://proceedings.neurips.cc/paper_files/paper/2024/hash/d10c7e24c96db4b222688efd11b02940-Abstract-Conference.html
Start Year	2023


Description	DeepMind 2024
Organisation	Alphabet
Department	Deepmind
Country	United Kingdom
Sector	Private
PI Contribution	This was a joint effort between researchers at UCL, Oxford, Cambridge/Max Planck, and DeepMind. We had online meetings spread over a period of two years, some overlapping with the fellowship period. The senior researchers (UCL and DeepMind) contributed research ideas and evaluation methods; the junior researchers (Oxford and Cambridge) contributed further ideas, pieces of writing, coding and reporting of benchmarks.
Collaborator Contribution	Research ideas contributed by the DeepMind team include: formalisation of the problem, choice of algorithmic framework, and choice of benchmarks and evaluation metrics.
Impact	Conference paper, "Pragmatic Fairness: Developing Policies with Outcome Disparity Control", to be presented at the 2024 Conference on Causal Learning and Reasoning.
Start Year	2023


Description	Keynote talk at Thematic Quarter on Causality, France
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This was a keynote talk at the Thematic Quarter on Causality, in Paris-Orsay, France.
Year(s) Of Engagement Activity	2023
URL	https://quarter-on-causality.github.io


Description	Panel member, Causal Representation Learning workshop
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	A panel discussion among experts on the state and future directions of a major area within causal models in AI. Audience participation and questions helped to frame possible research programmes.
Year(s) Of Engagement Activity	2024
URL	https://crl-community.github.io/neurips24


Description	Seminar talk, University of Toronto
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	A one-day visit to the University of Toronto as the first external speaker in their series "Causal Inference: Bringing together data science and causal inference for better policy recommendations". Had several individual face-to-face meetings with faculty from Statistics, Computer Science, Economics and Business at UofT. Made new connections for CHAI with researchers on AI in healthcare.
Year(s) Of Engagement Activity	2024
URL	https://datasciences.utoronto.ca/causal_inference/


Description	Spotlight presentation of on-going work to EPSRC
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Supporters
Results and Impact	This was given as part of the visit to UCL by members of the EPSRC Mathematical Sciences division.
Year(s) Of Engagement Activity	2023


Description	Talk at Dunnhumby
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Industry/Business
Results and Impact	(Online) talk for research and development professionals at Dunnhumby.
Year(s) Of Engagement Activity	2023


Description	Talk at Technical University of Munich
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Invited talk at TUM for the Miniworkshop on Graphical Models and Causality.
Year(s) Of Engagement Activity	2023
URL	https://collab.dvb.bayern/display/TUMmathstat/Miniworkshop+on+Graphical+Models+and+Causality


Description	Talk at University of Manchester
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	This invited talk was at the Department of Mathematics at University of Manchester, titled "Stochastic Causal Programming".
Year(s) Of Engagement Activity	2023


Description	Talk at University of York
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	Invited research talk at the University of York, titled "Stochastic Causal Programming".
Year(s) Of Engagement Activity	2023


Description	Talk at the Online Causal Inference Seminar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Talk at the Online Causal Inference Seminar
Year(s) Of Engagement Activity	2023
URL	https://sites.google.com/view/ocis/


Description	Talk at the Statistical Laboratory, Cambridge
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	A presentation in the seminar series organised by the Statistical Laboratory, University of Cambridge. The title was "Stochastic Causal Programming", and the audience included a mixture of students, professors and other researchers from the Centre for Mathematical Sciences.
Year(s) Of Engagement Activity	2023
