The Causal Continuum - Transforming Modelling and Computation in Causal Inference

Lead Research Organisation: UNIVERSITY COLLEGE LONDON
Department Name: Statistical Science

Abstract

A central aspect of science and engineering is to be able to answer "what if" questions. What will happen if this gene suffers a mutation? What are the public health consequences of having this social benefit cut? What can we do to mitigate disparities among social groups? To what extent are lockdowns useful to mitigate a pandemic? What ramifications will follow if failures occur at these points of a major logistical operation such as a food supply chain?

These are cause-effect questions. Answering them is hard because it involves change. Historical data may fail to capture the implications of change, placing causal questions outside the comfort zone in which data is normally used to inform decisions. It is one thing to predict the life expectancy of a smoker, as done by public health officials or insurance companies. It is much harder to understand what will happen if we convince someone to stop smoking, as historical data may contain a substantial number of cases where people stopped smoking shortly before dying of respiratory disease, due to discomfort. A statistical or machine learning method oblivious to these causal explanations may actually say that stopping smoking is bad for one's health.

Ideally, we would like to perform randomised controlled trials where the choice of action to be taken is decided by the flip of a coin, so that confounding factors between cause and effect are overridden. This removal of confounding is necessary to show convincingly, for instance, that a COVID-19 vaccination works due to biological processes as opposed to sociological confounding factors among those who choose to be vaccinated and their health outcomes. However, in many cases such trials can be very expensive (understanding genetic networks involves a large experimental space) or unethical (we cannot force someone to smoke or not), and even when they take place, a controlled trial may not fully control the factor of interest (we can randomly assign a drug or placebo to a patient, but we may not have the means to make the patient comply with the treatment if they stay at home).
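
The contrast between observational and randomised data can be sketched with a small simulation. This is an illustration only, not code from the project: the health score, the effect size in TRUE_EFFECT, the quitting probabilities and the function name estimate_effect are all assumptions made up for the example. It mirrors the smoking case above, where people in poor health are the ones most likely to quit.

```python
import random

TRUE_EFFECT = 1.0  # assumed benefit of quitting on a hypothetical health score


def estimate_effect(n, randomised, seed=0):
    """Return the difference in mean score between quitters and non-quitters."""
    rng = random.Random(seed)
    quit_scores, smoke_scores = [], []
    for _ in range(n):
        # Hidden confounder: underlying health, never seen by the analyst.
        health = rng.gauss(0.0, 1.0)
        if randomised:
            # Trial: a coin flip decides, so health cannot influence the choice.
            quits = rng.random() < 0.5
        else:
            # Observation: people in poor health are far more likely to quit.
            quits = rng.random() < (0.9 if health < 0 else 0.1)
        score = TRUE_EFFECT * (1.0 if quits else 0.0) + 2.0 * health + rng.gauss(0.0, 1.0)
        (quit_scores if quits else smoke_scores).append(score)
    return sum(quit_scores) / len(quit_scores) - sum(smoke_scores) / len(smoke_scores)


observational = estimate_effect(20000, randomised=False)
trial = estimate_effect(20000, randomised=True)
print(f"observational estimate: {observational:+.2f}")
print(f"randomised estimate:    {trial:+.2f}")
```

Under these assumed numbers the observational comparison comes out negative even though quitting is beneficial by construction, while the coin-flip comparison lands close to TRUE_EFFECT.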

Data scientists have not ignored these problems, and we can thank the hard work of epidemiologists, for instance, for presenting a convincing case establishing the harmful link between smoking and lung cancer. But without randomised trials, the answer to a "what if" question requires assumptions; otherwise it is unknowable. This means that causal inference progresses slowly and is prone to mistakes. Part of the reason is that, traditionally, methods for causal inference largely rely on pre-defined families of assumptions chosen by statisticians designing methods that will provide unambiguous answers. Applied scientists then adopt whichever method is a good enough approximation to their understanding of the world (one simple case: assume we have no common causes that are not measured in the data!). Although there are tools for sensitivity analysis (what if assumptions are violated in some particular ways?), they don't address the main issue directly: a domain expert should be given the chance to specify assumptions upfront in whatever way they see fit, and should be told not a single, artificially convenient answer but what can actually be disentangled from the observational data given the information provided. One of the reasons this workflow is not popular is the need for computationally intensive algorithms to deduce the consequences of such assumptions.
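
To make "what can be disentangled from the observational data" concrete, here is a minimal sketch of the classical no-assumption (Manski) bounds on an average treatment effect for a binary treatment and a binary outcome. It is a textbook illustration, not one of the project's algorithms; the function name no_assumption_bounds and the toy sample are made up for the example. The idea is that the outcome each unit would have had under the action it did not take is unobserved, so it is set to its worst and best possible values in turn.

```python
def no_assumption_bounds(data):
    """Bounds on E[Y(1)] - E[Y(0)] from (treatment, outcome) pairs in {0, 1}."""
    n = len(data)
    p_y1_t1 = sum(1 for t, y in data if t == 1 and y == 1) / n  # P(Y=1, T=1)
    p_y1_t0 = sum(1 for t, y in data if t == 0 and y == 1) / n  # P(Y=1, T=0)
    p_t1 = sum(1 for t, _ in data if t == 1) / n                # P(T=1)
    p_t0 = 1.0 - p_t1                                           # P(T=0)
    # E[Y(1)] lies in [P(Y=1,T=1), P(Y=1,T=1) + P(T=0)]
    # E[Y(0)] lies in [P(Y=1,T=0), P(Y=1,T=0) + P(T=1)]
    lower = p_y1_t1 - (p_y1_t0 + p_t1)
    upper = (p_y1_t1 + p_t0) - p_y1_t0
    return lower, upper


# Hypothetical sample: 50 treated (40 with good outcome), 50 untreated (20 with good outcome).
sample = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 30
lower, upper = no_assumption_bounds(sample)
print(f"effect lies between {lower:+.2f} and {upper:+.2f}")
```

For this toy sample the bounds are -0.30 and +0.70. Without further assumptions the interval always has width one and so always contains zero, which is why the assumptions a domain expert is willing to add matter so much.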

This project has the ambition of changing the common practice of causal inference, increasing transparency and the speed at which we understand the limits of our knowledge and where to look in order to progress. It will rely on cutting-edge algorithms to provide a sandbox in which domain experts can express their knowledge in a very flexible way, while also offering backend support for the sophisticated computational methods needed.
 
Title Dual Risk Minimization 
Description A method for learning and performing automated classification across data sources of varying relationships, while trading-off robustness to major unanticipated variabilities against making the most of the available data. 
Type Of Material Computer model/algorithm 
Year Produced 2024 
Provided To Others? Yes  
Impact Nothing to report yet. 
URL https://github.com/vaynexie/DRM
 
Title LeakyIV: bounds from imperfect instrumental variable models 
Description This tool provides bounds on causal effects obtained via imperfect instruments which may confound a treatment and outcome of interest, exploiting information on the degree of violation of classical instrumental variables assumptions. 
Type Of Material Computer model/algorithm 
Year Produced 2024 
Provided To Others? Yes  
Impact Nothing to report yet 
URL https://github.com/dswatson/leakyIV
 
Title Models for intervention generalisation 
Description A methodology for composing multiple interventional datasets to predict the impact of unseen combinations, with applications to cell biology. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact Nothing yet to report 
URL https://github.com/rbas-ucl/intgen
 
Title Stochastic causal programming 
Description A collection of algorithms for computing bounds on causal effects in models where hidden common causes render such effects unidentifiable. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact Nothing to report yet 
URL https://github.com/kirtanp/SCP_bounds
 
Title Structured Learning of Compositional Sequential Interventions 
Description Code for a predictive model of effects of combinations of interventions over time in sequential data. 
Type Of Material Computer model/algorithm 
Year Produced 2024 
Provided To Others? Yes  
Impact Many organisations track the progress of units over time and expose them to interventions intended to guide their behaviour. We have described the ideas and outcomes of this project to collaborators at Spotify, who were acknowledged in the companion paper. 
URL https://github.com/jialin-yu/CSI-VAE
 
Description Causal predictive models in combinatorial spaces of exposures, with applications to recommender systems 
Organisation Spotify
Country Sweden 
Sector Private 
PI Contribution We contributed with code, ideas and the final paper resulting from our discussions. The PI attended regular meetings with Spotify representatives.
Collaborator Contribution In-kind contributions via staff time in regular meetings ranging from late 2023 to 2024. Ideas and feedback were provided.
Impact Paper "Structured Learning of Compositional Sequential Interventions", presented at the 2024 conference Neural Information Processing Systems. This paper can be obtained from https://proceedings.neurips.cc/paper_files/paper/2024/hash/d10c7e24c96db4b222688efd11b02940-Abstract-Conference.html
Start Year 2023
 
Description DeepMind 2024 
Organisation Alphabet
Department DeepMind
Country United Kingdom 
Sector Private 
PI Contribution This was a joint effort between researchers at UCL, Oxford, Cambridge/Max Planck, and DeepMind. We had online meetings spread over a period of two years, some overlapping with the fellowship period. The senior researchers (UCL and DeepMind) contributed research ideas and evaluation methods; the junior researchers (Oxford and Cambridge) contributed further ideas, pieces of writing, coding and reporting of benchmarks.
Collaborator Contribution Research ideas contributed by the DeepMind team include: formalisation of the problem, choice of algorithmic framework, and choice of benchmarks and evaluation metrics.
Impact Conference paper, "Pragmatic Fairness: Developing Policies with Outcome Disparity Control", to be presented at the 2024 Conference on Causal Learning and Reasoning.
Start Year 2023
 
Description Keynote talk at Thematic Quarter on Causality, France 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This was a keynote talk at the Thematic Quarter on Causality, in Paris-Orsay, France.
Year(s) Of Engagement Activity 2023
URL https://quarter-on-causality.github.io
 
Description Panel member, Causal Representation Learning workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A panel discussion among experts on the state and future directions of a major area within causal models in AI. Audience participation and questions helped to frame possible research programmes.
Year(s) Of Engagement Activity 2024
URL https://crl-community.github.io/neurips24
 
Description Seminar talk, University of Toronto 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact A one-day visit to the University of Toronto as the first external speaker in their series "Causal Inference: Bringing together data science and causal inference for better policy recommendations". Had several individual face-to-face meetings with faculty from Statistics, Computer Science, Economics and Business at UofT. Made new connections for CHAI with researchers on AI in healthcare.
Year(s) Of Engagement Activity 2024
URL https://datasciences.utoronto.ca/causal_inference/
 
Description Spotlight presentation of on-going work to EPSRC 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Supporters
Results and Impact This was given as part of the visit to UCL by members of the EPSRC Mathematical Sciences division.
Year(s) Of Engagement Activity 2023
 
Description Talk at Dunnhumby 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact (Online) talk for research and development professionals at Dunnhumby.
Year(s) Of Engagement Activity 2023
 
Description Talk at Technical University of Munich 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited talk at TUM for the Miniworkshop on Graphical Models and Causality
Year(s) Of Engagement Activity 2023
URL https://collab.dvb.bayern/display/TUMmathstat/Miniworkshop+on+Graphical+Models+and+Causality
 
Description Talk at University of Manchester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact This invited talk, titled "Stochastic Causal Programming", was given at the Department of Mathematics at the University of Manchester.
Year(s) Of Engagement Activity 2023
 
Description Talk at University of York 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Invited research talk at the University of York, titled "Stochastic Causal Programming".
Year(s) Of Engagement Activity 2023
 
Description Talk at the Online Causal Inference Seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Talk at the Online Causal Inference Seminar
Year(s) Of Engagement Activity 2023
URL https://sites.google.com/view/ocis/
 
Description Talk at the Statistical Laboratory, Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A presentation in the seminar series organised by the Statistical Laboratory, University of Cambridge. The title was "Stochastic Causal Programming", and the audience included a mixture of students, professors and other researchers from the Centre for Mathematical Sciences.
Year(s) Of Engagement Activity 2023