The Causal Continuum - Transforming Modelling and Computation in Causal Inference
Lead Research Organisation:
UNIVERSITY COLLEGE LONDON
Department Name: Statistical Science
Abstract
A central aspect of science and engineering is the ability to answer "what if" questions. What will happen if this gene suffers a mutation? What are the public health consequences of cutting this social benefit? What can we do to mitigate disparities among social groups? To what extent are lockdowns useful for mitigating a pandemic? What are the ramifications if failures occur at key points of a major logistical operation, such as a food supply chain?
These are cause-effect questions. Answering them is hard because it involves change. Historical data may fail to capture the implications of change, placing causal questions outside the comfort zone in which data is ordinarily used to inform decisions. It is one thing to predict the life expectancy of a smoker, as public health officials or insurance companies do. It is much harder to understand what will happen if we convince someone to stop smoking: historical data may contain a substantial number of cases where people stopped smoking shortly before dying of respiratory disease, because of the discomfort it caused. A statistical or machine learning method oblivious to these causal explanations may conclude that stopping smoking is bad for one's health.
Ideally, we would like to perform randomised controlled trials, where the choice of action is decided by the flip of a coin so that confounding factors between cause and effect are overridden. This removal of confounding is necessary to show convincingly, for instance, that a COVID-19 vaccine works because of biological processes, as opposed to sociological factors that confound who chooses to be vaccinated with their health outcomes. However, such trials can be very expensive (understanding genetic networks involves a large experimental space) or unethical (we cannot force someone to smoke or not), and even when they take place, a trial may not fully control the factor of interest (we can randomly assign a drug or placebo to a patient, but we may not have the means to make the patient comply with the treatment at home).
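The role of randomisation can be illustrated with a toy simulation (hypothetical numbers, for illustration only): a hidden "health-consciousness" factor drives both the choice of treatment and the outcome, so the naive observational contrast is badly biased, while coin-flip assignment recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 2.0  # by construction, treatment raises the outcome by 2 units

# Hidden confounder, e.g. health-consciousness.
u = rng.normal(size=n)

# Observational regime: the confounder drives treatment choice.
t_obs = (u + rng.normal(size=n) > 0).astype(float)
y_obs = true_effect * t_obs + 3.0 * u + rng.normal(size=n)
naive = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()

# Randomised regime: a coin flip overrides the confounder.
t_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = true_effect * t_rct + 3.0 * u + rng.normal(size=n)
rct = y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean()

print(f"true effect:                  {true_effect:.2f}")
print(f"naive observational estimate: {naive:.2f}")  # biased well above 2
print(f"randomised-trial estimate:    {rct:.2f}")    # close to 2
```

Because treated units are disproportionately the health-conscious ones, the naive contrast attributes the confounder's contribution to the treatment; randomisation breaks that dependence.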
Data scientists have not ignored these problems, and we can thank the hard work of epidemiologists, for instance, for the convincing case establishing the harmful link between smoking and lung cancer. But without randomised trials, the answer to a "what if" question requires assumptions; otherwise it is unknowable. This means that causal inference progresses slowly and is prone to mistakes. Part of the reason is that, traditionally, methods for causal inference rely on pre-defined families of assumptions, chosen by the statisticians designing the methods so that they provide unambiguous answers. Applied scientists then adopt whichever method is a good enough approximation to their understanding of the world (one simple case: assume that all common causes are measured in the data!). Although there are tools for sensitivity analysis (what if the assumptions are violated in particular ways?), they do not address the main issue directly: a domain expert should be able to specify assumptions upfront, in whatever form they see as appropriate, and be told not a single, artificially convenient answer, but what can actually be disentangled from the observational data given the information provided. One reason this workflow is not popular is that computationally intensive algorithms are needed to deduce the consequences of such assumptions.
This project has the ambition of changing the common practice of causal inference, increasing transparency and the speed with which we understand the limits of our knowledge and where to look in order to make progress. It will rely on cutting-edge algorithms to provide a flexible sandbox in which domain experts can express their knowledge, while also offering the backend support for the sophisticated computational methods needed.
People
| Name | ORCID iD |
| Ricardo Silva (Principal Investigator / Fellow) | |
Publications
Bravo-Hermsdoff G (2023) Intervention Generalization: A View from Factor Graph Models
Padh K (2023) Stochastic Causal Programming for Bounding Treatment Effects
Watson D (2024) Bounding causal effects with leaky instruments
Silva R (2024) Seconder of the vote of thanks to Evans and Didelez and contribution to the Discussion of 'Parameterizing and simulating from causal models' in Journal of the Royal Statistical Society Series B: Statistical Methodology
Gultchin L (2024) Pragmatic Fairness: Developing Policies with Outcome Disparity Control
| Title | Dual Risk Minimization |
| Description | A method for learning and performing automated classification across data sources with varying relationships, trading off robustness to major unanticipated variability against making the most of the available data. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Nothing to report yet. |
| URL | https://github.com/vaynexie/DRM |
| Title | LeakyIV: bounds from imperfect instrumental variable models |
| Description | This tool provides bounds on causal effects obtained via imperfect instruments that may directly affect the outcome of interest, exploiting information on the degree of violation of classical instrumental variable assumptions. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Nothing to report yet |
| URL | https://github.com/dswatson/leakyIV |
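The flavour of the idea can be conveyed with a toy linear-Gaussian sketch (this is not the algorithm implemented in the repository, only an illustration with made-up coefficients): if the instrument Z is allowed to "leak" into the outcome with a direct coefficient tau bounded by |tau| <= tau_max, the classical instrumental-variable point estimate turns into an interval that covers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Linear structural model with a hidden confounder U and a leaky instrument Z:
#   T = 0.8*Z + U + noise,   Y = beta*T + tau*Z + U + noise.
beta, tau = 1.5, 0.3  # true effect and true leak (unknown to the analyst)
z = rng.normal(size=n)
u = rng.normal(size=n)
t = 0.8 * z + u + rng.normal(size=n)
y = beta * t + tau * z + u + rng.normal(size=n)

cov_zt = np.cov(z, t)[0, 1]
cov_zy = np.cov(z, y)[0, 1]
var_z = z.var()

# Classical IV (assumes tau = 0): biased when the instrument leaks.
classical = cov_zy / cov_zt

# Since Cov(Z,Y) = beta*Cov(Z,T) + tau*Var(Z), assuming only |tau| <= tau_max
# yields an interval for beta instead of a single (wrong) number.
tau_max = 0.5
lo = (cov_zy - tau_max * var_z) / cov_zt
hi = (cov_zy + tau_max * var_z) / cov_zt

print(f"classical IV estimate: {classical:.2f}")       # off target
print(f"leaky-IV-style bounds: [{lo:.2f}, {hi:.2f}]")  # contains beta = 1.5
```

The trade-off mirrors the project's theme: a weaker, more honest assumption (a bounded leak) yields a set of answers rather than a misleadingly precise point.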
| Title | Models for intervention generalisation |
| Description | A methodology for composing multiple interventional datasets to predict the impact of unseen combinations, with applications to cell biology. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | Nothing to report yet |
| URL | https://github.com/rbas-ucl/intgen |
| Title | Stochastic causal programming |
| Description | A collection of algorithms for computing bounds on causal effects in models where hidden common causes render those effects unidentifiable. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | Nothing to report yet |
| URL | https://github.com/kirtanp/SCP_bounds |
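A classical special case of this style of output is the assumption-free Manski bound for a binary treatment and bounded outcome; the sketch below (toy data, not the stochastic causal programming algorithm itself) shows the characteristic result: an interval guaranteed to contain the effect, rather than a point estimate.

```python
import numpy as np

def manski_bounds(t, y):
    """Assumption-free bounds on E[Y(1)] - E[Y(0)] for binary T and Y in [0, 1].

    With hidden common causes, the counterfactual outcomes of untreated
    (resp. treated) units are unknown; plugging in the worst and best cases
    0 and 1 yields an interval guaranteed to contain the average effect.
    """
    p1 = (t == 1).mean()      # P(T=1)
    p0 = 1.0 - p1             # P(T=0)
    m1 = y[t == 1].mean()     # E[Y | T=1]
    m0 = y[t == 0].mean()     # E[Y | T=0]
    lower = m1 * p1 - (m0 * p0 + 1.0 * p1)
    upper = (m1 * p1 + 1.0 * p0) - m0 * p0
    return lower, upper

# Toy data with a hidden confounder U driving both T and Y.
rng = np.random.default_rng(2)
u = rng.integers(0, 2, size=50_000)
t = (rng.random(size=u.size) < 0.2 + 0.6 * u).astype(int)
y = (rng.random(size=u.size) < 0.3 + 0.4 * u).astype(int)

lo, hi = manski_bounds(t, y)
print(f"ATE is only known to lie in [{lo:.2f}, {hi:.2f}]")
```

For binary outcomes these bounds always have width one; the point of methods like stochastic causal programming is to tighten such intervals by exploiting whatever additional structure the domain expert is willing to assume.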
| Title | Structured Learning of Compositional Sequential Interventions |
| Description | Code for a predictive model of effects of combinations of interventions over time in sequential data. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Many organisations track the progress of units over time and expose them to interventions intended to guide their behaviour. We described the ideas and outcomes of this project to collaborators at Spotify, who were acknowledged in the companion paper. |
| URL | https://github.com/jialin-yu/CSI-VAE |
| Description | Causal predictive models in combinatorial spaces of exposures, with applications to recommender systems |
| Organisation | Spotify |
| Country | Sweden |
| Sector | Private |
| PI Contribution | We contributed code, ideas, and the final paper resulting from our discussions. The PI attended regular meetings with Spotify representatives. |
| Collaborator Contribution | In-kind contributions via staff time in regular meetings from late 2023 to 2024. Ideas and feedback were provided. |
| Impact | Paper "Structured Learning of Compositional Sequential Interventions", presented at the 2024 conference Neural Information Processing Systems. This paper can be obtained from https://proceedings.neurips.cc/paper_files/paper/2024/hash/d10c7e24c96db4b222688efd11b02940-Abstract-Conference.html |
| Start Year | 2023 |
| Description | DeepMind 2024 |
| Organisation | Alphabet |
| Department | Deepmind |
| Country | United Kingdom |
| Sector | Private |
| PI Contribution | This was a joint effort between researchers at UCL, Oxford, Cambridge/Max Planck, and DeepMind. We held online meetings spread over a period of two years, some overlapping with the fellowship period. The senior researchers (UCL and DeepMind) contributed research ideas and evaluation methods; the junior researchers (Oxford and Cambridge) contributed further ideas, pieces of writing, coding, and reporting of benchmarks. |
| Collaborator Contribution | Research ideas contributed by the DeepMind team included: formalisation of the problem, choice of algorithmic framework, and choice of benchmarks and evaluation metrics. |
| Impact | Conference paper, "Pragmatic Fairness: Developing Policies with Outcome Disparity Control", to be presented at the 2024 Conference on Causal Learning and Reasoning. |
| Start Year | 2023 |
| Description | Keynote talk at Thematic Quarter on Causality, France |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | This was a keynote talk at the Thematic Quarter on Causality, in Paris-Orsay, France. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://quarter-on-causality.github.io |
| Description | Panel member, Causal Representation Learning workshop |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | A panel discussion among experts on the state and future directions of a major area within causal models in AI. Audience participation and questions helped to frame possible research programmes. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://crl-community.github.io/neurips24 |
| Description | Seminar talk, University of Toronto |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Postgraduate students |
| Results and Impact | A one-day visit to the University of Toronto as the first external speaker in the series "Causal Inference: Bringing together data science and causal inference for better policy recommendations". Held several individual face-to-face meetings with faculty from Statistics, Computer Science, Economics and Business at UofT. Made new connections for CHAI with researchers working on AI in healthcare. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://datasciences.utoronto.ca/causal_inference/ |
| Description | Spotlight presentation of on-going work to EPSRC |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Supporters |
| Results and Impact | This was given as part of the visit to UCL by members of the EPSRC Mathematical Sciences division. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Talk at Dunnhumby |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Industry/Business |
| Results and Impact | (Online) talk for research and development professionals at Dunnhumby. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Talk at Technical University of Munich |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Invited talk at TUM for the Miniworkshop on Graphical Models and Causality |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://collab.dvb.bayern/display/TUMmathstat/Miniworkshop+on+Graphical+Models+and+Causality |
| Description | Talk at University of Manchester |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Postgraduate students |
| Results and Impact | This invited talk, titled "Stochastic Causal Programming", was given at the Department of Mathematics, University of Manchester. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Talk at University of York |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Postgraduate students |
| Results and Impact | Invited research talk at the University of York, titled "Stochastic Causal Programming". |
| Year(s) Of Engagement Activity | 2023 |
| Description | Talk at the Online Causal Inference Seminar |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Talk at the Online Causal Inference Seminar |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://sites.google.com/view/ocis/ |
| Description | Talk at the Statistical Laboratory, Cambridge |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Postgraduate students |
| Results and Impact | A presentation in the seminar series organised by the Statistical Laboratory, University of Cambridge. The title was "Stochastic Causal Programming", and the audience included a mixture of students, professors and other researchers from the Centre for Mathematical Sciences. |
| Year(s) Of Engagement Activity | 2023 |