CITCoM: Causal Inference for Testing of Computational Models

Lead Research Organisation: University of Sheffield

Department Name: Computer Science

Abstract

Computational models are increasingly used to answer important questions that affect us all. Scientists rely on them to simulate phenomena as diverse as the effects of drugs on physiology, the transmission of diseases through a society, or the flow of blood through an artery. Within the public sector, computational models are fundamental to predicting weather patterns in the short term and the impact of global warming in the longer term. They are also increasingly vital for supporting decisions on infrastructure spending; our project partners in the DAFNI project are developing computational modelling infrastructure to support the investment of £460bn over the coming decade.

Given the high-stakes decisions that are usually involved, mistakes or "bugs" in a model can lead (and have led) to disastrous consequences. It is critical that these systems are rigorously tested to minimise this risk.

Computational models are however not amenable to traditional software testing and debugging techniques. They can include large numbers of parameters and configuration options. They can take a very long time (and require a lot of computational resources) to execute a single test run, which makes it infeasible to run large numbers of test executions. The data structures that they operate on can be particularly complex (e.g. 3D models of cities or coronary arteries), which means that these can be difficult to synthesise and inspect. Finally, if a test run is found to produce an incorrect result, these factors can make it very difficult to identify where the bug is in the source code of the model.

CITCoM is based on the observation that the challenge is in many ways rooted in data-analysis. In the presence of large numbers of input variables, there is the challenge of analysing the tested behaviour and ensuring that the observed behaviour is caused by the parameters that are the focus of the test (and not accidentally caused by other incidental parameters). There is the converse challenge of selecting which inputs need to be varied and which ones need to be controlled to demonstrate that a given combination of inputs causes a particular behaviour whilst keeping the number of test cases this requires to a minimum. If a fault occurs, there is the challenge of interrogating the data to locate the fault in the code.
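The first of these challenges can be illustrated with a minimal sketch. It is not taken from the project's own tooling: the log below is invented, and the names `focus`, `incidental`, `naive_effect` and `stratified_effect` are hypothetical. The data are constructed so that the output equals 2 × `focus` + 10 × `incidental`, and so that the two parameters tend to be switched on together. A naive comparison of runs then overstates the effect of the focus parameter, whereas comparing runs within each setting of the incidental parameter recovers it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical execution log (an assumption for illustration only).
# Each run records a binary focus parameter, a binary incidental
# parameter, and the model output, where by construction
# output = 2 * focus + 10 * incidental.
runs = [
    {"focus": 0, "incidental": 0, "output": 0.0},
    {"focus": 0, "incidental": 0, "output": 0.0},
    {"focus": 0, "incidental": 0, "output": 0.0},
    {"focus": 1, "incidental": 0, "output": 2.0},
    {"focus": 0, "incidental": 1, "output": 10.0},
    {"focus": 1, "incidental": 1, "output": 12.0},
    {"focus": 1, "incidental": 1, "output": 12.0},
    {"focus": 1, "incidental": 1, "output": 12.0},
]


def naive_effect(runs):
    """Difference in mean output between focus=1 and focus=0 runs."""
    on = [r["output"] for r in runs if r["focus"] == 1]
    off = [r["output"] for r in runs if r["focus"] == 0]
    return mean(on) - mean(off)


def stratified_effect(runs):
    """Average the within-stratum differences, weighted by stratum size.

    Runs are grouped by the incidental parameter, so each comparison
    holds that parameter fixed.
    """
    strata = defaultdict(list)
    for r in runs:
        strata[r["incidental"]].append(r)
    total = len(runs)
    return sum(naive_effect(g) * len(g) / total for g in strata.values())


print("naive estimate:     ", naive_effect(runs))       # inflated by the incidental parameter
print("stratified estimate:", stratified_effect(runs))  # matches the constructed effect of 2
```

Stratification only works here because the incidental parameter is recorded and takes few values; with many parameters, deciding which ones to hold fixed is precisely the selection problem described above.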

Similar problems arise in a wide range of disciplines, and especially in the field of Epidemiology - where population data are scrutinised to determine the effects of drug treatments or medical interventions. Again, there are many variables at play (lifestyle, cultural background, genetic traits, habits). Collecting data can be expensive and time-consuming. Outcomes can be difficult to measure and complex to scrutinise. For such situations, the last decade has seen the rapid rise of a family of statistical analysis approaches called Causal Inference. This has enabled statisticians to design and reason about epidemiological trials and data in new and powerful ways to efficiently sample data, handle missing data-attributes, and use existing data to answer "what-if" questions, even if the data in question has not been collected yet.

CITCoM will use these powerful Causal Inference analysis capabilities to address the problems that arise when testing computational models. We will generate Causal Inference-driven automated test-generation techniques, test oracles, and debugging techniques. These will be trialled and honed on a set of large case-study models in collaboration with our partners on the DAFNI project at STFC, at DSTL, and within The University of Sheffield.

Ultimately, CITCoM will enable us to generate, collect, and analyse evidence from computational models to ensure that they do not contain faults, so that any decisions that they feed into are well-founded and trustworthy.

Planned Impact

The ability to rigorously test computational models can potentially have a significant positive impact upon the following areas:

Economy: The goal of putting "the UK at the forefront of the AI and data revolution" is one of the four grand challenges set out by the Government's Industrial Strategy. Computational models, which enable computers to simulate and predict phenomena, are fundamental to achieving this aim. The National Infrastructure Commission's report "Data for the Public Good" shows how modelling approaches (such as Digital Twins) have the potential to save society billions of pounds: complex machinery can be constructed and simulated virtually, providing an opportunity to eliminate faults before the investment is made to physically manufacture and deploy the product. The value of such virtual models, however, ultimately depends on their validity and trustworthiness, for which the ability to systematically and rigorously test them is critical. CITCoM will provide the tools and techniques to enable this. Thanks to our collaboration with DSTL and the DAFNI consortium, we will be ideally positioned to deliver this impact by making the CITCoM techniques available to a wide range of modelling projects across industry.

Society: Computational models are increasingly crucial for designing complex systems to improve them with respect to factors that have a societal impact: the safety of aircraft control systems, the security of communication systems, the energy-efficiency of turbines or heat-efficiency of buildings. All of these factors represent direct positive consequences for society. CITCoM would provide techniques that can serve to explore the functionality of these systems, not just to establish their trustworthiness, but also to explore and interrogate their behaviours in a more general sense.

Knowledge: The techniques developed as part of this project will have a significant impact upon software testing. Many systems share the attributes of computational models - complex input types, large parameter sets, complex outputs, and long execution times. The testing techniques developed within CITCoM (or elements thereof) stand to be transferrable to other domains, such as cyber-physical systems.

People: The project will have a positive impact on the careers of the investigators, the RA, and the Ph.D. student. Walkinshaw, Hierons, White and the Ph.D. student (funded by The University of Sheffield) will gain valuable experience of testing industrial and established academic computational models, and will establish themselves as leading researchers in this area within the UK. Latimer will build upon his Causal Inference expertise, and will develop new links with the Computational Modelling community (both within The University of Sheffield and beyond), which will feed into new research opportunities in his work on health economics. Wagg will gain valuable experience of new testing practices, which can be directly applied to the Digital Twin models within the EPSRC DigiTwin project.

Funded Value:

£670,838

Funded Period:

Jan 21 - Feb 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/T030526/1

Principal Investigator:

Neil Walkinshaw

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Software Engineering (100%)

Organisations

People	ORCID iD
Neil Walkinshaw (Principal Investigator)
David Wagg (Co-Investigator)
Nicholas Latimer (Co-Investigator)
Rob Hierons (Co-Investigator)	http://orcid.org/0000-0002-4771-1446
David White (Researcher)

Publications

Anness A (2021) VP34.05: The influence of maternal hemodynamics on neonatal birthweight in pregnancies complicated by gestational diabetes compared to low-risk controls in Ultrasound in Obstetrics & Gynecology

Anness AR (2022) Maternal hemodynamics and neonatal birth weight in pregnancies complicated by gestational diabetes: new insights from novel causal inference analysis modeling. in Ultrasound in obstetrics & gynecology : the official journal of the International Society of Ultrasound in Obstetrics and Gynecology

Anness AR (2024) Do maternal haemodynamics have a causal influence on treatment for gestational diabetes? in Journal of obstetrics and gynaecology : the journal of the Institute of Obstetrics and Gynaecology

Ataiefard F (2022) Deep State Inference: Toward Behavioral Model Inference of Black-Box Software Systems in IEEE Transactions on Software Engineering

Clark A (2021) Test case generation for agent-based models: A systematic literature review in Information and Software Technology

Clark A (2023) Testing Causality in Scientific Modelling Software in ACM Transactions on Software Engineering and Methodology

Clark A (2023) Metamorphic Testing with Causal Graphs

Clark A (2022) Testing Causality in Scientific Modelling Software

Foster M (2022) Testing Software and Systems - 33rd IFIP WG 6.1 International Conference, ICTSS 2021, London, UK, November 10-12, 2021, Proceedings

Foster M (2023) Formal Methods and Software Engineering - 24th International Conference on Formal Engineering Methods, ICFEM 2023, Brisbane, QLD, Australia, November 21-24, 2023, Proceedings

Key Findings
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	Existing "metamorphic testing" approaches can successfully test systems in terms of changes to their inputs, by observing the anticipated changes in their outputs. One weakness of these techniques is that the tester needs to be able to systematically control the inputs and observe the outputs. In the CITCOM grant we have shown how, with the help of Causal Inference, it becomes possible to test the same properties without the need for these controlled sets of executions, just by observing executions passively. We have demonstrated this on significant, complex software systems (computational COVID models and autonomous driving simulators).
Exploitation Route	It enables the testing of systems that are traditionally very hard to test (i.e. systems with large numbers of inputs or input parameters).
Sectors	Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Electronics; Manufacturing, including Industrial Biotechnology
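The idea in the key findings above, checking a metamorphic property from passively observed executions, can be sketched as follows. This is an illustrative assumption rather than the API of the CITCOM Causal Testing Framework: the log is synthetic, the names `x`, `z`, `solve` and `ols` are hypothetical, and the adjustment is a plain linear regression that assumes `z` is the only confounding input and that effects are linear. The property under test is "increasing `x` should increase the output".

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations.

    Each row must already contain a leading 1.0 for the intercept.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic passive log (assumption): nobody controlled the inputs, and the
# input of interest x is correlated with an incidental input z.
rng = random.Random(0)
log = []
for _ in range(500):
    z = rng.uniform(0, 1)
    x = z + rng.uniform(0, 1)
    y = 3.0 * x + 5.0 * z + rng.gauss(0, 0.1)  # true effect of x is 3
    log.append((x, z, y))

ys = [y for _, _, y in log]
unadjusted = ols([[1.0, x] for x, _, _ in log], ys)[1]
effect_x = ols([[1.0, x, z] for x, z, _ in log], ys)[1]

# Metamorphic relation, checked on the adjusted estimate rather than on
# paired, controlled executions.
relation_holds = effect_x > 0
print(f"unadjusted slope: {unadjusted:.2f}, adjusted effect: {effect_x:.2f}")
print("relation 'increasing x increases output' holds:", relation_holds)
```

The unadjusted slope absorbs part of the influence of `z`, so it overstates the effect of `x`; including `z` in the regression brings the estimate close to the value used to generate the data. In practice the choice of adjustment variables would come from a causal model of the system under test rather than being known in advance.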


Title	Causal Test Adequacy
Description	This repository contains the code necessary to reproduce and process the results in our associated paper, "Causal Test Adequacy". For further details, please see `README.md`.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	The paper has not yet appeared formally.
URL	https://orda.shef.ac.uk/articles/dataset/Causal_Test_Adequacy/24422104


Title	Dataset for Paper "Digital twin based testing for cyber-physical systems: A systematic literature review"
Description	An Excel spreadsheet containing exported metadata collected during a systematic literature review on digital twin based testing for cyber-physical systems.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	The paper (based upon the dataset) has seen a significant amount of interest from the community, and is a key publication for the testing of cyber-physical systems and digital twins.
URL	https://figshare.shef.ac.uk/articles/dataset/Dataset_for_Paper_Digital_twin_based_testing_for_cyber-...


Title	Modelling Uncertainty in State Based Systems
Description	A diverse set of state machines, which were used for the basis of two experiments, to investigate the value of Subjective Logic State Machines. This repository includes:
* A curated set of state machines, encoded into dot files (sources in submitted paper).
* A jar version of the Mint tool, which included the test prioritisation and state machine inference code.
* CSV files containing the accuracy results and prioritisation results data.
* R scripts used to analyse the results data and to generate the various figures for the paper.
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	This was submitted to support a paper currently under review, so there are no impacts to report beyond supporting the immediate publication of this paper for now.
URL	https://figshare.shef.ac.uk/articles/dataset/Modelling_Uncertainty_in_State_Based_Systems/14287040/1


Description	Application of Causal Inference to Investigation of Birth-weight Data
Organisation	University Hospitals of Leicester NHS Trust
Country	United Kingdom
Sector	Academic/University
PI Contribution	This was a mutually beneficial collaboration. It allowed us to train members of our team in Causal Inference by applying it to a data set collected at University Hospitals of Leicester. The aim was to investigate diabetes in maternity, and specifically the relationship between cardiac blood flow and the birth weight of children. This research has since been published.
Collaborator Contribution	The partners contributed data, and framed the research questions.
Impact	This was a multidisciplinary collaboration. We provided our expertise in causal modelling, acquired through the CITCOM grant. Anness, Osman, Webb, Robinson, Khalil and Mousa provided clinical expertise in obstetrics and gynecology. Anness, A. R., et al. "Maternal hemodynamics and neonatal birth weight in pregnancies complicated by gestational diabetes: new insights from novel causal inference analysis modeling." Ultrasound in Obstetrics & Gynecology (2022). Anness, A. R., Foster, M., Osman, M. W., Webb, D., Robinson, T., Khalil, A., ... & Mousa, H. A. (2024). Do maternal haemodynamics have a causal influence on treatment for gestational diabetes? Journal of Obstetrics and Gynaecology, 44(1), 2307883.
Start Year	2021


Description	Application of Causal Testing within BT
Organisation	BT Group
Department	BT Research
Country	United Kingdom
Sector	Private
PI Contribution	We are offering the use of the software and associated methodologies developed through CITCOM.
Collaborator Contribution	BT are offering access to software test logs.
Impact	This is ongoing - we are in the process of carrying out a case study, which will lead to a publication later on in the year.
Start Year	2022


Description	Causality in Software Engineering Systematic Review
Organisation	King's College London
Country	United Kingdom
Sector	Academic/University
PI Contribution	We are working jointly on a literature survey of causality in software engineering.
Collaborator Contribution	We are providing search techniques and terminology on causality in software engineering.
Impact	No outputs yet - authorship of a paper in progress.
Start Year	2023


Description	Using Digital Twins to Test Artificial Pancreas Systems
Organisation	Sheffield Teaching Hospital
Country	United Kingdom
Sector	Hospitals
PI Contribution	Richard Somers is a Ph.D. student funded by the department to work on the CITCOM grant. His thesis investigates the use of causal inference to test medical cyber-physical systems. He is developing an approach to support people with type-1 diabetes who want to use artificial pancreas systems.
Collaborator Contribution	Our colleagues at Sheffield Teaching Hospitals meet us regularly to discuss our progress and to offer feedback from a clinical perspective. The objective is to ultimately publish the work collaboratively.
Impact	A proof of concept software implementation, still to be published.
Start Year	2022


Title	CITCOM Causal Testing Framework
Description	This is a software application framework, which can be used as a basis for causal testing of computational models. As a proof of concept, we have applied it to a variety of published models, including the Covasim model.
Type Of Technology	Webtool/Application
Year Produced	2021
Open Source License?	Yes
Impact	The software has become the basis for most of our ongoing CITCOM research. It has detected bugs or flaws in several software systems, including COVID models, and self-driving car simulators.
URL	https://github.com/CITCOM-project/CausalTestingFramework


Description	Sheffield Causality and Testing Workshop
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Study participants or study members
Results and Impact	We hosted a CITCOM causality and testing workshop to bring together experts on linking the two areas. This involved 30 academics, mostly from the UK; two joined us from Germany, and a further one joined us remotely from Singapore. This has led to an ongoing collaboration with King's College London and the University of São Paulo on surveying activity in causality and testing.
Year(s) Of Engagement Activity	2023
