CITCoM: Causal Inference for Testing of Computational Models
Lead Research Organisation:
University of Sheffield
Department Name: Computer Science
Abstract
Computational models are increasingly used to answer important questions that affect us all. Scientists increasingly rely on computational models to simulate phenomena as diverse as the effects of drugs on physiology, the transmission of diseases through a society, or the flow of blood through an artery. Within the public sector, computational models are fundamental to predicting weather patterns in the short term and the impact of global warming in the longer term. They are also increasingly vital for supporting decisions on infrastructure spend; our project partners in the DAFNI project are developing computational modelling infrastructure to support the investment of £460bn over the course of the coming decade.
Given the high-stakes decisions that are usually involved, mistakes or "bugs" in a model can lead (and have led) to disastrous consequences. It is critical that these systems are rigorously tested to minimise this risk.
Computational models are however not amenable to traditional software testing and debugging techniques. They can include large numbers of parameters and configuration options. They can take a very long time (and require a lot of computational resources) to execute a single test run, which makes it infeasible to run large numbers of test executions. The data structures that they operate on can be particularly complex (e.g. 3D models of cities or coronary arteries), which means that these can be difficult to synthesise and inspect. Finally, if a test run is found to produce an incorrect result, these factors can make it very difficult to identify where the bug is in the source code of the model.
CITCoM is based on the observation that the challenge is in many ways rooted in data-analysis. In the presence of large numbers of input variables, there is the challenge of analysing the tested behaviour and ensuring that the observed behaviour is caused by the parameters that are the focus of the test (and not accidentally caused by other incidental parameters). There is the converse challenge of selecting which inputs need to be varied and which ones need to be controlled to demonstrate that a given combination of inputs causes a particular behaviour whilst keeping the number of test cases this requires to a minimum. If a fault occurs, there is the challenge of interrogating the data to locate the fault in the code.
Similar problems arise in a wide range of disciplines, and especially in the field of Epidemiology - where population data are scrutinised to determine the effects of drug treatments or medical interventions. Again, there are many variables at play (lifestyle, cultural background, genetic traits, habits). Collecting data can be expensive and time-consuming. Outcomes can be difficult to measure and complex to scrutinise. For such situations, the last decade has seen the rapid rise of a family of statistical analysis approaches called Causal Inference. This has enabled statisticians to design and reason about epidemiological trials and data in new and powerful ways to efficiently sample data, handle missing data-attributes, and use existing data to answer "what-if" questions, even if the data in question has not been collected yet.
CITCoM will use these powerful Causal Inference analysis capabilities to address the problems that arise when testing computational models. We will generate Causal Inference-driven automated test-generation techniques, test oracles, and debugging techniques. These will be trialled and honed on a set of large case-study models in collaboration with our partners on the DAFNI project at STFC, at DSTL, and within The University of Sheffield.
Ultimately, CITCoM will enable us to generate, collect, and analyse evidence from computational models to ensure that they do not contain faults, so that any decisions that they feed into are well-founded and trustworthy.
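The core difficulty described above, separating the effect of the parameter under test from the effect of incidental parameters, can be illustrated with a minimal sketch. This is not CITCoM's implementation: `toy_model`, `dose`, and `age` are hypothetical names invented for illustration, and a plain linear regression adjustment stands in for whichever Causal Inference estimator a real test would use.

```python
import random


def toy_model(dose, age):
    # Hypothetical stand-in for an expensive computational model.
    # By construction, the true causal effect of `dose` on the output is 2.0.
    return 2.0 * dose + 5.0 * age


def cov(xs, ys):
    # Sum of centred cross-products (unnormalised covariance).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def naive_slope(xs, ys):
    # Slope of y regressed on x alone, ignoring every other parameter.
    return cov(xs, ys) / cov(xs, xs)


def adjusted_slope(xs, zs, ys):
    # Coefficient of x in the least-squares fit y ~ x + z,
    # i.e. the effect of x after adjusting for the confounder z.
    det = cov(xs, xs) * cov(zs, zs) - cov(xs, zs) ** 2
    return (cov(zs, zs) * cov(xs, ys) - cov(xs, zs) * cov(zs, ys)) / det


# Simulated execution logs in which the incidental parameter (age)
# happens to be correlated with the parameter under test (dose).
rng = random.Random(0)
ages = [rng.uniform(0.0, 1.0) for _ in range(500)]
doses = [a + rng.uniform(0.0, 1.0) for a in ages]
outputs = [toy_model(d, a) + rng.gauss(0.0, 0.1) for d, a in zip(doses, ages)]

naive = naive_slope(doses, outputs)
adjusted = adjusted_slope(doses, ages, outputs)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this sketch the naive estimate overstates the effect of `dose` because it absorbs part of the effect of `age`, whereas the adjusted estimate lands close to the true value of 2.0. A test oracle built on the adjusted estimate would therefore judge the model against the behaviour actually caused by the parameter under test.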
Planned Impact
The ability to rigorously test computational models can potentially have a significant positive impact upon the following areas:
Economy: The goal of putting "the UK at the forefront of the AI and data revolution" is one of the four grand challenges set out by the Government's Industrial Strategy. Computational models, which enable computers to simulate and predict phenomena, are fundamental to achieving this aim. The National Infrastructure Commission's report "Data for the Public Good" shows how modelling approaches (such as Digital Twins) have the potential to save society billions of pounds: Complex machinery can be constructed and simulated virtually, providing an opportunity to eliminate faults before the investment is made to physically manufacture and deploy the product. The value of such virtual models however ultimately depends on their validity and trustworthiness, for which the ability to systematically and rigorously test them is critical. CITCoM will provide the tools and techniques to enable this. Thanks to our collaboration with DSTL and the DAFNI consortium we will be ideally positioned to ensure this impact, by ensuring that the CITCoM techniques are available to a wide range of modelling projects within industry and the industrial sector.
Society: Computational models are increasingly crucial for designing complex systems to improve them with respect to factors that have a societal impact: the safety of aircraft control systems, the security of communication systems, the energy-efficiency of turbines or heat-efficiency of buildings. All of these factors represent direct positive consequences for society. CITCoM would provide techniques that can serve to explore the functionality of these systems, not just to establish their trustworthiness, but also to explore and interrogate their behaviours in a more general sense.
Knowledge: The techniques developed as part of this project will have a significant impact upon software testing. Many systems share the attributes of computational models - complex input types, large parameter sets, complex outputs, and long execution times. The testing techniques developed within CITCoM (or elements thereof) stand to be transferrable to other domains, such as cyber-physical systems.
People: The project will have a positive impact on the careers of the investigators, the RA, and the Ph.D. student. Walkinshaw, Hierons, White and the Ph.D. student (funded by The University of Sheffield) will gain valuable experience of testing industrial and established academic computational models, and will establish themselves as leading researchers in this area within the UK. Latimer will build upon his Causal Inference expertise, and will develop new links with the Computational Modelling community (both within the University of Sheffield and beyond), which will feed into new research opportunities in his work on health economics. Wagg will gain valuable experience of new testing practices, which can be directly applied to the Digital Twin models within the EPSRC DigiTwin project.
Organisations
- University of Sheffield (Lead Research Organisation)
- Sheffield Teaching Hospital (Collaboration)
- BT Group (Collaboration)
- UNIVERSITY HOSPITALS OF LEICESTER NHS TRUST (Collaboration)
- KING'S COLLEGE LONDON (Collaboration)
- Defence Science and Technology Laboratory (Project Partner)
- Case Western Reserve University (Project Partner)
- Chalmers University of Technology (Project Partner)
- Science and Technology Facilities Council (Project Partner)
Publications
Anness A
(2021)
VP34.05: The influence of maternal hemodynamics on neonatal birthweight in pregnancies complicated by gestational diabetes compared to low-risk controls
in Ultrasound in Obstetrics & Gynecology
Anness AR
(2022)
Maternal hemodynamics and neonatal birth weight in pregnancies complicated by gestational diabetes: new insights from novel causal inference analysis modeling.
in Ultrasound in obstetrics & gynecology : the official journal of the International Society of Ultrasound in Obstetrics and Gynecology
Anness AR
(2024)
Do maternal haemodynamics have a causal influence on treatment for gestational diabetes?
in Journal of obstetrics and gynaecology : the journal of the Institute of Obstetrics and Gynaecology
Ataiefard F
(2022)
Deep State Inference: Toward Behavioral Model Inference of Black-Box Software Systems
in IEEE Transactions on Software Engineering
Clark A
(2021)
Test case generation for agent-based models: A systematic literature review
in Information and Software Technology
Clark A
(2023)
Testing Causality in Scientific Modelling Software
in ACM Transactions on Software Engineering and Methodology
Clark A
(2023)
Metamorphic Testing with Causal Graphs
Clark A
(2022)
Testing Causality in Scientific Modelling Software
Description | Existing "metamorphic testing" approaches test a system by changing its inputs and checking that its outputs change as anticipated. One weakness of these techniques is that the tester needs to be able to systematically control the inputs and observe the outputs. In the CITCOM grant we have shown how, with the help of Causal Inference, it becomes possible to test the same properties without the need for these controlled sets of executions, just from observing executions passively. We have demonstrated this on significant, complex software systems (computational COVID models and autonomous driving simulators). |
Exploitation Route | It enables the testing of systems that are traditionally very hard to test (i.e. systems with large numbers of inputs or configuration parameters). |
Sectors | Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Electronics; Manufacturing, including Industrial Biotechnology |
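As a rough illustration of checking a metamorphic relation from passively observed executions, the sketch below stratifies logged runs by a confounding setting before comparing outputs. Everything in it is assumed for illustration: the log layout, the `region` and `intervention` fields, and the numbers are hypothetical, and stratification is only one simple adjustment technique, not necessarily the one used in the project.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_effect(logs):
    # Difference in mean output between runs with and without the
    # intervention, ignoring how the runs were configured otherwise.
    on = [y for _, treated, y in logs if treated]
    off = [y for _, treated, y in logs if not treated]
    return mean(on) - mean(off)


def stratified_effect(logs):
    # Compare treated and untreated runs within each stratum of the
    # confounder, then average the differences weighted by stratum size.
    # Assumes every stratum contains both kinds of run.
    strata = defaultdict(lambda: {True: [], False: []})
    for region, treated, y in logs:
        strata[region][treated].append(y)
    total = len(logs)
    effect = 0.0
    for groups in strata.values():
        size = len(groups[True]) + len(groups[False])
        effect += (size / total) * (mean(groups[True]) - mean(groups[False]))
    return effect


# Hypothetical passive logs: (region, intervention enabled?, infections).
# The intervention was mostly enabled in the region with more infections.
logs = (
    [("urban", True, 80.0)] * 8
    + [("urban", False, 100.0)] * 2
    + [("rural", True, 10.0)] * 2
    + [("rural", False, 20.0)] * 8
)

# Expected relation: enabling the intervention should not increase infections.
print("naive effect:     ", naive_effect(logs))
print("stratified effect:", stratified_effect(logs))
print("relation holds:   ", stratified_effect(logs) <= 0)
```

In this toy data the naive comparison suggests the intervention increases infections, purely because of where it happened to be enabled. The stratified estimate is negative, so the expected relation holds once the confounder is accounted for, without any controlled re-execution of the model.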
Title | Causal Test Adequacy |
Description | This repository contains the code necessary to reproduce and process the results in our associated paper, "Causal Test Adequacy". For further details, please see README.md. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | The paper has not yet appeared formally. |
URL | https://orda.shef.ac.uk/articles/dataset/Causal_Test_Adequacy/24422104 |
Title | Dataset for Paper "Digital twin based testing for cyber-physical systems: A systematic literature review" |
Description | An excel spreadsheet containing exported metadata collected during a systematic literature review on digital twin based testing for cyber-physical systems. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | The paper (based upon the dataset) has seen a significant amount of interest from the community, and is a key publication for the testing of cyberphysical systems and digital twins. |
URL | https://figshare.shef.ac.uk/articles/dataset/Dataset_for_Paper_Digital_twin_based_testing_for_cyber-... |
Title | Modelling Uncertainty in State Based Systems |
Description | A diverse set of state machines, which were used as the basis of two experiments to investigate the value of Subjective Logic State Machines. This repository includes: * A curated set of state machines, encoded into dot files (sources in submitted paper). * A jar version of the Mint tool, which included the test prioritisation and state machine inference code. * CSV files containing the accuracy results and prioritisation results data. * R scripts used to analyse the results data and to generate the various figures for the paper. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | This was submitted to support a paper currently under review, so there are no impacts to report beyond supporting the immediate publication of this paper for now. |
URL | https://figshare.shef.ac.uk/articles/dataset/Modelling_Uncertainty_in_State_Based_Systems/14287040/1 |
Description | Application of Causal Inference to Investigation of Birth-weight Data |
Organisation | University Hospitals of Leicester NHS Trust |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This was a collaboration of mutual benefit. We could train up members of our team to get them acquainted with Causal Inference, by applying this to a data-set collected at Leicester University Hospitals. This was for the sake of investigating diabetes in maternity, and specifically the relationship between cardiac blood flow and the birth weight of children. This research has since been published. |
Collaborator Contribution | The partners contributed data, and framed the research questions. |
Impact | This was a multidisciplinary collaboration. We provided our expertise in causal modelling, acquired through the CITCOM grant. Anness, Osman, Webb, Robinson, Khalil and Mousa provided clinical expertise in obstetrics and gynecology. Anness, A. R., et al. "Maternal hemodynamics and neonatal birth weight in pregnancies complicated by gestational diabetes: new insights from novel causal inference analysis modeling." Ultrasound in Obstetrics & Gynecology (2022). Anness, A. R., Foster, M., Osman, M. W., Webb, D., Robinson, T., Khalil, A., ... & Mousa, H. A. (2024). Do maternal haemodynamics have a causal influence on treatment for gestational diabetes?. Journal of Obstetrics and Gynaecology, 44(1), 2307883. |
Start Year | 2021 |
Description | Application of Causal Testing within BT |
Organisation | BT Group |
Department | BT Research |
Country | United Kingdom |
Sector | Private |
PI Contribution | We are offering the use of the software and associated methodologies developed through CITCOM. |
Collaborator Contribution | BT are offering access to software test logs. |
Impact | This is ongoing - we are in the process of carrying out a case study, which will lead to a publication later on in the year. |
Start Year | 2022 |
Description | Causality in Software Engineering Systematic Review |
Organisation | King's College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are working jointly on a literature survey of causality in software engineering. |
Collaborator Contribution | We are providing search techniques and terminology on causality in software engineering. |
Impact | No outputs yet - authorship of a paper in progress. |
Start Year | 2023 |
Description | Using Digital Twins to Test Artificial Pancreas Systems |
Organisation | Sheffield Teaching Hospital |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | Richard Somers is a Ph.D. student funded by the department, to work on the CITCOM grant. His thesis is to investigate the use of causal inference to test medical cyber physical systems. He is developing an approach to support people with type-1 diabetes who want to use artificial pancreas systems. |
Collaborator Contribution | Our colleagues at Sheffield Teaching Hospitals meet us regularly to discuss our progress and to offer feedback from a clinical perspective. The objective is to ultimately publish the work collaboratively. |
Impact | A proof of concept software implementation, still to be published. |
Start Year | 2022 |
Title | CITCOM Causal Testing Framework |
Description | This is a software application framework, which can be used as a basis for causal testing of computational models. As a proof of concept we have applied this to a variety of published models, including the Covasim model. |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | The software has become the basis for most of our ongoing CITCOM research. It has detected bugs or flaws in several software systems, including COVID models, and self-driving car simulators. |
URL | https://github.com/CITCOM-project/CausalTestingFramework |
Description | Sheffield Causality and Testing Workshop |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Study participants or study members |
Results and Impact | We hosted a CITCOM causality and testing workshop, to bring together experts on linking the two areas. This involved 30 academics, mostly from the UK, with two joining us from Germany and one more joining remotely from Singapore. This has led to an ongoing collaboration with King's College London and the University of São Paulo on surveying activity in causality and testing. |
Year(s) Of Engagement Activity | 2023 |