Multi-version Execution Techniques for Increasing the Reliability and Security of Evolving Software
Lead Research Organisation:
Imperial College London
Department Name: Computing
Abstract
One of the distinguishing characteristics of software systems is that they evolve: new patches are committed to software repositories and new versions are released to users on a continuous basis. Unfortunately, many of these changes bring unexpected failures that break the stability of the system or affect its security, and users face the uncomfortable choice between using an old stable version which misses recent features and bug fixes, and upgrading to a new version which improves the software in certain ways, only to introduce other bugs and security vulnerabilities.
In this fellowship, I plan to investigate novel techniques for improving the reliability and security of evolving software, based on the idea of combining the execution of multiple software versions in such a way as to increase the reliability and security of the "multi-version" application and eliminate a large number of common bugs introduced by software updates.
This is an ambitious proposal, which presents several challenges spanning the areas of software engineering, computer systems, and security: understanding how software evolves, and particularly the effects of incorrect updates on software evolution; addressing the technical challenges of multi-version execution such as creating an application-level sandboxing environment and devising lightweight record and replay techniques; designing error recovery strategies that effectively combine different software versions; and determining the applicability of multi-version execution to the different types of applications and code changes encountered in practice.
In this fellowship, I plan to investigate novel techniques for improving the reliability and security of evolving software, based on the idea of combining the execution of multiple software versions in such a way as to increase the reliability and security of the "multi-version" application and eliminate a large number of common bugs introduced by software updates.
This is an ambitious proposal, which presents several challenges spanning the areas of software engineering, computer systems, and security: understanding how software evolves, and particularly the effects of incorrect updates on software evolution; addressing the technical challenges of multi-version execution such as creating an application-level sandboxing environment and devising lightweight record and replay techniques; designing error recovery strategies that effectively combine different software versions; and determining the applicability of multi-version execution to the different types of applications and code changes encountered in practice.
Planned Impact
The correct operation of today's software systems relies on the reliability and security of the software update process. As a result, this research could have a direct impact in practice, improving the productivity of the large number of computer users and system administrators that need to incorporate software updates on a regular basis. Furthermore, our approach could improve the security of software by preventing denial-of-service attacks exploiting security vulnerabilities in applications with high-availability requirements such as network servers, as well as by encouraging users to incorporate more quickly the latest security fixes.
While the immediate societal benefits will be centred around improving the quality and security of software updates, many of the techniques developed in this work could lead to significant benefits in other areas related to software quality. For example, our lightweight replay-and-record techniques could be used to send easily reproducible error reports to developers, while our research on parallel multi-version execution could be used to deploy high-interaction honeypots offering a fine-grained understanding of the anatomy of an attack.
Our goal is to disseminate the results of this research from an early stage onwards through a variety of mechanisms: (1) by open-sourcing most of our initial prototypes to be used directly by software developers on their code; (2) by applying it to open-source and commercial code to demonstrate the effectiveness of our approach; and (3) by communicating the results of our research to the non-academic user community.
While the immediate societal benefits will be centred around improving the quality and security of software updates, many of the techniques developed in this work could lead to significant benefits in other areas related to software quality. For example, our lightweight replay-and-record techniques could be used to send easily reproducible error reports to developers, while our research on parallel multi-version execution could be used to deploy high-interaction honeypots offering a fine-grained understanding of the anatomy of an attack.
Our goal is to disseminate the results of this research from an early stage onwards through a variety of mechanisms: (1) by open-sourcing most of our initial prototypes to be used directly by software developers on their code; (2) by applying it to open-source and commercial code to demonstrate the effectiveness of our approach; and (3) by communicating the results of our research to the non-academic user community.
People |
ORCID iD |
Cristian Cadar (Principal Investigator / Fellow) |
Publications
Hosek P
(2015)
VARAN the Unbelievable An Efficient N-version Execution Framework
in ACM SIGPLAN Notices
Kuchta T
(2018)
Shadow Symbolic Execution for Testing Software Patches
in ACM Transactions on Software Engineering and Methodology
Kuchta T
(2018)
On the correctness of electronic documents: studying, finding, and localizing inconsistency bugs in PDF readers and files
in Empirical Software Engineering
Arras P
(2022)
SaBRe: load-time selective binary rewriting
in International Journal on Software Tools for Technology Transfer
Kapus T
(2019)
A segmented memory model for symbolic execution
Cadar C
(2015)
Targeted program transformations for symbolic execution
Cadar, C
(2016)
Analysing the Program Analyser
Pina L
(2018)
FreeDA
Cadar C
(2016)
Analysing the program analyser
Palikareva, H
(2016)
Shadow of a Doubt: Testing for Divergences Between Software Versions
Kuchta T
(2014)
Docovery
Perry D
(2017)
Accelerating array constraints in symbolic execution
Trabish D
(2018)
Chopped symbolic execution
Dustmann O
(2018)
PARTI: a multi-interval theory solver for symbolic execution
Description | This grant has enabled the design of novel techniques for increasing the reliability and security of evolving software. We have created a novel multi-version execution system that allows multiple versions or variants of a software application to run side-by-side with a low performance overhead. We have built several new techniques on top of this system, including running several versions of the same application in order to survive bugs introduced by software updates; running different variants of the same application created by different compilers, in order to detect compiler-bug-based backdoors; running sanitized versions of an application in parallel with the native version in order to detect software bugs. We have also designed a novel dynamic binary rewriter, and implemented several plugins on top of it, such as a lightweight fault injector and fast system call tracer. Our Mvedsua project has combined multi-version execution with dynamic software updating to create more reliable dynamic software updates. In addition, the grant has supported research into using dynamic symbolic execution to find bugs in software applications, an empirical study for understanding how large software projects evolve over time, a study on the correctness of electronic documents: studying, finding, and localizing inconsistency bugs in PDF readers and files; the design of taint tracking and symbolic execution techniques for recovering broken documents, and a new memory model for effectively handling symbolic pointers. |
Exploitation Route | The multi-version execution techniques we designed could be used to improve the software update process, by enabling transparent fail-over in the case of update errors. They could also be used to find bugs in deployed applications, and detect compiler-bug-based backdoors. The multi-version execution framework that we designed as part of this project is now a reference for work in this area, and parts of its architecture were incorporated in the frameworks of other groups. Our symbolic execution techniques have helped increase the scalability of the technique, and can be used to find important bugs and security vulnerabilities, including in evolving applications via patch testing. Our symbolic execution framework KLEE, on top of which most of these techniques are implemented, continues to be one of the most popular symbolic execution platforms, with a large number of users from both academia and industry. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | http://srg.doc.ic.ac.uk/projects/ |
Description | Several of our techniques improved the testing of popular open-source software systems. Our symbolic execution techniques have been used by a number of companies, such as Baidu, Fujitsu and Samsung. Our binary rewriting system is available as open source and was presented at the popular FOSDEM 2020 event on open-source software. |
First Year Of Impact | 2015 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Description | EPSRC responsive mode grant |
Amount | £672,083 (GBP) |
Funding ID | EP/R011605/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2018 |
End | 12/2020 |
Description | EPSRC responsive mode grant |
Amount | £286,470 (GBP) |
Funding ID | EP/N007166/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2016 |
End | 01/2019 |
Description | ERC Consolidator Grant |
Amount | € 1,955,129 (EUR) |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 10/2019 |
End | 09/2024 |
Title | Covrig data set |
Description | Data associated with an empirical study examining how code and tests co-evolve in 6 popular open-source systems. We report the main characteristics of software patches, analyse the evolution of program and patch coverage, assess the impact of nondeterminism on the execution of the test suite, and investigate whether the coverage of code containing bugs and bug fixes is higher than average. |
Type Of Material | Database/Collection of data |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | The associated paper was published in the International Symposium on Software Testing and Analysis (ISSTA 2014). The associated artifact has won the Best Artifact Award at the conference. |
URL | http://srg.doc.ic.ac.uk/projects/covrig/ |
Title | Floating-point challenge benchmarks |
Description | Our research on symbolic execution of floating-point programs resulted in a set of 35k floating-point SMT queries generated by our tools, which we contributed to the annual competition on SMT solvers, SMT-COMP (https://smt-comp.github.io/). |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | TBD |
Title | Studying, finding, and localizing inconsistency bugs in PDF readers and files |
Description | The database contains the results of a large scale study on the correctness of PDF documents and PDF document readers, based on clustering errors emitted by the PDF readers and visual inconsistencies detection. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | The associated paper was published in the Empirical Software Engineering journal and was downloaded over 1000 times since. |
URL | https://srg.doc.ic.ac.uk/projects/pdf-errors/ |
Description | Collaboration with Purdue University |
Organisation | Purdue University |
Country | United States |
Sector | Academic/University |
PI Contribution | This project led to a close collaboration with Purdue University, which resulted in a joint publication at ISSTA 2017. |
Collaborator Contribution | This project led to a close collaboration with David M. Perry and Xiangyu Zhang from Purdue University, which resulted in a joint publication at ISSTA 2017. |
Impact | Joint paper "Accelerating Array Constraints in Symbolic Execution" by David M. Perry, Andrea Mattavelli, Xiangyu Zhang, Cristian Cadar at the International Symposium on Software Testing and Analysis (ISSTA 2017), and an associated software implementation. |
Start Year | 2016 |
Description | Collaboration with Tel Aviv University |
Organisation | Tel Aviv University |
Country | Israel |
Sector | Academic/University |
PI Contribution | This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018. |
Collaborator Contribution | This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018. |
Impact | Joint paper "Chopped Symbolic Execution" by David Trabish, Andrea Mattavelli, Noam Rinetzky, Cristian Cadar at the International Conference on Software Engineering (ICSE 2018). |
Start Year | 2017 |
Description | Collaboration with University of Maryland at College Park |
Organisation | University of Maryland, College Park |
Country | United States |
Sector | Academic/University |
PI Contribution | This project led to a close collaboration with University of Maryland at College Park, which resulted in a joint publication at ASPLOS 2019. |
Collaborator Contribution | This project led to a close collaboration with University of Maryland at College Park, which resulted in a joint publication at ASPLOS 2019. |
Impact | Joint paper "Mvedsua: Higher Availability Dynamic Software Updates via Multi-Version Execution" by Luís Pina, Anastasios Andronidis, Michael Hicks, Cristian Cadar, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2019) |
Start Year | 2017 |
Title | Infrastructure for Testing Symbolic Execution Engines |
Description | A tool for testing symbolic execution engines based on program generation and differential testing of compilers. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | Found serious bugs in the popular symbolic execution engines KLEE, CREST and FuzzBall. |
URL | https://srg.doc.ic.ac.uk/projects/symex-tester/ |
Title | KLEE-Arrays |
Description | An extension to the KLEE symbolic execution engine that implements a set of semantic-preserving transformations that take advantage of contextual information collected during symbolic execution to speedup array operations. The extension is now part of the KLEE mainline. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | The extension leads to significant speedups in the symbolic execution of popular benchmarks. |
URL | https://srg.doc.ic.ac.uk/projects/klee-array/ |
Title | SaBre |
Description | SaBRe is a modular selective binary rewriter, specialized for rewriting system calls, vDSO and named functions. It currently supports two architectures, x86_64 and RISC-V. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | SaBre was successfully used to implement a better version of our multi-version execution system Varan, as well as a lightweight fault injector and a fast system call tracer. It was selected for presentation at FOSDEM 2020, attended by thousands of open-source software developers. |
Title | Segmented Memory Model for KLEE |
Description | Symbolic execution is an effective technique for exploring paths in a program and reasoning about all possible values on those paths. However, the technique still struggles with code that uses complex heap data structures, in which a pointer is allowed to refer to more than one memory object. In such cases, symbolic execution typically forks execution into multiple states, one for each object to which the pointer could refer. This extension avoids this expensive forking by using a segmented memory model. In this model, memory is split into segments, so that each symbolic pointer refers to objects in a single segment. The size of the segments are bound by a threshold, in order to avoid expensive constraints. This results in a memory model where forking due to symbolic pointer dereferences is significantly reduced, often completely. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | The associated research was published in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), one of the top software engineering venues, and the artifact was validated as reusable by the ESEC/FSE 2019 Artifact Evaluation Committee. |
URL | https://srg.doc.ic.ac.uk/projects/klee-segmem/ |
Title | Shadow |
Description | A symbolic execution tool that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new versions diverge, we generate a test case exercising the divergence and comprehensively test the new behaviours of the new version. |
Type Of Technology | Software |
Year Produced | 2016 |
Impact | The associated research was published in the International Conference on Software Engineering (ICSE 2016) and ACM Transactions on Software Engineering and Methodology (TOSEM 2018). IEEE has invited us to write a blog post which discusses this work. The work has attracted considerable attention, being reimplemented by another research group for Java, used for hybrid differential testing, and discussed in the context of research on determining the correlation between fault revelation and various forms of coverage; combining bug detection with test suite augmentation; and genetic improvement. |
URL | http://srg.doc.ic.ac.uk/projects/shadow/ |
Title | Symbooglix |
Description | A symbolic execution engine for the Boogie intermediate verification language which provides modular components that can be reused in other Boogie projects. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | Associated paper accepted at the IEEE International Conference on Software Testing, Verification, and Validation (ICST 2016) and received a Best Paper Award. |
URL | http://srg.doc.ic.ac.uk/projects/symbooglix/ |
Title | Varan |
Description | A multi-version execution framework that combines selective binary rewriting with a novel event-streaming architecture to significantly reduce performance overhead and scale well with the number of versions, without relying on intrusive kernel modifications. |
Type Of Technology | Software |
Year Produced | 2015 |
Impact | The framework is now a reference for work in this area, and its architecture was incorporated in the frameworks of other groups. |
URL | https://srg.doc.ic.ac.uk/projects/varan/ |
Description | Blog post |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Blog post co-authored by Cristian Cadar, Luis Pina and John Regehr about preventing compiler-bug-based backdoors using mullti-version execution. This was posted on Cristian Cadar and John Regehr's blogs, with the latter being one of the most popular blogs for audiences interested in compilers. |
Year(s) Of Engagement Activity | 2015 |
URL | https://blog.regehr.org/archives/1282 |
Description | Co-organiser of Dagstuhl seminar on Bringing CP, SAT and SMT together: Next Challenges in Constraint Solving |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Co-organized a Dagstuhl seminar on the next challenges in constraint solving, which brought together academic and industry participants from the CP, SAT and SMT, and constraint solving applications communities. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19062 |
Description | Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe. |
Year(s) Of Engagement Activity | 2019 |
URL | https://shonan.nii.ac.jp/seminars/160/ |
Description | FOSDEM talk and interview |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Talk, video and interview delivered at FOSDEM 2020, an event gathering thousands of developers of free and open source software. https://fosdem.org/2020/interviews/paul-antoine-arras/ https://fosdem.org/2020/schedule/event/sabre/ |
Year(s) Of Engagement Activity | 2020 |
URL | https://fosdem.org/2020/interviews/paul-antoine-arras/ |
Description | IEEE Software Blog |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | IEEE Software blog post on our research on comprehensively testing software patches with symbolic execution. |
Year(s) Of Engagement Activity | 2016 |
URL | http://blog.ieeesoftware.org/2016/09/comprehensively-testing-software.html |
Description | INVEST workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Undergraduate students |
Results and Impact | The Introduction to Verification and Testing (INVEST) workshop takes place in the Department of Computing, Imperial College London on an annual basis, jointly organised by Cadar, Donaldson and Gardner. The aim of the workshop is to introduce young researchers, and students potentially interested in research careers, to the fields of software verification and testing. |
Year(s) Of Engagement Activity | 2014,2015,2017 |
URL | http://invest.doc.ic.ac.uk |
Description | Outreach to school groups |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Outreach talk to schools on software bugs are possible mitigations. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435... |