Multi-version Execution Techniques for Increasing the Reliability and Security of Evolving Software

Lead Research Organisation: Imperial College London

Department Name: Computing

Abstract

One of the distinguishing characteristics of software systems is that they evolve: new patches are committed to software repositories and new versions are released to users on a continuous basis. Unfortunately, many of these changes bring unexpected failures that break the stability of the system or affect its security, and users face the uncomfortable choice between using an old stable version which misses recent features and bug fixes, and upgrading to a new version which improves the software in certain ways, only to introduce other bugs and security vulnerabilities.

In this fellowship, I plan to investigate novel techniques for improving the reliability and security of evolving software, based on the idea of combining the execution of multiple software versions in such a way as to increase the reliability and security of the "multi-version" application and eliminate a large number of common bugs introduced by software updates.

This is an ambitious proposal, which presents several challenges spanning the areas of software engineering, computer systems, and security: understanding how software evolves, and particularly the effects of incorrect updates on software evolution; addressing the technical challenges of multi-version execution such as creating an application-level sandboxing environment and devising lightweight record and replay techniques; designing error recovery strategies that effectively combine different software versions; and determining the applicability of multi-version execution to the different types of applications and code changes encountered in practice.

Planned Impact

The correct operation of today's software systems relies on the reliability and security of the software update process. As a result, this research could have a direct impact in practice, improving the productivity of the large number of computer users and system administrators that need to incorporate software updates on a regular basis. Furthermore, our approach could improve the security of software by preventing denial-of-service attacks exploiting security vulnerabilities in applications with high-availability requirements such as network servers, as well as by encouraging users to incorporate more quickly the latest security fixes.

While the immediate societal benefits will be centred around improving the quality and security of software updates, many of the techniques developed in this work could lead to significant benefits in other areas related to software quality. For example, our lightweight replay-and-record techniques could be used to send easily reproducible error reports to developers, while our research on parallel multi-version execution could be used to deploy high-interaction honeypots offering a fine-grained understanding of the anatomy of an attack.

Our goal is to disseminate the results of this research from an early stage onwards through a variety of mechanisms: (1) by open-sourcing most of our initial prototypes to be used directly by software developers on their code; (2) by applying it to open-source and commercial code to demonstrate the effectiveness of our approach; and (3) by communicating the results of our research to the non-academic user community.

Funded Value:

£978,493

Funded Period:

Mar 14 - Jun 19

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/L002795/1

Principal Investigator:

Cristian Cadar

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Computer Sys. & Architecture (25%)

Software Engineering (75%)

Organisations

People	ORCID iD
Cristian Cadar (Principal Investigator / Fellow)

Publications

Author Name Title Publication

Date Published

10 25 50

Hosek P (2015) VARAN the Unbelievable An Efficient N-version Execution Framework in ACM SIGPLAN Notices

Kuchta T (2018) Shadow Symbolic Execution for Testing Software Patches in ACM Transactions on Software Engineering and Methodology

Kuchta T (2018) On the correctness of electronic documents: studying, finding, and localizing inconsistency bugs in PDF readers and files in Empirical Software Engineering

Arras P (2022) SaBRe: load-time selective binary rewriting in International Journal on Software Tools for Technology Transfer

Kapus T (2019) A segmented memory model for symbolic execution

Cadar C (2015) Targeted program transformations for symbolic execution

Cadar, C (2016) Analysing the Program Analyser

Kapus T (2017) Automatic testing of symbolic execution engines via program generation and differential testing

Pina L (2018) FreeDA

Cadar C (2016) Analysing the program analyser

Palikareva, H (2016) Shadow of a Doubt: Testing for Divergences Between Software Versions

Cadar C (2014) Shadow symbolic execution for better testing of evolving software

Pina L (2015) Towards deployment-time dynamic analysis of server applications

Liew D (2016) Symbooglix: A Symbolic Execution Engine for Boogie Programs

Pina L. (2019) Mvedsua: Higher Availability Dynamic Software Updates via Multi-Version Execution

Liew D (2017) Floating-point symbolic execution: A case study in N-version programming

Kuchta T (2014) Docovery

Marinescu P (2014) Covrig: a framework for the analysis of code, test, and coverage evolution in real software

Perry D (2017) Accelerating array constraints in symbolic execution

Trabish D (2018) Chopped symbolic execution

Pina L (2017) A DSL Approach to Reconcile Equivalent Divergent Program Executions

Dustmann O (2018) PARTI: a multi-interval theory solver for symbolic execution

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	This grant has enabled the design of novel techniques for increasing the reliability and security of evolving software. We have created a novel multi-version execution system that allows multiple versions or variants of a software application to run side-by-side with a low performance overhead. We have built several new techniques on top of this system, including running several versions of the same application in order to survive bugs introduced by software updates; running different variants of the same application created by different compilers, in order to detect compiler-bug-based backdoors; running sanitized versions of an application in parallel with the native version in order to detect software bugs. We have also designed a novel dynamic binary rewriter, and implemented several plugins on top of it, such as a lightweight fault injector and fast system call tracer. Our Mvedsua project has combined multi-version execution with dynamic software updating to create more reliable dynamic software updates. In addition, the grant has supported research into using dynamic symbolic execution to find bugs in software applications, an empirical study for understanding how large software projects evolve over time, a study on the correctness of electronic documents: studying, finding, and localizing inconsistency bugs in PDF readers and files; the design of taint tracking and symbolic execution techniques for recovering broken documents, and a new memory model for effectively handling symbolic pointers.
Exploitation Route	The multi-version execution techniques we designed could be used to improve the software update process, by enabling transparent fail-over in the case of update errors. They could also be used to find bugs in deployed applications, and detect compiler-bug-based backdoors. The multi-version execution framework that we designed as part of this project is now a reference for work in this area, and parts of its architecture were incorporated in the frameworks of other groups. Our symbolic execution techniques have helped increase the scalability of the technique, and can be used to find important bugs and security vulnerabilities, including in evolving applications via patch testing. Our symbolic execution framework KLEE, on top of which most of these techniques are implemented, continues to be one of the most popular symbolic execution platforms, with a large number of users from both academia and industry.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	http://srg.doc.ic.ac.uk/projects/


Description	Several of our techniques improved the testing of popular open-source software systems. Our symbolic execution techniques have been used by a number of companies, such as Baidu, Fujitsu and Samsung. Our binary rewriting system is available as open source and was presented at the popular FOSDEM 2020 event on open-source software.
First Year Of Impact	2015
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	EPSRC responsive mode grant
Amount	£672,083 (GBP)
Funding ID	EP/R011605/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2018
End	12/2020


Description	EPSRC responsive mode grant
Amount	£286,470 (GBP)
Funding ID	EP/N007166/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2016
End	01/2019


Description	ERC Consolidator Grant
Amount	€ 1,955,129 (EUR)
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	10/2019
End	09/2024


Title	Covrig data set
Description	Data associated with an empirical study examining how code and tests co-evolve in 6 popular open-source systems. We report the main characteristics of software patches, analyse the evolution of program and patch coverage, assess the impact of nondeterminism on the execution of the test suite, and investigate whether the coverage of code containing bugs and bug fixes is higher than average.
Type Of Material	Database/Collection of data
Year Produced	2014
Provided To Others?	Yes
Impact	The associated paper was published in the International Symposium on Software Testing and Analysis (ISSTA 2014). The associated artifact has won the Best Artifact Award at the conference.
URL	http://srg.doc.ic.ac.uk/projects/covrig/


Title	Floating-point challenge benchmarks
Description	Our research on symbolic execution of floating-point programs resulted in a set of 35k floating-point SMT queries generated by our tools, which we contributed to the annual competition on SMT solvers, SMT-COMP (https://smt-comp.github.io/).
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	TBD


Title	Studying, finding, and localizing inconsistency bugs in PDF readers and files
Description	The database contains the results of a large scale study on the correctness of PDF documents and PDF document readers, based on clustering errors emitted by the PDF readers and visual inconsistencies detection.
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	The associated paper was published in the Empirical Software Engineering journal and was downloaded over 1000 times since.
URL	https://srg.doc.ic.ac.uk/projects/pdf-errors/


Description	Collaboration with Purdue University
Organisation	Purdue University
Country	United States
Sector	Academic/University
PI Contribution	This project led to a close collaboration with Purdue University, which resulted in a joint publication at ISSTA 2017.
Collaborator Contribution	This project led to a close collaboration with David M. Perry and Xiangyu Zhang from Purdue University, which resulted in a joint publication at ISSTA 2017.
Impact	Joint paper "Accelerating Array Constraints in Symbolic Execution" by David M. Perry, Andrea Mattavelli, Xiangyu Zhang, Cristian Cadar at the International Symposium on Software Testing and Analysis (ISSTA 2017), and an associated software implementation.
Start Year	2016


Description	Collaboration with Tel Aviv University
Organisation	Tel Aviv University
Country	Israel
Sector	Academic/University
PI Contribution	This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018.
Collaborator Contribution	This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018.
Impact	Joint paper "Chopped Symbolic Execution" by David Trabish, Andrea Mattavelli, Noam Rinetzky, Cristian Cadar at the International Conference on Software Engineering (ICSE 2018).
Start Year	2017


Description	Collaboration with University of Maryland at College Park
Organisation	University of Maryland, College Park
Country	United States
Sector	Academic/University
PI Contribution	This project led to a close collaboration with University of Maryland at College Park, which resulted in a joint publication at ASPLOS 2019.
Collaborator Contribution	This project led to a close collaboration with University of Maryland at College Park, which resulted in a joint publication at ASPLOS 2019.
Impact	Joint paper "Mvedsua: Higher Availability Dynamic Software Updates via Multi-Version Execution" by Luís Pina, Anastasios Andronidis, Michael Hicks, Cristian Cadar, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2019)
Start Year	2017


Title	Infrastructure for Testing Symbolic Execution Engines
Description	A tool for testing symbolic execution engines based on program generation and differential testing of compilers.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	Found serious bugs in the popular symbolic execution engines KLEE, CREST and FuzzBall.
URL	https://srg.doc.ic.ac.uk/projects/symex-tester/


Title	KLEE-Arrays
Description	An extension to the KLEE symbolic execution engine that implements a set of semantic-preserving transformations that take advantage of contextual information collected during symbolic execution to speedup array operations. The extension is now part of the KLEE mainline.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	The extension leads to significant speedups in the symbolic execution of popular benchmarks.
URL	https://srg.doc.ic.ac.uk/projects/klee-array/


Title	SaBre
Description	SaBRe is a modular selective binary rewriter, specialized for rewriting system calls, vDSO and named functions. It currently supports two architectures, x86_64 and RISC-V.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	SaBre was successfully used to implement a better version of our multi-version execution system Varan, as well as a lightweight fault injector and a fast system call tracer. It was selected for presentation at FOSDEM 2020, attended by thousands of open-source software developers.


Title	Segmented Memory Model for KLEE
Description	Symbolic execution is an effective technique for exploring paths in a program and reasoning about all possible values on those paths. However, the technique still struggles with code that uses complex heap data structures, in which a pointer is allowed to refer to more than one memory object. In such cases, symbolic execution typically forks execution into multiple states, one for each object to which the pointer could refer. This extension avoids this expensive forking by using a segmented memory model. In this model, memory is split into segments, so that each symbolic pointer refers to objects in a single segment. The size of the segments are bound by a threshold, in order to avoid expensive constraints. This results in a memory model where forking due to symbolic pointer dereferences is significantly reduced, often completely.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The associated research was published in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), one of the top software engineering venues, and the artifact was validated as reusable by the ESEC/FSE 2019 Artifact Evaluation Committee.
URL	https://srg.doc.ic.ac.uk/projects/klee-segmem/


Title	Shadow
Description	A symbolic execution tool that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new versions diverge, we generate a test case exercising the divergence and comprehensively test the new behaviours of the new version.
Type Of Technology	Software
Year Produced	2016
Impact	The associated research was published in the International Conference on Software Engineering (ICSE 2016) and ACM Transactions on Software Engineering and Methodology (TOSEM 2018). IEEE has invited us to write a blog post which discusses this work. The work has attracted considerable attention, being reimplemented by another research group for Java, used for hybrid differential testing, and discussed in the context of research on determining the correlation between fault revelation and various forms of coverage; combining bug detection with test suite augmentation; and genetic improvement.
URL	http://srg.doc.ic.ac.uk/projects/shadow/


Title	Symbooglix
Description	A symbolic execution engine for the Boogie intermediate verification language which provides modular components that can be reused in other Boogie projects.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	Associated paper accepted at the IEEE International Conference on Software Testing, Verification, and Validation (ICST 2016) and received a Best Paper Award.
URL	http://srg.doc.ic.ac.uk/projects/symbooglix/


Title	Varan
Description	A multi-version execution framework that combines selective binary rewriting with a novel event-streaming architecture to significantly reduce performance overhead and scale well with the number of versions, without relying on intrusive kernel modifications.
Type Of Technology	Software
Year Produced	2015
Impact	The framework is now a reference for work in this area, and its architecture was incorporated in the frameworks of other groups.
URL	https://srg.doc.ic.ac.uk/projects/varan/


Description	Blog post
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Blog post co-authored by Cristian Cadar, Luis Pina and John Regehr about preventing compiler-bug-based backdoors using mullti-version execution. This was posted on Cristian Cadar and John Regehr's blogs, with the latter being one of the most popular blogs for audiences interested in compilers.
Year(s) Of Engagement Activity	2015
URL	https://blog.regehr.org/archives/1282


Description	Co-organiser of Dagstuhl seminar on Bringing CP, SAT and SMT together: Next Challenges in Constraint Solving
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Co-organized a Dagstuhl seminar on the next challenges in constraint solving, which brought together academic and industry participants from the CP, SAT and SMT, and constraint solving applications communities.
Year(s) Of Engagement Activity	2019
URL	https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19062


Description	Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe.
Year(s) Of Engagement Activity	2019
URL	https://shonan.nii.ac.jp/seminars/160/


Description	FOSDEM talk and interview
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talk, video and interview delivered at FOSDEM 2020, an event gathering thousands of developers of free and open source software. https://fosdem.org/2020/interviews/paul-antoine-arras/ https://fosdem.org/2020/schedule/event/sabre/
Year(s) Of Engagement Activity	2020
URL	https://fosdem.org/2020/interviews/paul-antoine-arras/


Description	IEEE Software Blog
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	IEEE Software blog post on our research on comprehensively testing software patches with symbolic execution.
Year(s) Of Engagement Activity	2016
URL	http://blog.ieeesoftware.org/2016/09/comprehensively-testing-software.html


Description	INVEST workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Undergraduate students
Results and Impact	The Introduction to Verification and Testing (INVEST) workshop takes place in the Department of Computing, Imperial College London on an annual basis, jointly organised by Cadar, Donaldson and Gardner. The aim of the workshop is to introduce young researchers, and students potentially interested in research careers, to the fields of software verification and testing.
Year(s) Of Engagement Activity	2014,2015,2017
URL	http://invest.doc.ic.ac.uk


Description	Outreach to school groups
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	Outreach talk to schools on software bugs are possible mitigations.
Year(s) Of Engagement Activity	2019
URL	https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435...

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications