Improving Symbolic Execution via Targeted Program Transformations

Lead Research Organisation: Imperial College London

Department Name: Computing

Abstract

Dynamic symbolic execution (DSE) has gained tremendous popularity in the last decade, becoming part of the standard toolbox of techniques in many computer science fields including software engineering, programming languages, software testing, verification, security, and computer systems. The technique has enabled a wide range of applications, including the automatic detection of bugs and security vulnerabilities, recovery of corrupt documents, patch generation, and automatic debugging, among many others.

The effectiveness and scalability of DSE is highly dependent on the structure of the program. That is, semantically-equivalent programs can differ substantially with respect to the effectiveness of DSE to explore the program state space. As a result, this project aims to discover and design automatic semantics-preserving program transformations that improve the scalability of DSE.

Planned Impact

The primary non-academic beneficiaries of the proposed research are software companies and the open-source software community. It is well known that software bugs have a large negative impact on the global economy. For example, a report published by the US National Institute of Standards and Technology (NIST) in 2002 found that software bugs cost annually the US economy an estimated $60 billion dollars or 0.6 percent of the gross domestic product, and a recent study from the University of Cambridge estimated the worldwide cost of debugging at $312 billion dollars per year.

As a result, we are witnessing a sustained effort directed toward developing automatic techniques for finding software errors. The current generation of commercial tools use static analysis technology, i.e. they try to find bugs by inspecting the program source code, without actually running the program. While helpful, static analysis is limited in the types of bugs that it can find (e.g., none of the static analysis products on the market today could find the recent Heartbleed vulnerability) and produces false reports, which limits its usefulness. For an additional overhead many times acceptable in practice, dynamic symbolic execution (DSE) can often overcome these limitations, and has already shown promise in industry, with many corporations such as IBM, Fujitsu and Microsoft using it internally. However, DSE still presents significant scalability challenges, which prevent its widespread adoption. This research project could bring significant scalability improvements to DSE, potentially leading to large reductions in the costs associated with software errors and debugging.

Our goal is to disseminate the results of this research from an early stage onwards through a variety of mechanisms: (1) by open-sourcing most of our code to be used directly by software developers on their code; (2) by applying these prototypes to open-source applications to demonstrate the effectiveness of our approach; (3) by communicating the results of our research to the non-academic user community via meetings, extended visits and an engagement workshop; (4) by considering commercialisation opportunities and (5) by teaching and training.

Funded Value:

£286,469

Funded Period:

Jan 16 - Nov 19

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/N007166/1

Principal Investigator:

Cristian Cadar

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (50%)

Software Engineering (50%)

Organisations

People	ORCID iD
Cristian Cadar (Principal Investigator)

Publications

Author Name

Title Publication Date Published

10 25 50

Braione P (2017) Combining symbolic execution and search-based testing for programs with complex heap inputs

Busse F (2020) Running symbolic execution forever

Kapus T (2019) A segmented memory model for symbolic execution

Kapus T (2019) Tests and Proofs - 13th International Conference, TAP 2019, Held as Part of the Third World Congress on Formal Methods 2019, Porto, Portugal, October 9-11, 2019, Proceedings

Kapus T (2019) Computing summaries of string loops in C for better testing and refactoring

Liew D (2019) Just fuzz it: solving floating-point constraints using coverage-guided fuzzing

Nowack M (2019) Fine-Grain Memory Object Representation in Symbolic Execution

Perry D (2017) Accelerating array constraints in symbolic execution

Trabish D (2018) Chopped symbolic execution

Trabish D. (2018) Chopped Symbolic Execution

Key Findings
Impact Summary
Collaboration
Software and Technical Products
Engagement Activities


Description	Research supported by this grant has resulted in an effective technique for accelerating symbolic execution for programs that use large arrays. In turn, this enables faster testing of complex programs and allows the testing of program paths that would have been otherwise unreachable during symbolic execution. More recent work focused on a novel form of symbolic execution called chopped symbolic execution which allows users to specify uninteresting parts of the code to exclude during the analysis, thus only targeting the exploration to paths of importance. However, the excluded parts are not summarily ignored, and instead are executed lazily, when their effect may be observable by code under analysis. The grant has also enabled research into a new memory model for symbolic execution, which can lead to better testing of applications with complex data structures. The grant has also supported research into summarising loops, with applications in testing and refactoring. Finally, the grant has led to techniques for testing programs with symbolic execution for long amounts of time, opening the potential for finding deeper program bugs.
Exploitation Route	Symbolic execution has gathered significant attention from both academia and industry as an effective way to test and verify software. A major bottleneck in symbolic execution has been constraint solving, which is slow, particularly when large arrays are involved. Our technique for accelerating array constraints partially addresses this bottleneck and allows researchers and developers to use symbolic execution for larger and more complex programs. Similarly, our technique of summarising loops can speed up symbolic execution via the use of fast string solvers. The other major bottleneck is related to path explosion. Our chopped symbolic execution technique addresses this problem by allowing the analysis to focus on paths of interest, which could allow users to efficiently employ symbolic execution to more application domains. Finally, our technique for running symbolic execution for long amounts of time can be use to alleviate both memory pressure and path explosion.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	https://srg.doc.ic.ac.uk/projects/


Description	Our work on semantic-preserving transformations that take advantage of contextual information collected during symbolic execution to speed up array operations, as well as several aspects of our other prototypes have now been integrated into KLEE, a popular symbolic execution engine developed and maintained in our group, with a large userbase in both academia and industry. Our approach for summarizing loops has led to several refactorings that were contributed to popular open-source projects.
First Year Of Impact	2018
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	Collaboration with Purdue University
Organisation	Purdue University
Country	United States
Sector	Academic/University
PI Contribution	This project led to a close collaboration with Purdue University, which resulted in a joint publication at ISSTA 2017.
Collaborator Contribution	This project led to a close collaboration with David M. Perry and Xiangyu Zhang from Purdue University, which resulted in a joint publication at ISSTA 2017.
Impact	Joint paper "Accelerating Array Constraints in Symbolic Execution" by David M. Perry, Andrea Mattavelli, Xiangyu Zhang, Cristian Cadar at the International Symposium on Software Testing and Analysis (ISSTA 2017), and an associated software implementation.
Start Year	2016


Description	Collaboration with Tel Aviv University
Organisation	Tel Aviv University
Country	Israel
Sector	Academic/University
PI Contribution	This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018.
Collaborator Contribution	This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018.
Impact	Joint paper "Chopped Symbolic Execution" by David Trabish, Andrea Mattavelli, Noam Rinetzky, Cristian Cadar at the International Conference on Software Engineering (ICSE 2018).
Start Year	2017


Title	Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever
Description	Abstract When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
URL	https://zenodo.org/record/3895270


Title	Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever
Description	Abstract When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
URL	https://zenodo.org/record/3895271


Title	KLEE with Integer Extension
Description	An extension to KLEE that provides an integer solver based on Z3. Solver queries that do not contain bitvector instructions are forwarded to an integer solver instead of a bitvector solver.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The associated research was published in the International Conference on Tests and Proofs (TAP 2019). The software was used to explore the impact of using the theory of integers on the precision and performance of dynamic symbolic execution of C programs, a topic which was discussed by constraint solving and symbolic execution experts at the 2019 Dagstuhl seminar on "Bringing CP, SAT and SMT together: Next Challenges in Constraint Solving
URL	https://srg.doc.ic.ac.uk/projects/klee-z3-int-vs-bv/


Title	KLEE-Arrays
Description	An extension to the KLEE symbolic execution engine that implements a set of semantic-preserving transformations that take advantage of contextual information collected during symbolic execution to speedup array operations. The extension is now part of the KLEE mainline.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	The extension leads to significant speedups in the symbolic execution of popular benchmarks.
URL	https://srg.doc.ic.ac.uk/projects/klee-array/


Title	Loop summarization tool
Description	Analysing and comprehending C programs that use strings is hard: Using standard library functions for manipulating strings is not enforced and programs often use complex loops for the same purpose. We introduce the notion of memoryless loops that capture some of these string loops. The tool uses a counterexample-guided inductive synthesis approach to summarise memoryless string loops using C standard library functions.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The associated research was published in the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019), one of the top programming languages venues, and the artifact was validated as functional by the PLDI 2019 Artifact Evaluation Committee. The software was used to generate refactoring patches, which were successfully contributed to several popular open-source software systems.
URL	https://srg.doc.ic.ac.uk/projects/loop-summaries/


Title	Segmented Memory Model for KLEE
Description	Symbolic execution is an effective technique for exploring paths in a program and reasoning about all possible values on those paths. However, the technique still struggles with code that uses complex heap data structures, in which a pointer is allowed to refer to more than one memory object. In such cases, symbolic execution typically forks execution into multiple states, one for each object to which the pointer could refer. This extension avoids this expensive forking by using a segmented memory model. In this model, memory is split into segments, so that each symbolic pointer refers to objects in a single segment. The size of the segments are bound by a threshold, in order to avoid expensive constraints. This results in a memory model where forking due to symbolic pointer dereferences is significantly reduced, often completely.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The associated research was published in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), one of the top software engineering venues, and the artifact was validated as reusable by the ESEC/FSE 2019 Artifact Evaluation Committee.
URL	https://srg.doc.ic.ac.uk/projects/klee-segmem/


Description	Co-organiser of Dagstuhl seminar on Bringing CP, SAT and SMT together: Next Challenges in Constraint Solving
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Co-organized a Dagstuhl seminar on the next challenges in constraint solving, which brought together academic and industry participants from the CP, SAT and SMT, and constraint solving applications communities.
Year(s) Of Engagement Activity	2019
URL	https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19062


Description	Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe.
Year(s) Of Engagement Activity	2019
URL	https://shonan.nii.ac.jp/seminars/160/


Description	INVEST workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Undergraduate students
Results and Impact	The Introduction to Verification and Testing (INVEST) workshop takes place in the Department of Computing, Imperial College London on an annual basis, jointly organised by Cadar, Donaldson and Gardner. The aim of the workshop is to introduce young researchers, and students potentially interested in research careers, to the fields of software verification and testing.
Year(s) Of Engagement Activity	2014,2015,2017
URL	http://invest.doc.ic.ac.uk


Description	Outreach to school groups
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	Outreach talk to schools on software bugs are possible mitigations.
Year(s) Of Engagement Activity	2019
URL	https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435...

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications