Improving Symbolic Execution via Targeted Program Transformations

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Dynamic symbolic execution (DSE) has gained tremendous popularity in the last decade, becoming part of the standard toolbox of techniques in many computer science fields including software engineering, programming languages, software testing, verification, security, and computer systems. The technique has enabled a wide range of applications, including the automatic detection of bugs and security vulnerabilities, recovery of corrupt documents, patch generation, and automatic debugging, among many others.

The effectiveness and scalability of DSE is highly dependent on the structure of the program. That is, semantically-equivalent programs can differ substantially with respect to the effectiveness of DSE to explore the program state space. As a result, this project aims to discover and design automatic semantics-preserving program transformations that improve the scalability of DSE.

Planned Impact

The primary non-academic beneficiaries of the proposed research are software companies and the open-source software community. It is well known that software bugs have a large negative impact on the global economy. For example, a report published by the US National Institute of Standards and Technology (NIST) in 2002 found that software bugs cost annually the US economy an estimated $60 billion dollars or 0.6 percent of the gross domestic product, and a recent study from the University of Cambridge estimated the worldwide cost of debugging at $312 billion dollars per year.

As a result, we are witnessing a sustained effort directed toward developing automatic techniques for finding software errors. The current generation of commercial tools use static analysis technology, i.e. they try to find bugs by inspecting the program source code, without actually running the program. While helpful, static analysis is limited in the types of bugs that it can find (e.g., none of the static analysis products on the market today could find the recent Heartbleed vulnerability) and produces false reports, which limits its usefulness. For an additional overhead many times acceptable in practice, dynamic symbolic execution (DSE) can often overcome these limitations, and has already shown promise in industry, with many corporations such as IBM, Fujitsu and Microsoft using it internally. However, DSE still presents significant scalability challenges, which prevent its widespread adoption. This research project could bring significant scalability improvements to DSE, potentially leading to large reductions in the costs associated with software errors and debugging.

Our goal is to disseminate the results of this research from an early stage onwards through a variety of mechanisms: (1) by open-sourcing most of our code to be used directly by software developers on their code; (2) by applying these prototypes to open-source applications to demonstrate the effectiveness of our approach; (3) by communicating the results of our research to the non-academic user community via meetings, extended visits and an engagement workshop; (4) by considering commercialisation opportunities and (5) by teaching and training.
 
Description Research supported by this grant has resulted in an effective technique for accelerating symbolic execution for programs that use large arrays. In turn, this enables faster testing of complex programs and allows the testing of program paths that would have been otherwise unreachable during symbolic execution. More recent work focused on a novel form of symbolic execution called chopped symbolic execution which allows users to specify uninteresting parts of the code to exclude during the analysis, thus only targeting the exploration to paths of importance. However, the excluded parts are not summarily ignored, and instead are executed lazily, when their effect may be observable by code under analysis. The grant has also enabled research into a new memory model for symbolic execution, which can lead to better testing of applications with complex data structures. The grant has also supported research into summarising loops, with applications in testing and refactoring. Finally, the grant has led to techniques for testing programs with symbolic execution for long amounts of time, opening the potential for finding deeper program bugs.
Exploitation Route Symbolic execution has gathered significant attention from both academia and industry as an effective way to test and verify software. A major bottleneck in symbolic execution has been constraint solving, which is slow, particularly when large arrays are involved. Our technique for accelerating array constraints partially addresses this bottleneck and allows researchers and developers to use symbolic execution for larger and more complex programs. Similarly, our technique of summarising loops can speed up symbolic execution via the use of fast string solvers. The other major bottleneck is related to path explosion. Our chopped symbolic execution technique addresses this problem by allowing the analysis to focus on paths of interest, which could allow users to efficiently employ symbolic execution to more application domains. Finally, our technique for running symbolic execution for long amounts of time can be use to alleviate both memory pressure and path explosion.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://srg.doc.ic.ac.uk/projects/
 
Description Our work on semantic-preserving transformations that take advantage of contextual information collected during symbolic execution to speed up array operations, as well as several aspects of our other prototypes have now been integrated into KLEE, a popular symbolic execution engine developed and maintained in our group, with a large userbase in both academia and industry. Our approach for summarizing loops has led to several refactorings that were contributed to popular open-source projects.
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Collaboration with Purdue University 
Organisation Purdue University
Country United States 
Sector Academic/University 
PI Contribution This project led to a close collaboration with Purdue University, which resulted in a joint publication at ISSTA 2017.
Collaborator Contribution This project led to a close collaboration with David M. Perry and Xiangyu Zhang from Purdue University, which resulted in a joint publication at ISSTA 2017.
Impact Joint paper "Accelerating Array Constraints in Symbolic Execution" by David M. Perry, Andrea Mattavelli, Xiangyu Zhang, Cristian Cadar at the International Symposium on Software Testing and Analysis (ISSTA 2017), and an associated software implementation.
Start Year 2016
 
Description Collaboration with Tel Aviv University 
Organisation Tel Aviv University
Country Israel 
Sector Academic/University 
PI Contribution This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018.
Collaborator Contribution This project led to a close collaboration with Tel Aviv University, which resulted in a joint publication at ICSE 2018.
Impact Joint paper "Chopped Symbolic Execution" by David Trabish, Andrea Mattavelli, Noam Rinetzky, Cristian Cadar at the International Conference on Software Engineering (ICSE 2018).
Start Year 2017
 
Title Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever 
Description Abstract When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/ 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3895270
 
Title Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever 
Description Abstract When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/ 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3895271
 
Title KLEE with Integer Extension 
Description An extension to KLEE that provides an integer solver based on Z3. Solver queries that do not contain bitvector instructions are forwarded to an integer solver instead of a bitvector solver. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The associated research was published in the International Conference on Tests and Proofs (TAP 2019). The software was used to explore the impact of using the theory of integers on the precision and performance of dynamic symbolic execution of C programs, a topic which was discussed by constraint solving and symbolic execution experts at the 2019 Dagstuhl seminar on "Bringing CP, SAT and SMT together: Next Challenges in Constraint Solving 
URL https://srg.doc.ic.ac.uk/projects/klee-z3-int-vs-bv/
 
Title KLEE-Arrays 
Description An extension to the KLEE symbolic execution engine that implements a set of semantic-preserving transformations that take advantage of contextual information collected during symbolic execution to speedup array operations. The extension is now part of the KLEE mainline. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The extension leads to significant speedups in the symbolic execution of popular benchmarks. 
URL https://srg.doc.ic.ac.uk/projects/klee-array/
 
Title Loop summarization tool 
Description Analysing and comprehending C programs that use strings is hard: Using standard library functions for manipulating strings is not enforced and programs often use complex loops for the same purpose. We introduce the notion of memoryless loops that capture some of these string loops. The tool uses a counterexample-guided inductive synthesis approach to summarise memoryless string loops using C standard library functions. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The associated research was published in the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019), one of the top programming languages venues, and the artifact was validated as functional by the PLDI 2019 Artifact Evaluation Committee. The software was used to generate refactoring patches, which were successfully contributed to several popular open-source software systems. 
URL https://srg.doc.ic.ac.uk/projects/loop-summaries/
 
Title Segmented Memory Model for KLEE 
Description Symbolic execution is an effective technique for exploring paths in a program and reasoning about all possible values on those paths. However, the technique still struggles with code that uses complex heap data structures, in which a pointer is allowed to refer to more than one memory object. In such cases, symbolic execution typically forks execution into multiple states, one for each object to which the pointer could refer. This extension avoids this expensive forking by using a segmented memory model. In this model, memory is split into segments, so that each symbolic pointer refers to objects in a single segment. The size of the segments are bound by a threshold, in order to avoid expensive constraints. This results in a memory model where forking due to symbolic pointer dereferences is significantly reduced, often completely. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The associated research was published in the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2019), one of the top software engineering venues, and the artifact was validated as reusable by the ESEC/FSE 2019 Artifact Evaluation Committee. 
URL https://srg.doc.ic.ac.uk/projects/klee-segmem/
 
Description Co-organiser of Dagstuhl seminar on Bringing CP, SAT and SMT together: Next Challenges in Constraint Solving 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Co-organized a Dagstuhl seminar on the next challenges in constraint solving, which brought together academic and industry participants from the CP, SAT and SMT, and constraint solving applications communities.
Year(s) Of Engagement Activity 2019
URL https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19062
 
Description Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe.
Year(s) Of Engagement Activity 2019
URL https://shonan.nii.ac.jp/seminars/160/
 
Description INVEST workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact The Introduction to Verification and Testing (INVEST) workshop takes place in the Department of Computing, Imperial College London on an annual basis, jointly organised by Cadar, Donaldson and Gardner. The aim of the workshop is to introduce young researchers, and students potentially interested in research careers, to the fields of software verification and testing.
Year(s) Of Engagement Activity 2014,2015,2017
URL http://invest.doc.ic.ac.uk
 
Description Outreach to school groups 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Outreach talk to schools on software bugs are possible mitigations.
Year(s) Of Engagement Activity 2019
URL https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435...