Automatically Detecting and Surviving Exploitable Compiler Bugs

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

The focus of this proposal is on the detection and survival of wrong code compiler defects, which we argue present a cyber-security threat that has been largely ignored to date. First, incorrectly compiled code can introduce exploitable vulnerabilities that are not visible at the source code level, and thus cannot be detected by source-level static analysers. Second, incorrectly compiled code can undermine the reliability of the application, which can have dramatic repercussions in the context of safety-critical systems. Third, wrong code compiler defects can also be the target of some of the most insidious security attacks. A crafty attacker posing as an open source developer can introduce a compiler-bug-based backdoor into a security-critical application by adding a patch that looks perfectly innocent but which, when compiled with a certain compiler, yields binary code that allows the attacker to compromise the software.

In this project, we aim to explore automated techniques that can detect and prevent such problems. In particular, we plan to investigate techniques for automatically finding compiler-induced vulnerabilities in real software, approaches for understanding the extent to which an attacker could maliciously modify an application to create a compiler-induced vulnerability, and methods for protecting against such vulnerabilities at runtime.

Planned Impact

The project has high potential for industrial impact, with associated economic and societal benefits, by improving the quality of developer tools and enabling more rapid development of reliable software on which businesses and end-users depend. Industrial beneficiaries include companies specialising in compiler development, such as Codeplay and PGI Compilers & Tools; suppliers of compiler test suites, including Solid Sands, NULLSTONE and Plum Hall; larger corporations that undertake significant compiler development work, such as AdaCore, AMD, Apple, ARM, Google, Imagination Technologies, Intel, Microsoft, NVIDIA, Oracle, Qualcomm and Xilinx; and companies that build high-assurance software or software analysis tools that in turn depend upon reliable compilers, such as AbsInt, Altran and TrustInSoft. Many of these companies, including Codeplay, ARM, Imagination Technologies, Google, Microsoft and Oracle, have strong bases in the UK. More broadly, our impact on the Clang and GCC infrastructures, which we propose to study in detail during the project, will have significant benefits for the large communities of associated users.

Our pathways to impact, detailed in a separate document, include (1) technology transfer to industry via direct engagement with industrial project partners Altran and Codeplay, (2) visiting other potential industrial beneficiaries such as ARM, Imagination Technologies, AdaCore, Google and Microsoft, (3) open sourcing our prototypes and applying them to large open-source code bases, (4) participating in industry-focused events, (5) writing developer-focused articles and recording research videos to reach a broader audience, (6) incorporating our case studies in our undergraduate teaching and Master's projects, (7) communicating our research to the public via outreach activities and (8) applying for funding from schemes designed to support impact activities.

Publications

 
Description Recent years have seen significant research on automatic techniques for finding compiler defects. However, the practical impact of these defects has barely been assessed. We have conducted an empirical study examining the compilation of more than 11 million lines of C/C++ code from 318 Debian packages, using 45 historical bugs in the Clang/LLVM compiler, found either by four distinct fuzzers or by the Alive formal verification tool. Results show that almost half of the fuzzer-found bugs propagate to the generated binaries for some packages, but never cause application test suite failures, suggesting that more research is needed to understand the impact of these bugs in practice.
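
One simple way to check whether a compiler bug propagates to generated binaries is to build the same package twice, once with the buggy compiler and once with a fixed one, and compare the resulting files. The sketch below illustrates that idea only and is not the study's infrastructure: the function names and the two-directory layout are hypothetical, and it assumes the builds are reproducible (no embedded timestamps or build paths), so that any byte difference is attributable to the compiler.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def differing_binaries(buggy_dir: Path, fixed_dir: Path) -> List[str]:
    """List relative paths present in both build trees whose bytes differ.

    buggy_dir and fixed_dir are hypothetical output directories produced by
    building the same sources with the buggy and the fixed compiler.
    """
    diffs = []
    for buggy_file in sorted(p for p in buggy_dir.rglob("*") if p.is_file()):
        rel = buggy_file.relative_to(buggy_dir)
        fixed_file = fixed_dir / rel
        # Only compare files that exist in both trees.
        if fixed_file.is_file() and file_digest(buggy_file) != file_digest(fixed_file):
            diffs.append(rel.as_posix())
    return diffs
```

A non-empty result for a package would only indicate that the two compilers produced different code for it; whether that difference changes program behaviour is a separate question, which is what running the package's test suite probes.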

In the last decade, compiler fuzzing has found hundreds of bugs in widely-used compilers such as Clang and GCC. However, the idiomatic nature of the fuzzed programs limits their ability to find new bugs. We have developed novel techniques for generating less restricted programs that can still be used for fuzzing, which have led to the discovery of bugs that seem to be out of practical reach for existing techniques.
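
For readers unfamiliar with compiler fuzzing, a common way to detect miscompilations is differential testing: the same test program is compiled under several configurations, and any configuration whose output disagrees with the majority is flagged. The following is a minimal sketch of that general idea, not the techniques developed in this project. The names `compile_and_run` and `find_disagreements` are hypothetical, and the sketch assumes the test program is deterministic and free of undefined behaviour, since otherwise differing outputs do not imply a compiler bug.

```python
import subprocess
import tempfile
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, List

# A runner maps C source text to the output observed when running it.
Runner = Callable[[str], str]


def compile_and_run(cc: str, opt: str) -> Runner:
    """Build a runner that compiles with `cc opt` and returns the program's stdout."""
    def run(source: str) -> str:
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "test.c"
            exe = Path(tmp) / "test"
            src.write_text(source)
            subprocess.run([cc, opt, str(src), "-o", str(exe)], check=True)
            result = subprocess.run([str(exe)], capture_output=True, text=True, timeout=10)
            return result.stdout
    return run


def find_disagreements(source: str, runners: Dict[str, Runner]) -> List[str]:
    """Return the names of runners whose output differs from the majority output."""
    outputs = {name: run(source) for name, run in runners.items()}
    # On a tie, the output seen first is treated as the majority.
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    return sorted(name for name, out in outputs.items() if out != majority)
```

A typical use, assuming `gcc` and `clang` are installed, would pass runners such as `{"gcc-O0": compile_and_run("gcc", "-O0"), "gcc-O2": compile_and_run("gcc", "-O2"), "clang-O2": compile_and_run("clang", "-O2")}`. Majority voting is a heuristic: a flagged configuration is a candidate for investigation, not proof of a bug.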

In addition, this grant has partly enabled work on improving the scalability of symbolic execution, an influential technique for program analysis.
Exploitation Route Our empirical study could be used by the developers of techniques for finding compiler defects to improve their techniques, and by practitioners to understand the impact of compiler bugs on their software products.

Our compiler fuzzing techniques could be used to find critical bugs in compilers that are out of practical reach of existing tools.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://srg.doc.ic.ac.uk/projects/
 
Description Ongoing research on better compiler fuzzing techniques has led to the discovery of several previously unknown miscompilation bugs in the popular GCC, LLVM and MSVC compilers. We have also contributed several test cases to the LLVM compiler, which increase its test suite coverage.
First Year Of Impact 2020
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title Artefact for assessing the impact of compiler bugs 
Description Artefact containing the experimental data and infrastructure for assessing the impact of compiler bugs 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The associated research was published in the Proceedings of the ACM on Programming Languages (OOPSLA 2019), one of the top venues in programming languages, and the artifact was evaluated as functional and reusable by the OOPSLA 2019 Artifact Evaluation Committee. 
URL https://srg.doc.ic.ac.uk/projects/compiler-bugs/
 
Title Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever 
Description Abstract: When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact: The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3895271
 
Title Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever 
Description Abstract: When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact: The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3895270
 
Description Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe.
Year(s) Of Engagement Activity 2019
URL https://shonan.nii.ac.jp/seminars/160/
 
Description Outreach to school groups 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Outreach talk to schools on software bugs and possible mitigations.
Year(s) Of Engagement Activity 2019
URL https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435...
 
Description Talk at IFIP Working Group 2.4 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk at the IFIP Working Group 2.4 on Software Implementation Technology, attended by a mix of academics and industry practitioners.
Year(s) Of Engagement Activity 2021