Automatically Detecting and Surviving Exploitable Compiler Bugs
Lead Research Organisation: Imperial College London
Department Name: Computing
Abstract
The focus of this proposal is on the detection and survival of wrong code compiler defects, which we argue present a cyber-security threat that has been largely ignored to date. First, incorrectly compiled code can introduce exploitable vulnerabilities that are not visible at the source code level, and thus cannot be detected by source-level static analysers. Second, incorrectly compiled code can undermine the reliability of the application, which can have dramatic repercussions in the context of safety-critical systems. Third, wrong code compiler defects can also be the target of some of the most insidious security attacks. A crafty attacker posing as an open source developer can introduce a compiler-bug-based backdoor into a security-critical application by adding a patch that looks perfectly innocent but which, when compiled with a certain compiler, yields binary code that allows the attacker to compromise the software.
In this project, we aim to explore automated techniques that can detect and prevent such problems. In particular, we plan to investigate techniques for automatically finding compiler-induced vulnerabilities in real software, approaches for understanding the extent to which an attacker could maliciously modify an application to create a compiler-induced vulnerability, and methods for protecting against such vulnerabilities at runtime.
Planned Impact
The project has high potential for industrial impact, with associated economic and societal benefits, by improving the quality of developer tools and enabling more rapid development of reliable software on which businesses and end-users depend. Industrial beneficiaries include companies specialising in compiler development, such as Codeplay and PGI Compilers & Tools; suppliers of compiler test suites, including Solid Sands, NULLSTONE and Plum Hall; larger corporations that undertake significant compiler development work, such as AdaCore, AMD, Apple, ARM, Google, Imagination Technologies, Intel, Microsoft, NVIDIA, Oracle, Qualcomm and Xilinx; and companies that build high-assurance software or software analysis tools that in turn depend upon reliable compilers, such as AbsInt, Altran and TrustInSoft. Many of these companies, including Codeplay, ARM, Imagination Technologies, Google, Microsoft and Oracle, have strong bases in the UK. More broadly, our impact on the Clang and GCC infrastructures, which we propose to study in detail during the project, will have significant benefits for the large communities of associated users.
Our pathways to impact, detailed in a separate document, include (1) technology transfer to industry via direct engagement with industrial project partners Altran and Codeplay, (2) visiting other potential industrial beneficiaries such as ARM, Imagination Technologies, AdaCore, Google and Microsoft, (3) open sourcing our prototypes and applying them to large open-source code bases, (4) participating in industry-focused events, (5) writing developer-focused articles and recording research videos to reach a broader audience, (6) incorporating our case studies in our undergraduate teaching and Master's projects, (7) communicating our research to the public via outreach activities and (8) applying for funding from schemes designed to support impact activities.
Publications
Windsor M (2021) C4: the C compiler concurrency checker
Nowack M (2019) Fine-Grain Memory Object Representation in Symbolic Execution
Marcozzi M (2019) Compiler fuzzing: how much does it matter? in Proceedings of the ACM on Programming Languages
Even-Mendoza K (2020) Closer to the edge
Even-Mendoza K (2022) CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively in Empirical Software Engineering
Even-Mendoza K (2023) Artifact of GrayC: Greybox Fuzzing of Compilers and Analysers for C
Even-Mendoza K (2023) GrayC: Greybox Fuzzing of Compilers and Analysers for C
Busse F (2020) Running symbolic execution forever
Boehme M (2021) Fuzzing: Challenges and Reflections in IEEE Software
Description | Recent years have seen significant research on automatic techniques for finding compiler defects. However, their practical impact has barely been assessed. We have conducted an empirical study examining the compilation of more than 11 million lines of C/C++ code from 318 Debian packages, using 45 historical bugs in the Clang/LLVM compiler, found either by four distinct fuzzers or by the Alive formal verification tool. Results show that almost half of the fuzzer-found bugs propagate to the generated binaries for some packages, but never cause application test suite failures, suggesting that more research is needed to understand the impact of these bugs in practice. In the last decade, compiler fuzzing has found hundreds of bugs in widely-used compilers such as Clang and GCC. However, the idiomatic nature of the fuzzed programs limits their ability to find new bugs. We have developed novel techniques for generating less restricted programs that can still be used for fuzzing, which have led to the discovery of bugs that seem to be out of practical reach for existing techniques. In addition, this grant has partly enabled work on improving the scalability of symbolic execution, an influential technique for program analysis. |
Exploitation Route | Our empirical study could be used by the developers of techniques for finding compiler defects to improve their techniques, and by practitioners to understand the impact of compiler bugs on their software products. Our compiler fuzzing techniques could be used to find critical bugs in compilers that are out of practical reach of existing tools. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | https://srg.doc.ic.ac.uk/projects/ |
Description | Ongoing research on better compiler fuzzing techniques has led to the discovery of several previously unknown miscompilation bugs in the popular GCC, LLVM and MSVC compilers. We have also contributed several test cases to the LLVM compiler, which increase its test suite coverage. |
First Year Of Impact | 2020 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Title | Artefact for assessing the impact of compiler bugs |
Description | Artefact containing the experimental data and infrastructure for assessing the impact of compiler bugs |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | The associated research was published in the Proceedings of the ACM on Programming Languages (OOPSLA 2019), one of the top venues in programming languages, and the artifact was evaluated as functional and reusable by the OOPSLA 2019 Artifact Evaluation Committee. |
URL | https://srg.doc.ic.ac.uk/projects/compiler-bugs/ |
Title | Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever |
Description | Abstract: When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact: The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/ |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
URL | https://zenodo.org/record/3895271 |
Title | Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever |
Description | Abstract: When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact: The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/ |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
URL | https://zenodo.org/record/3895270 |
Description | Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe. |
Year(s) Of Engagement Activity | 2019 |
URL | https://shonan.nii.ac.jp/seminars/160/ |
Description | Outreach to school groups |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Outreach talk to schools on software bugs and possible mitigations. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435... |
Description | Talk at IFIP Working Group 2.4 |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Talk at the IFIP Working Group 2.4 on Software Implementation Technology, attended by a mix of academics and practitioners/industry |
Year(s) Of Engagement Activity | 2021 |