Automatically Detecting and Surviving Exploitable Compiler Bugs

Lead Research Organisation: Imperial College London

Department Name: Computing

Abstract

The focus of this proposal is on the detection and survival of wrong code compiler defects, which we argue present a cyber-security threat that has been largely ignored to date. First, incorrectly compiled code can introduce exploitable vulnerabilities that are not visible at the source code level, and thus cannot be detected by source-level static analysers. Second, incorrectly compiled code can undermine the reliability of the application, which can have dramatic repercussions in the context of safety-critical systems. Third, wrong code compiler defects can also be the target of some of the most insidious security attacks. A crafty attacker posing as an open source developer can introduce a compiler-bug-based backdoor into a security-critical application by adding a patch that looks perfectly innocent but which, when compiled with a certain compiler, yields binary code that allows the attacker to compromise the software.

In this project, we aim to explore automated techniques that can detect and prevent such problems. In particular, we plan to investigate techniques for automatically finding compiler-induced vulnerabilities in real software, approaches for understanding the extent to which an attacker could maliciously modify an application to create a compiler-induced vulnerability, and methods for protecting against such vulnerabilities at runtime.

Planned Impact

The project has high potential for industrial impact, with associated economic and societal benefits, by improving the quality of developer tools and enabling more rapid development of reliable software on which businesses and end-users depend. Industrial beneficiaries include companies specialising in compiler development, such as Codeplay and PGI Compilers & Tools; suppliers of compiler test suites, including Solid Sands, NULLSTONE and Plum Hall; larger corporations that undertake significant compiler development work, such as AdaCore, AMD, Apple, ARM, Google, Imagination Technologies, Intel, Microsoft, NVIDIA, Oracle, Qualcomm, Xilinx; and companies that build high-assurance software or software analysis tools that in turn depend upon reliable compilers, such as AbsInt, Altran and TrustInSoft. Many of these companies, including Codeplay, ARM, Imagination Technologies, Google, Microsoft and Oracle, have strong bases in the UK. More broadly, our impact on the Clang and GCC infrastructures---which we propose to study in detail during the project---will have significant benefits for the large communities of associated users.

Our pathways to impact, detailed in a separate document, include (1) technology transfer to industry via direct engagement with industrial project partners Altran and Codeplay, (2) visiting other potential industrial beneficiaries such as ARM, Imagination Technologies, AdaCore, Google and Microsoft, (3) open sourcing our prototypes and applying them to large open-source code bases, (4) participating in industry-focused events, (5) writing developer-focused articles and recording research videos to reach a broader audience, (6) incorporating our case studies in our undergraduate teaching and Master's projects, (7) communicating our research to the public via outreach activities and (8) applying for funding from schemes designed to support impact activities.

Funded Value:

£672,082

Funded Period:

Jan 18 - Jul 21

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/R011605/1

Principal Investigator:

Cristian Cadar

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (50%)

Software Engineering (50%)

Organisations

People	ORCID iD
Cristian Cadar (Principal Investigator)
Alastair Donaldson (Co-Investigator)

Publications

Windsor M (2021) C4: the C compiler concurrency checker

Nowack M (2019) Fine-Grain Memory Object Representation in Symbolic Execution

Marcozzi M (2019) Compiler fuzzing: how much does it matter? in Proceedings of the ACM on Programming Languages

Lascu A (2021) Dreaming up Metamorphic Relations: Experiences from Three Fuzzer Tools

Even-Mendoza K (2020) Closer to the edge

Even-Mendoza K (2022) CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively in Empirical Software Engineering

Even-Mendoza K (2023) Artifact of GrayC: Greybox Fuzzing of Compilers and Analysers for C

Even-Mendoza K (2023) GrayC: Greybox Fuzzing of Compilers and Analysers for C

Donaldson A (2021) Test-case reduction and deduplication almost for free with transformation-based compiler testing

Busse F (2020) Running symbolic execution forever

Boehme M (2021) Fuzzing: Challenges and Reflections in IEEE Software

Key Findings
Impact Summary
Software and Technical Products
Engagement Activities


Description	Recent years have seen significant research on automatic techniques for finding compiler defects. However, the practical impact of the defects they find has barely been assessed. We have conducted an empirical study examining the compilation of more than 11 million lines of C/C++ code from 318 Debian packages, using 45 historical bugs in the Clang/LLVM compiler, each found either by one of four distinct fuzzers or by the Alive formal verification tool. Results show that almost half of the fuzzer-found bugs propagate to the generated binaries for some packages, but never cause application test suite failures, suggesting that more research is needed to understand the impact of these bugs in practice. In the last decade, compiler fuzzing has found hundreds of bugs in widely used compilers such as Clang and GCC. However, the idiomatic nature of the fuzzed programs limits their ability to find new bugs. We have developed novel techniques for generating less restricted programs that can still be used for fuzzing, which have led to the discovery of bugs that seem to be out of practical reach for existing techniques. In addition, this grant has partly enabled work on improving the scalability of symbolic execution, an influential technique for program analysis.
Exploitation Route	Our empirical study could be used by the developers of techniques for finding compiler defects to improve their techniques, and by practitioners to understand the impact of compiler bugs on their software products. Our compiler fuzzing techniques could be used to find critical bugs in compilers that are out of practical reach of existing tools.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	https://srg.doc.ic.ac.uk/projects/


Description	Ongoing research on better compiler fuzzing techniques has led to the discovery of several previously unknown miscompilation bugs in the popular GCC, LLVM and MSVC compilers. We have also contributed several test cases to the LLVM compiler, which increase its test suite coverage.
First Year Of Impact	2020
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Title	Artefact for assessing the impact of compiler bugs
Description	Artefact containing the experimental data and infrastructure for assessing the impact of compiler bugs
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The associated research was published in the Proceedings of the ACM on Programming Languages (OOPSLA 2019), one of the top venues in programming languages, and the artifact was evaluated as functional and reusable by the OOPSLA 2019 Artifact Evaluation Committee.
URL	https://srg.doc.ic.ac.uk/projects/compiler-bugs/


Title	Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever
Description	Abstract: When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact: The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
URL	https://zenodo.org/record/3895271


Title	Artefact for the ISSTA 2020 Paper: Running Symbolic Execution Forever
Description	Abstract: When symbolic execution is used to analyse real-world applications, it often consumes all available memory in a relatively short amount of time, sometimes making it impossible to analyse an application for an extended period. In this paper, we present a technique that can record an ongoing symbolic execution analysis to disk and selectively restore paths of interest later, making it possible to run symbolic execution indefinitely. To be successful, our approach addresses several essential research challenges related to detecting divergences on re-execution, storing long-running executions efficiently, changing search heuristics during re-execution, and providing a global view of the stored execution. Our extensive evaluation of 93 Linux applications shows that our approach is practical, enabling these applications to run for days while continuing to explore new execution paths. Artefact: The artefact contains a Docker image with a compiled version of MoKlee, the benchmark programs as LLVM bitcode, all experiment results, and the necessary scripts to reproduce our study or apply MoKlee to different applications. Project page: https://srg.doc.ic.ac.uk/projects/moklee/
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
URL	https://zenodo.org/record/3895270


Description	Co-organizer of Shonan Meeting on Fuzzing and Symbolic Execution: Reflections, Challenges, and Opportunities
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Co-organized a Shonan meeting on fuzzing and symbolic execution, which brought together academic and industry participants from the fuzzing and symbolic execution communities, from across the globe.
Year(s) Of Engagement Activity	2019
URL	https://shonan.nii.ac.jp/seminars/160/


Description	Outreach to school groups
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	Outreach talk to schools on software bugs and possible mitigations.
Year(s) Of Engagement Activity	2019
URL	https://www.eventbrite.co.uk/e/software-bugs-why-they-exist-and-how-to-fight-them-tickets-6164483435...


Description	Talk at IFIP Working Group 2.4
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talk at the IFIP Working Group 2.4 on Software Implementation Technology, attended by a mix of academics and industry practitioners.
Year(s) Of Engagement Activity	2021
