CapC: Capability C semantics, tools and reasoning

Lead Research Organisation: University of Kent

Department Name: Sch of Computing

Abstract

We address a difficult technical problem that is drawn from industry:
we seek a solution to fundamental problems found in the standards of
the C and C++ programming languages. C and C++ code is not just
prevalent -- it is used to form the lowest and most trusted levels of
our systems. The kernel of every mainstream operating system uses some
combination of the two, including Windows, MacOS, iOS, Android, Linux
and Unix, as do the swathe of embedded controllers with essential
functions like automotive engine management. Having a good
specification of the language is the first step in verifying the
correctness of these vital system components.

-- Combatting software failure --

This work is part of a larger effort to combat software failure by
developing techniques to verify the correctness of software.
Currently, developers of computer systems rely predominantly on
testing to ensure that systems behave as they should. The system is
run for some time over various inputs and monitored for failure. The
hope is that this will expose enough of the flaws in the system to
make it reliable once it is deployed. But it is increasingly
expensive to achieve good coverage: systems like cars experience
wildly varied inputs, and a fleet of a particular model of car runs
collectively for far longer than the time its computer systems are
tested. Worse still, modern systems are concurrent -- using multiple
communicating processors to complete a task. The delicate interplay
between the concurrent processors makes the output of the system
dependent on the timing of communication, so that some behaviours
occur only a handful of times in billions of runs, leaving testing
little hope of finding associated bugs. When scrutinising security
properties, one is faced not with simple circumstance, but with a
committed adversary that cannot be replicated by simple testing.

There is evidence that validating software through testing is breaking
down and some bugs are evading discovery even in critical systems: for
example a concurrency bug caused some of Toyota's cars to suddenly and
relentlessly accelerate, killing 83 over 10 years. The wider economic
cost of software failure was estimated by the U.S. National Institute
of Standards and Technology to cost USD 60bn each year. Improving our
approach to software failure would have substantial economic and
societal impact.

Verification offers an alternative to testing: one defines desirable
properties of the system -- it will not crash, fuel metering will be
proportional to accelerator input, and so on -- and mathematically
proves that the code satisfies them. In the ideal of verification,
there is no space for bugs to creep in and the mathematical proof of
correctness is absolute. This is particularly valuable for security
properties. Unfortunately, verification techniques are invariably
built above an idealised model of the computer system, e.g.\ the
assumption that memory accesses take place in a global sequential
order, so called sequential consistency (SC). The distance between the
ideal and the reality leaves ample space for bugs to persist. In fact
the status quo is much worse because we do not have a characterisation
of the reality of the system's behaviour: our best models of
programming-language behaviour are known to be broken, e.g.\ in C, C++
and Java.

In this broad context, our project will develop a description the C
language that matches the reality, permitting the sorts of behaviour
exhibited by compiler optimisations and the underlying concurrent
processors. At the same time, we will develop verification techniques
in a setting that correctly models the subtle behaviour of modern
languages, dovetailing these previously disparate views of the
system. Our work will make verification of concurrent systems more
viable, including security properties, helping to address the economic
and social costs of software failure.

Planned Impact

The C language, that lies at the heart of our computer systems, is not
understood. This lack of a sound foundation makes it impossible to be
sure of the security and robustness of our systems. This project will
produce a mathematical description of the currently ill-defined C
language, specialised to the Capability Hardware being developed as
part of the Secure by Design programme. We strive for industrial
adoption of our approach, targeting the ISO and others with tailored
impact-driven dissemination.

The problems in C that we seek to solve are fundamental, and
analogous problems are found in the specifications of the Linux
concurrency macros, Java, OpenCL, CUDA, LLVM IR and every other
programming language that provides efficient low-level access to
memory and an optimising compiler.

This project has the potential for broad impact in both academia and
industry. We will construct a new scheme for specifying C that solves
the problems in the state-of-the-art. This new approach will produce
3 kinds of impact: it will fix problems in current industrial
specifications, it will establish precisely how compiler of the C
language interprets intricate corner cases, and it will enable formal
verification of C components using capabilities, concurrency and
pointers. We expand on each below.

1. A concrete basis for language concurrency.

Throughout the development of our models we will endeavour to match
the style of the existing C specification and minimise differences. We
intend our model to serve as the cleanest and most straightforward
replacement, maximising the potential for impact on future revisions
of the C specification. Our new semantics will be mathematically
defined, with an automatic evaluator that experts can use to run the semantics, and
test their intuitions. In this way, we will produce a superior
alternative to the current C specification, and it is our
intent that this should replace the status quo.

The prior experience and contacts of the PI, Batty, and Co-I Kell
place the team in a unique position to press for impact across many
languages: they have key contacts concerned with the definitions of the
Linux concurrency macros, Java and CUDA. Moreover, they have been
particularly involved in the definition of the C and C++ languages,
where developments are keenly watched by other language specifiers,
and often adopted more or less verbatim (as in OpenCL and LLVM IR).

This context suggests the following strategy for industrial impact:
Batty and Kell will present the semantics defined by this project to
the specifiers of C and C++ and disseminate it to key players in the
definition of the Linux concurrency macros, Java, and CUDA.

2. Optimisation validation.

Key processor vendors have engaged with recent research on relaxed
memory and sought to limit the introduction of new behaviours, while
releasing increasingly detailed and accurate micro-architectural
specifications. Compiler writers have not yet engaged with the same
keen enthusiasm, but the development of a pointer-aware model of
language concurrency will enable new academic/industrial
collaboration.

3. Supporting C verification.

A central goal of the project is to provide a sound basis for C
verification, and to demonstrate that C components of the Capability
Hardware software stack can be verified, particularly parts of the
Microsoft Verona runtime. This has the potential for major impact: it
would pave the way for verification of industrially-relevant bodies of
code that use C, like the Linux kernel.

Funded Value:

£485,167

Funded Period:

Aug 20 - Jul 24

Funder:

ISCF

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/V000470/1

Principal Investigator:

Mark Batty

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (85%)

Software Engineering (15%)

Organisations

University of Kent (Lead Research Organisation)

People	ORCID iD
Mark Batty (Principal Investigator)
Stephen Kell (Co-Investigator)

Publications

Author Name

Title Publication Date Published

10 25 50

Dawson S (2022) Chronos vs. Chaos

Gopalakrishnan A (2023) Memory Consistency Models for Program Transformations: An Intellectual Abstract

Jeffrey A (2022) The leaky semicolon: compositional semantic dependencies for relaxed-memory concurrency in Proceedings of the ACM on Programming Languages

Lustig D (2022) Mixed-proxy extensions for the NVIDIA PTX memory consistency model

Paviotti M (2020) Programming Languages and Systems - 29th European Symposium on Programming, ESOP 2020, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25-30, 2020, Proceedings

Paviotti, M. (2020) Modular Relaxed Dependencies in Weak Memory Concurrency

Wright D (2021) Owicki-Gries Reasoning for C11 Programs with Relaxed Dependencies (Extended Version)

Wright D (2023) Mechanised Operational Reasoning for C11 Programs with Relaxed Dependencies in Formal Aspects of Computing

Key Findings
Impact Summary
Software and Technical Products


Description	This award tackles a major problem in the definition of optimised programming languages, where language definitions do not correctly describe which optimisations are allowed over concurrent memory accesses and which are not. At the same time, prototype ARM hardware provides safety guarantees governing memory behaviour that will vastly improve on the security of existing systems. We need programming language models that reconcile optimisations, concurrency, and that preserve safety guarantees. This project set out to do this in two strands: one fixing the concurrency models of popular systems languages, and another specifying the semantics of safe concurrent languages. On the first strand, we have produced a solution to the longstanding thin-air problem, and we have secured two votes at the International Standards Organisation in support of proposed changes to the C++ standard. We are working through implementing those changes with the ISO now. This is in progress but is nearing completion. On the second strand, the systems language Rust provides similar safety guarantees to the prototype ARM architecture. Our approach to providing guarantees at the language level aims to integrate with the existing guarantees provided by the runtime of a language like Rust. To start this work, we began porting Rust to the prototype ARM architecture. There is not sufficiently developed canonical Rust specification yet though, so this strand of work is larger than anticipated. There is a move to adopt Rust into mainstream systems programming, with Rust now forming a core language of the Linux kernel. We are well placed to help in nascent efforts to specify the behaviour of this language, and we have made excellent progress providing a memory-safe language for the prototype ARM hardware.
Exploitation Route	We have many requests from other academic groups and companies to use the Rust compiler. We are rushing towards a release now. We expect this to happen in the second quarter of 2023. Our changes to the C++ standard will open up the possibility of verification tools that check standards-compliant code for correctness in the concurrent setting. We have started building reasoning techniques along these lines, and we have further funding to continue in this direction.
Sectors	Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Security and Diplomacy


Description	We have ported BusyBox to the Morello architecture. We have ported the Rust compiler to Morello. We have a solution to the thin-air problem in C++. We are seeking adoption of this solution by the International Standards Organisation.
Sector	Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Security and Diplomacy


Title	Isabelle/HOL files for "Mechanised Operational Reasoning for C11 Programs with Relaxed Dependencies"
Description	Isabelle/HOL files for the paper "Mechanised Operational Reasoning for C11 Programs with Relaxed Dependencies". Tested for Isabelle 2022.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
URL	https://figshare.com/articles/software/Isabelle_HOL_files_for_Mechanised_Operational_Reasoning_for_C...

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications