REMS: Rigorous Engineering for Mainstream Systems

Lead Research Organisation: University of Cambridge
Department Name: Computer Laboratory

Abstract

Computer systems are critical to modern society, from embedded devices and mobile phones through to internet services, but traditional computer engineering cannot produce reliable and secure systems, as we see in the continual stream of errors and security flaws impacting industry and consumers everywhere. Industry relies on informal prose specifications and ad hoc testing, but the complexity and scale of today's systems make it effectively impossible to exclude errors using only these methods (even though perhaps 70% of development staff in major vendors are devoted to testing), and adversarial exploits all too often turn errors into major security concerns, with huge financial and social costs. Compounding the problem, the industry-wide abstractions around which real systems are built, including the architectural interfaces between multiprocessor hardware and software, and programming languages such as C and C++, are ill-defined, giving rise to ambiguity and error.

This is a long-standing problem, motivating extensive research on mathematically rigorous alternatives. Until now, it has been impractical to address mainstream widely-deployed systems with rigorous techniques. Academic research has focussed instead on modest-scale safety-critical systems and academic verifications of idealised systems. There has been a deep disconnect between practical industrial engineering and systems research on the one hand, and theoretical research in semantics, verification, and foundations on the other. Now, however, the new approaches that we have, with a pragmatic emphasis on the use of mathematically rigorous models where they can be most effective, finally make it feasible to address industrial-scale systems - and the commercial and social cost of not doing so is becoming ever greater. It is time for rigorous engineering of mainstream systems, unifying practice and theory.

REMS brings together a team of leading theory and systems researchers, with key industrial partners, to establish a new synthesis. We will establish a true engineering mathematics for the construction of more robust and secure computer systems: mathematically rigorous models, verification techniques, and engineering tools applied to the construction of full-scale mainstream computer systems, including key infrastructure in widespread use today (multiprocessors, programming languages, and operating systems). We will couple this with novel programming-language and verification support for future system-building. Developing our recent work with processor vendors and language standards bodies, we will address the technical, engineering and social challenges necessary to bring these techniques into widespread use, enabling a radical improvement in computer engineering as a whole.

Planned Impact

Computer systems are pervasive, from a host of embedded devices (in everything from heating controllers and power distribution networks to cars and mobile phones), through the Internet giving us email and the web, up to the supercomputers used for weather forecasting and scientific research. We rely on them in almost everything we do, but they are fundamentally unreliable: they are developed by testing-based methods that cannot handle the complexity of modern systems. The result is a high direct cost, to build adequate systems by these inadequate methods, and a much higher indirect cost, on all of society, to handle the inevitable bugs and inexplicable behaviours that occur and to deal with the security problems that arise when the remaining errors are exploited.


The ambitious aim of the REMS project is to enable and provoke a profound change in how computer systems are engineered, placing *mathematically rigorous* engineering methods at the heart of the development of mainstream systems. This will be achieved by building on precise mathematical models of some of the key abstractions around which all computer systems are constructed: multiprocessors and the C programming language. It will not remove all problems, but it will provide a significant step towards making the products of computer engineering as reliable as the other engineering infrastructure on which society depends.

Towards this broad societal impact, there are many specific opportunities for economic impact, including:

- On the architectural interface between hardware and software, our processor-vendor partners ARM and IBM will benefit directly from improved specifications, the testing tools that they enable, and discussions of future architectural choices; all users of this interface (compiler and operating system vendors) will benefit from the increased clarity.

- On the definition of the C language, system developers will benefit directly from more accurate ISO standards and improved core infrastructure and, more concretely, from our static and dynamic analysis tools.

- Several start-up possibilities that may arise from particular subtasks, e.g. from some of our tool developments, which we will pursue as appropriate, building on our XenSource experience.



The most important economic impact will be via the eventual effects of deploying rigorous methods across the computing industry, improving quality and reducing time- and cost-to-market for its products. By improving the quality of these products, we will reduce the hidden cost of unreliability and insecurity in computer systems, which affect us all.

Publications

 
Description Please see the project web pages http://www.cl.cam.ac.uk/users/pes20/rems
and the summary document: http://www.cl.cam.ac.uk/users/pes20/rems/report-2019-web.pdf


Introduction: the REMS project

The mainstream computing infrastructure on which we all rely works well enough to support much of modern society, but it is neither robust nor secure, as demonstrated over the years by untold numbers of system failures and successful malicious attacks. Fundamentally, this is because the engineering methods by which computer systems are built are not up to the task. Normal industry practice is based on a test-and-debug engineering process, with triumvirates of code, manually curated tests, and prose specification documents; it relies also on the division of labour that is enabled by common interfaces, allowing systems to be composed of subsystems built by different groups. But prose specification documents do not provide an oracle that one can test a system against, leaving code, tests, and specifications only weakly related; they are inherently ambiguous, lacking mathematical precision; and the existing specifications of many key interfaces leave much to be desired. Moreover, some of the legacy interfaces that we still rely on, such as the C programming language, were simply not designed to protect against the accidental errors and malicious attacks that are now so common.

REMS (Rigorous Engineering for Mainstream Systems) is an EPSRC-funded Programme Grant (2013-2020, £5.6M, Cambridge, Imperial, and Edinburgh) to explore how we can use rigorous mathematics to improve engineering practice for mainstream computer systems, to make them more robust and secure: broadly, to "create the intellectual and software tools to apply semantics and verification at the heart of mainstream system engineering". Addressing this problem needs close interaction between the normally disjoint cultures of systems and semantics research, and REMS brings together an unusual combination of researchers in both to achieve this: in architecture, operating systems, security, and networks, and in programming languages, automated reasoning, and verification. Building on many years of research worldwide on mathematical semantics and verification, we are finally in a position to scale that up, to apply mathematically rigorous semantics to mainstream systems. REMS aims:

to develop rigorous semantics for key mainstream abstractions; and
to improve those, and to develop new systems research, using rigorous semantics and semantics-based tools.

The project is focussed on lightweight rigorous methods: precise specification (post hoc and during design) and testing against specifications, with full verification only in some cases. It thus provides a relatively smooth adoption path from conventional engineering, and the project emphasises building useful (and reusable) semantics and tools. We are building accurate full-scale mathematical models of some of the key computational abstractions (processor architectures, programming languages, concurrent OS interfaces, and network protocols), studying how this can be done, and investigating how such models can be used for new verification research and in new systems and programming language research. Supporting all this, we are also working on new specification tools and their foundations.

REMS is organised into five interlinked tasks. Three tasks are focussed on the three key abstractions: Task 2 on multicore processor architectures, Task 3 on programming languages, and Task 4a on concurrent systems software interfaces. In each of these, some subtasks build models and specifications of key parts of today's mainstream interfaces or of major research proposals, while other subtasks go on to apply, extend, and relate those models. For all three, good semantic tools, built in Task 1, are needed to work with the large mathematical definitions of our models. Task 4b addresses higher-level abstractions and the theory of how we can reason about these concurrent systems (Task 4 of the original proposal has been split into Task 4a and Task 4b for clarity).

Much of this needs collaboration not just between systems and semantics researchers within the project, but also with academic partners elsewhere and with industry vendors and groups. We have substantial interactions with ARM, IBM, the ISO C and C++ standards committees (WG14 and WG21), the RISC-V Foundation Memory Model and ISA Formal Task Groups, the IETF, the ECMAScript committee, and the WebAssembly Committee. We have also engaged with Amazon, POSIX, the FreeBSD development community, the Linux concurrency development community, Qualcomm, Apple, and Nvidia. Also very important are close working links with other major academic projects, especially:

CTSRD: Rethinking the hardware-software interface for security, including CHERI (Moore, Neumann, Watson et al.; Cambridge and SRI International), https://www.cl.cam.ac.uk/research/security/ctsrd/
CakeML: A Verified Implementation of ML (Fox, Kumar, Myreen, Norrish, Owens, Tan; Chalmers, Kent, Cambridge, Data61), https://cakeml.org/
OCaml Labs and Mirage OS (Madhavapeddy et al.; Cambridge), http://www.cl.cam.ac.uk/projects/ocamllabs/, https://mirage.io/
SibylFS: A Filesystem Test Oracle (Ridge et al.; Leicester), http://sibylfs.github.io/

Academic collaborators include colleagues at St Andrews, Kent, Leicester, INRIA Paris, Chalmers, NICTA Australia, and the DeepSpec NSF Expedition in Computing in the USA.

In this document we summarise progress in the project as a whole, from its start in 2013-03 to date (2019-01). The current state of the project, including links to all publications to date and to our main models and software, is presented on the REMS web page, http://www.cl.cam.ac.uk/users/pes20/rems.


1 Task 1: Semantic tools

The mathematical models of mainstream-system abstractions that we need have to be expressed in well-defined formal languages, for the sake of mathematical precision, and those languages need good software tool support, to let us handle large and complex definitions. In this task we address the foundational question of what formal languages are suitable for this purpose; the pragmatic question of what facilities good tool support should provide; and the engineering question of actually building these tools to a sufficient quality to be usable by other researchers and by industry staff. In REMS we are developing and using the following tools.

Sail is a domain-specific language for concurrent and sequential processor instruction-set architecture (ISA) definition, to define the architected behaviour of processor instructions [1, 7, 8, 10, 12]. Sail is a first-order functional/imperative language, with lightweight dependent types for vector lengths and indices and for integer ranges. We have Sail models for ARMv8-A, RISC-V, MIPS, and CHERI-MIPS, each complete enough to boot an OS, and for smaller fragments of IBM POWER and x86. Given a Sail ISA definition, the tool will automatically generate emulators in C and OCaml; theorem-prover definitions in Isabelle, HOL4, and Coq (via Lem for the first two); and versions of the semantics integrated with our RMEM concurrency model tool (both a deep embedding, in a form that can be executed by a Sail interpreter written in Lem, and shallow embeddings to Lem and OCaml). Sail is available under a permissive open-source licence.

Sail web page: http://www.cl.cam.ac.uk/users/pes20/sail/.
Sail github repo: https://github.com/rems-project/sail.
Sail is also available as an OPAM package.
Lem is a tool for large-scale portable semantic specification [2, 6]. Lem supports the definition language of a typed higher-order logic, for type, function, and inductive-relation definitions, in an OCaml-like syntax. Given such a definition, the tool will automatically generate (for various fragments) OCaml, LaTeX, HTML, Isabelle, HOL4, and Coq versions. Lem is used in our work on Sail, concurrent architecture specification, CakeML, ELF linking, Concurrent C semantics, Sequential C semantics, and SibylFS POSIX filesystem semantics. Lem is available under a permissive open-source licence.

Lem web page: http://www.cl.cam.ac.uk/users/pes20/lem/.
Lem github repo: https://github.com/rems-project/lem/.
Ott is a tool for programming language definitions, supporting user-defined syntax and inductive relations over it, and compiling to LaTeX and to theorem-prover definitions for Coq, HOL4, and Isabelle; it can also generate some OCaml infrastructure, e.g. concrete substitution functions and simple parser and pretty-printer code. Ott is available under a permissive open-source licence.

In 2017, the original Ott paper

Ott: Effective Tool Support for the Working Semanticist. Peter Sewell, Francesco Zappa Nardelli, Scott Owens, Gilles Peskine, Thomas Ridge, Susmit Sarkar, and Rok Strniša. ICFP 2007

was awarded the ACM SIGPLAN Most Influential ICFP Paper Award (the papers are judged by their influence over the past decade).

Ott web page: http://www.cl.cam.ac.uk/users/pes20/ott/.
Ott github repo: https://github.com/ott-lang/ott.
Ott is also available as an OPAM package.
L3 is an earlier domain-specific language for sequential processor instruction-set architecture (ISA) semantic definition [4, 64]. L3 exports to HOL4 and Isabelle/HOL, including Hoare-triple generation, and to SML, providing reasonably fast emulation. L3 is used in our work on CakeML, CHERI, and instruction testing. L3 is available under a permissive open-source licence.

L3 web page: https://acjf3.github.io/l3.

In Task 1, we have also worked on:

A semantic meta-language enabling multiple interpretations: concrete, abstract, and flow-sensitive analysis [3].
The foundations needed for future semantic tools, focussed on Nominal Type Theory [5].


2 Task 2: Architectural Multiprocessor Semantics

The first key interface we address is the architectural interface between hardware and software, specifying the multiprocessor behaviours that processor implementations guarantee and that systems code relies on. Normal industrial practice is to specify this interface in informal-prose architecture reference manuals, sometimes with pseudocode descriptions of the sequential instruction semantics, but these descriptions cannot be used directly to test either the hardware below the interface or software above it, or to build verification tools, let alone as a basis for proof. Compounding the problem, architectures must be nondeterministic loose specifications, to accommodate microarchitectural variation, so one cannot simply characterise them with reference implementations.

There are two linked challenges: handling the subtlety of the concurrency architectures, and handling the scale of the instruction-set descriptions. We are addressing these for multiple architectures: the ARM A-class production architecture, which is the basis for the processors used in the majority of mobile phones and many other devices; the IBM POWER production architecture, used for high-performance server machines, among other areas; the open RISC-V architecture; the x86 architecture of Intel and AMD processors, used in the majority of laptop, desktop, and server machines; and the research CHERI architecture, of which more below.
2.1 Multiprocessor Concurrency: Rigorous Concurrency Architecture

The concurrency architecture of a family of multiprocessors defines the range of allowed behaviours of concurrent programs. It is the interface between multiprocessor hardware vendors, who endeavour to ensure that all their hardware implementations have behaviour within that range, and concurrent programmers (of high-performance concurrent algorithms, operating system kernels, etc.), who can thereby assume that range as their programmer's model. Ten years ago, in 2008, most architectures, including ARM, x86, and IBM POWER, did not have well-defined concurrency architectures; they had only non-rigorous and highly confusing prose descriptions of the allowed behaviour, sometimes augmented by a few examples. These did not provide a usable basis for either hardware testing or concurrent programming.
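To make concrete what such a concurrency architecture must decide, the following sketch shows the classic store-buffering (SB) shape, written here in C11 atomics rather than the vendor assembly of real litmus tests (the thread functions and variable names are illustrative, not taken from our test suites):

    #include <stdatomic.h>

    /* Store buffering (SB): two threads each write one location and then
       read the other. A naive interleaving (sequentially consistent) model
       forbids the final outcome r0 == 0 && r1 == 0, but hardware store
       buffering makes it observable on x86, ARM, POWER, and RISC-V. */
    atomic_int x, y;
    int r0, r1;

    void thread0(void) {
        atomic_store_explicit(&x, 1, memory_order_relaxed);
        r0 = atomic_load_explicit(&y, memory_order_relaxed);
    }

    void thread1(void) {
        atomic_store_explicit(&y, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&x, memory_order_relaxed);
    }

A concurrency architecture has to say, for each such test, exactly which final outcomes hardware may exhibit and which programmers may rely on being excluded.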

Our work before REMS, in collaboration with other academic colleagues and with senior IBM staff, established rigorous concurrency models for modest fragments of the x86 and IBM POWER architectures. These covered the fundamental "user-mode" non-privileged concurrency instructions (loads, stores, dependencies, memory barriers, read-modify-write and load-linked/store-conditional instructions), with ad hoc descriptions of the intra-instruction behaviour and assuming no mixed-size accesses.

Within REMS, we have successfully moved to a world in which these aspects of production architectures are mathematically defined, based on rigorous engineering processes including formal modelling, proof about those models, making the models executable as test oracles, and testing against them.
ARMv8-A

We worked closely with ARM staff and our INRIA partners to clarify the concurrency model of their new ARMv8-A architecture, with a combination of rigorous formal modelling, experimental testing, test generation, and mathematical proof. This contributed to ARM's 2017 switch to a "multi-copy-atomic" model - a substantial simplification. Together we have developed rigorous axiomatic and operational models - produced principally by ARM and by our academic group respectively, but deeply informed by each other - and proved them equivalent for the intersection of their scope. The ARM architecture definition now contains the former, replacing its previous unusable prose [7, 8, 10].

Along the way, our experimental testing discovered a number of hardware errata, in core designs from four major vendors, and our test suite is being used within ARM.
RISC-V

Most major architectures (x86, ARM, MIPS, IBM POWER, Itanium) are proprietary. In contrast, the RISC-V architecture is an open-source architecture, gathering increasing academic and industry interest. Its concurrency architecture is broadly similar to the simplified ARMv8 model, but with a number of subtle differences. We have contributed substantially to the RISC-V Memory Model Task Group and the resulting specification [14], which again has (experimentally) equivalent axiomatic and operational models.

We have also provided a permissive open-source library of approximately 7,000 concurrency litmus tests for RISC-V, including the reference results from the two models.


RISC-V Litmus Tests github repo: https://github.com/litmus-tests/litmus-tests-riscv
Mixed-size and instruction-semantics integration

Scaling up from the modest fragments of the instruction set of our pre-REMS work, we have extended all of these models - and also our operational IBM POWER (working closely with senior IBM staff) and x86 models - to cover mixed-size accesses (all the subtleties of overlapping accesses to parts of memory and registers), and integrated them with principled descriptions of the intra-instruction semantics. This finally suffices to define the behaviour of concurrent algorithm code, not just the litmus tests used in concurrency model development.
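As a sequential illustration of what "mixed-size" means (a sketch for exposition, not one of our tests): a single 32-bit write can be observed piecewise by smaller overlapping accesses, and the extended models must say which such partial observations a concurrent reader may make.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Mixed-size overlap, shown sequentially: one 32-bit store observed
       as two 16-bit loads over the same bytes. In the concurrent setting,
       the architecture must define which combinations of old and new
       partial values a racing reader can see. */
    int main(void) {
        uint32_t word = 0x12345678;
        uint16_t lo, hi;
        memcpy(&lo, (unsigned char *)&word, 2);      /* low half (little-endian) */
        memcpy(&hi, (unsigned char *)&word + 2, 2);  /* high half */
        printf("%04hx %04hx\n", hi, lo);
        return 0;
    }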
Our RMEM exploration tool

Combining all the above with extensive search and user-interface engineering, we provide a web-interface tool, RMEM, that allows one to explore, interactively, randomly, or exhaustively, the behaviour of concurrent code for ARMv8, RISC-V, IBM POWER, and x86. This supports both litmus tests and small ELF binaries (e.g. compiled by gcc or clang), using our semantics for ELF linking and DWARF debug information from Task 3. The model-allowed behaviours can then be compared with experimental data, obtained using the litmus tool of the diy tool suite (http://diy.inria.fr/doc/), from production silicon or simulation. The sources for our operational models and RMEM tool are available under a permissive open-source licence.


RMEM tool web interface: http://www.cl.cam.ac.uk/users/pes20/rmem.
RMEM github repo: https://github.com/rems-project/rmem.
GPU Concurrency

We also studied the concurrency semantics of GPUs, with colleagues elsewhere - conducting a large empirical study of their behaviour and proposing a model of Nvidia GPU hardware [11].
2.2 Instruction-Set Architecture (ISA)

The second important aspect of architecture specification is the sequential intra-instruction behaviour. At least for the non-privileged part of an architecture, and neglecting floating point, this is usually less semantically involved than the concurrency behaviour, but the scale of modern ISAs makes it a non-trivial problem, particularly when one wants definitions that are both readable and usable for mathematical proof. We have developed two domain-specific languages for describing ISA behaviour: first L3 and then Sail, as introduced in Task 1.
ARMv8-A

We are working closely with ARM to translate their internal ("ASL") description of their ARMv8-A architecture into Sail [1]. The resulting Sail model is complete enough to boot Linux, and to pass most of the relevant ARM-internal Architecture Validation Suite tests (in the C emulator generated from Sail). We have approval from ARM to release this under a permissive open-source licence. This will be the first modern production architecture to have a public formal definition complete enough for an OS boot; it should provide a foundation for much future work on system software verification. As a first step towards this, we have an Isabelle machine proof of a characterisation of ARMv8-A address translation.

The first version of this was for a fragment based on the ARM public XML release for ARMv8.3-A, excluding system registers. A more complete version, updated to ARMv8.5-A, is now available under a permissive open-source licence (BSD 3-Clause Clear).


ARMv8.5-A ASL-derived Sail ISA model, in a github repo: https://github.com/rems-project/sail-arm.
ARMv8.3-A ASL-derived Sail ISA model, in the Sail github repo: https://github.com/rems-project/sail/tree/sail2/aarch64.


We also have an earlier hand-written Sail ISA model for a modest user-mode ARM fragment; this model is integrated with our RMEM tool to compute the set of all allowed behaviour for ARMv8 concurrency litmus tests or small ELF files.


ARMv8 hand-written Sail ISA model, in the Sail github repo: https://github.com/rems-project/sail/tree/sail2/arm.
ARMv6, ARMv7, and ARM Cortex-M0

We also developed sequential L3 models for the ARMv6-A and ARMv7-A architectures and the ARM Cortex-M0 implementations, the former including floating-point and some system-level instructions, and the latter equipped with accurate cycle counts for work on resource-aware compilation.
RISC-V

We are working closely with our SRI International partners to hand-write a formal RISC-V ISA specification. This is also complete enough to boot an OS, here Linux, FreeBSD, and seL4, and the model is integrated with our RMEM tool to compute the set of all allowed behaviour for the RISC-V concurrency litmus tests mentioned above.


RISC-V Sail ISA model github repo: https://github.com/rems-project/sail-riscv
IBM POWER

We worked closely with IBM to automatically translate their ISA pseudocode, for a modest user-mode non-vector non-floating-point fragment, from an XML version of their ISA documentation (not previously parsed or type-checked) into a Sail formal model (albeit currently only for an older version of Sail) [12]. This too is integrated with the RMEM tool. We tested the ISA model by generating single-instruction test cases and comparing their behaviour in the model and on POWER 7 hardware; for the 154 user-mode branch and fixed-point instructions, we generated 6984 tests.


IBM POWER Sail ISA model, in the Sail github repo: https://github.com/rems-project/sail/tree/sail2/power.
x86

We have hand-written a Sail model for a modest user-mode fragment of x86, again integrated with RMEM to support concurrency exploration.


x86 Sail ISA model, in the Sail github repo: https://github.com/rems-project/sail/tree/sail2/x86.
2.3 CHERI Architecture Formal Engineering

Turning to large-scale systems research, the CHERI project, led by Moore and Watson at Cambridge and Neumann at SRI International, largely DARPA-funded from 2010 to date, is developing new architectural mechanisms for security protection, revisiting the hardware-software security interface for general-purpose CPUs to fundamentally improve security. CHERI includes an architecture design, processor implementations, and a software stack adapting Clang/LLVM and FreeBSD.

In contrast to ARM and IBM POWER, this is a research architecture, not a mainstream production architecture, but it is being developed at scale, e.g. with a software stack including a variant of a production OS.

CHERI was originally developed using traditional systems and computer architecture methods, with prose and pseudocode descriptions of the architecture, hand-written test suites, and hardware and software prototypes - though with an unusual hardware/software co-design process. REMS created the opportunity for new collaborations between the original CHERI team, spanning systems, security, and computer architecture, and REMS researchers in semantics, verification, and programming languages. This gave us an ideal context to develop and try new rigorous engineering methods in practice:

Since 2014, CHERI architecture research has used formal models of the architecture as central design tools - first in our L3 language, and now in our Sail language. The Sail version is used as the principal definition in the current working version of the CHERI ISA specification (to be released as version 7 of the CHERI architecture).
An SML emulator (running at several hundred KIPS) automatically generated from the L3 formal model has been invaluable, e.g. to bring up a bootable CHERI version of FreeBSD above the model, and as an oracle for testing the CHERI hardware implementation.
We have augmented the hand-written CHERI test suite with an automatic test generator: it creates randomized instruction sequences paired with precise descriptions of their required behaviour, machine-checked against our model by HOL4/SMT provers [26, 25, 24]. We have also produced a generic synthesisable test bench, BlueCheck [19]. This has been used to check various microarchitectural properties of CHERI, finding some bugs. It has moreover been used for teaching in MIT (in the 6.175 Computer Architecture course).
As a first exercise in developing mathematically rigorous proof of specific security properties about an architecture description, we have generated theorem-prover definitions (in the Isabelle and HOL4 provers), discussed and clarified some of the fundamental security properties the CHERI designers intended the architecture to guarantee, and produced machine proofs (in Isabelle) of some of these.
We have proved (mechanised proof, in HOL4) correctness properties of the "CHERI-128" compressed capability scheme.
We have further contributed to the CHERI hardware implementation by building an efficient checker for its memory consistency model, our Axe tool [9], which has also been used for RISC-V.

The theory/practice interactions with the CHERI team have also been fruitful at higher levels in the stack, for the C language, linking, and OS interfaces, as we return to below. This has only been possible with intertwined groups that cover both systems and semantics at multiple layers of the stack, from hardware to systems software.
2.4 CakeML ISA models

We have used L3 to define the user-mode instruction semantics for each of the target architectures of the CakeML verified compiler, and to generate proof infrastructure for them [64, 65, 66, 67]. CakeML uses the above ARMv6-A and ARMv8-A L3 models, together with MIPS and x86-64 models developed here and a RISC-V model developed by our partners at SRI International.


3 Task 3: Systems Programming Languages

Here our main focus is on abstractions used for real-world systems programming, looking at the C language (both sequential and concurrent aspects), runtime typechecking for systems code (in the libcrunch system), linking (specifically the ELF format), CHERI C, JavaScript, WebAssembly, and OCaml. We have also contributed to the CakeML verified ML compiler with the ISA modelling work described in Task 2.
3.1 Sequential C: Cerberus

The C language, originating around 1970, remains at the heart of a great deal of security-critical systems software, including operating systems, hypervisors, and embedded and Internet-of-Things devices - and many security flaws ultimately come down to C code. Ambiguities and differences of opinion about what the language is thus have considerable potential for harm - but it has never been well-defined. In earlier (pre-REMS) work, we collaborated closely with the ISO WG21 C++ and WG14 C committees to clarify their concurrency models, with our mathematical models (in prose form) incorporated into those international standards.

Our work on the sequential aspects of C semantics proceeds on several fronts: investigating the de facto standards of C as it is implemented and used; addressing and contributing to the ISO standard produced by the ISO WG14 C committee; and building an executable formal model for a substantial fragment of C, Cerberus [27, 28]. Our initial paper on this [27] was awarded a PLDI 2016 Distinguished Paper Award. We intend to release Cerberus under a permissive open-source licence in due course.


Cerberus web page https://www.cl.cam.ac.uk/users/pes20/cerberus/
Cerberus web interface: http://cerberus.cl.cam.ac.uk/cerberus


We are participating in the ISO WG14 C standards committee: we attended the April 2016 (London), October 2016 (Pittsburgh), April 2018 (Brno), and October 2018 (Pittsburgh) committee meetings, producing committee discussion papers ([29, 30, 31, 32], [33, 34, 35], [36, 37, 38, 39, 40], [41]), and instigated the WG14 C Memory Object Model Study Group; we are also engaging with the various compiler and developer communities, with talks at EuroLLVM 2018 and the GNU Tools Cauldron 2018, to Red Hat, and at 35c3.
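A flavour of the questions at stake, in a classic pointer-provenance example from this line of work (the exact verdict differs between the ISO text, de facto compiler behaviour, and proposed semantics):

    #include <stdio.h>
    #include <string.h>

    int x = 1, y = 2;

    int main(void) {
        int *p = &x + 1;          /* one-past-the-end of x */
        int *q = &y;
        if (memcmp(&p, &q, sizeof p) == 0) {
            /* p and q may have identical representations, but p carries
               the provenance of x: whether this store is defined behaviour
               is exactly what a memory object model must settle. */
            *p = 11;
            printf("%d\n", y);    /* 11, 2, or undefined? */
        }
        return 0;
    }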
3.2 CHERI C

At the C language level, just as at the architecture level, we have another strong collaboration with the CHERI work, with their experience in systems software (and specifically in porting it to the C supported by CHERI) informing our semantic understanding, and with this discussion identifying issues with the CHERI design. This made substantial contributions to the two Cerberus papers cited above [27, 28], and we have also contributed to work on the CHERI ABI, enforcing pointer provenance and minimising pointer privilege in a POSIX C environment [57].
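A minimal sketch of one idiom at issue, assuming CHERI's pure-capability C (on conventional hardware both round trips below behave identically): CHERI pointers are tagged capabilities, so round-tripping one through a plain integer strips its validity tag, while intptr_t is capability-sized and preserves it.

    #include <stdint.h>

    int v;

    void sketch(void) {
        /* Under CHERI, casting to a non-capability integer keeps only the
           address; a pointer rebuilt from it has no valid tag and faults
           on dereference. */
        unsigned long bits = (unsigned long)&v;
        int *p1 = (int *)bits;        /* traps on use under pure-capability CHERI */

        /* intptr_t is capability-sized on CHERI, so the tag survives. */
        intptr_t cap = (intptr_t)&v;
        int *p2 = (int *)cap;         /* fine: *p2 == v */
        (void)p1; (void)p2;
    }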

We also developed a conceptual framework for application compartmentalization, a vulnerability mitigation technique employed in programs such as OpenSSH and the Chromium web browser, which decomposes software into isolated components to limit privileges leaked or otherwise available to attackers. This is embodied in an LLVM-based tool: the Security-Oriented Analysis of Application Programs (SOAAP), that allows programmers to reason about compartmentalization using source-code annotations (compartmentalization hypotheses) [56].
3.3 C runtime type checking: libcrunch

We are developing a system for run-time type checking in C, libcrunch, extending it to additionally check array/pointer bounds [51, 52, 53]. Using run-time type information, we have found it possible to tolerate real C idioms not supportable by software fat-pointer schemes. Early indications suggest that the available run-time performance is comparable to, or better than, competing approaches.

We have continued to refine the run-time type infrastructure underlying libcrunch into a more general-purpose infrastructure with a variety of other applications [55] including debugging, interfacing native code with dynamic languages, and dealing with memory-mapped files.


Libcrunch github repo: https://github.com/stephenrkell/libcrunch/
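The kind of error such run-time checking targets, in a sketch (illustrative code, not from the libcrunch test suite): the allocation site gives the object a runtime type, and a later cast is checked against it.

    #include <stdlib.h>

    struct point { int x, y; };

    int main(void) {
        /* The malloc call site types this allocation as struct point. */
        struct point *p = malloc(sizeof *p);

        /* A cast via void* to an unrelated type: checking in the style of
           libcrunch flags this at run time, where a plain build would
           silently reinterpret the bytes. */
        float *f = (float *)(void *)p;
        (void)f;
        free(p);
        return 0;
    }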
3.4 ELF linking: linksem

Systems software in reality is written not just in more-or-less conventional programming languages, like C and C++, but also using a complex menagerie of techniques to control the linking process. This linker speak is often poorly understood by working programmers and has largely been neglected by language researchers. We have surveyed the many use-cases that linkers support and developed the first validated formalisation of a realistic executable and linkable format, ELF, together with an executable specification of linking with substantial coverage [54]. This also incorporates a semantics for DWARF debug information. The model is written in Lem, which generates an executable OCaml version. This ELF model is used in our Task 2 RMEM architectural emulation tool. The model is available under a permissive open-source licence.


ELF semantics github repo: https://github.com/rems-project/linksem.
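One small example of the linker behaviour in question (a sketch; the symbol name is illustrative): ELF weak symbols let a library provide a default that a client silently overrides at link time - behaviour that lives in the linking semantics rather than in the C standard.

    /* lib.c: a weak default definition. */
    #include <stdio.h>
    __attribute__((weak)) void hook(void) { puts("default hook"); }

    /* main.c: if this file also defined a (strong) hook(), the linker
       would select it over the weak one, with no warning; with no other
       definition, the weak default above is used. */
    void hook(void);
    int main(void) { hook(); return 0; }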
3.5 Concurrent C/C++

Turning to the concurrency semantics of C-based languages, in earlier (pre-REMS) work, we collaborated closely with the ISO WG21 C++ and WG14 C committees to clarify their concurrency models, with our mathematical models (in prose form) incorporated into the ISO C/C++11 international standards (Batty et al., POPL 2011). We have built on those models, and have also considered potential future models that would fix the so-called 'thin-air problem' from which C/C++11 suffers.

Batty's 2014 Cambridge PhD thesis The C11 and C++11 Concurrency Model was awarded the 2015 ACM SIGPLAN John C. Reynolds Doctoral Dissertation Award, presented annually to the author of the "outstanding doctoral dissertation in the area of Programming Languages", and was the winner of the 2015 CPHC/BCS Distinguished Dissertations competition, selected as "the best British PhD/DPhil dissertations in computer science". The WG14 C committee also approved his remaining defect reports w.r.t. the C11 concurrency model.

We contributed to work in LLVM implementing the C/C++11 concurrency model [49, 50].

We developed a C11 operational semantics that is proved equivalent (in Isabelle/HOL) to the Batty et al. axiomatic model [43]. This was also integrated into our Cerberus sequential C model to allow litmus tests to be executed, though that integration is not currently maintained, to simplify Cerberus development.

We showed that no straightforward adaptation of the C/C++11 axiomatic model, phrased as a consistency predicate over candidate executions, can exclude thin-air reads while also admitting some standard optimisations, and we identified additional difficulties with undefined behaviour and with mixtures of atomic and nonatomic accesses. We also proved (in HOL4) that the C/C++11 model actually provides the DRF-SC guarantee [45].
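The load-buffering (LB) shape at the heart of the thin-air problem, as a C11 sketch:

    #include <stdatomic.h>

    /* Load buffering (LB): each thread copies what it reads from one
       location into the other. The C/C++11 axiomatic model admits the
       outcome r1 == r2 == 42 - a value appearing "out of thin air" -
       even though no compiler or hardware produces it; candidate fixes
       must rule it out without forbidding standard optimisations. */
    atomic_int x, y;

    void thread0(void) {
        int r1 = atomic_load_explicit(&x, memory_order_relaxed);
        atomic_store_explicit(&y, r1, memory_order_relaxed);
    }

    void thread1(void) {
        int r2 = atomic_load_explicit(&y, memory_order_relaxed);
        atomic_store_explicit(&x, r2, memory_order_relaxed);
    }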

We proposed a model that admits compiler optimisation while avoiding the thin-air problem [44, 47].

Along with the extension of our ARM and IBM POWER hardware models to support mixed-size accesses, described above, we proposed a corresponding extension to the C/C++11 axiomatic model to support mixed-size non-atomic accesses [8].

We discussed a new problem with C/C++11, uncovered by groups in Princeton and MPI-SWS/SNU, that contradicts the soundness of compilation schemes from C/C++11 to ARM and POWER (pre-REMS, we had published hand proofs of those that now turn out to be flawed); this required additional changes to C/C++11, developed by the latter group.

We proposed a safe, portable, and efficient implementation of co-routines [48].
3.6 Verified ML implementation: CakeML

We continue to contribute to the CakeML development of a verified ML implementation, especially w.r.t. the backend ISA models and proof, as discussed in Task 2.
3.7 JavaScript

We have developed JSExplain [60], a reference interpreter for JavaScript that closely follows the specification and that produces execution traces. These traces may be interactively investigated in a browser, with an interface that displays not only the code and the state of the interpreter, but also the code and the state of the interpreted program. In that respect, JSExplain is a double-debugger for the specification of JavaScript, useful both for standard developers and JavaScript programmers.

We have developed a system for logic-based verification of JavaScript programs [61] and a symbolic evaluation system for JavaScript [58].

We have developed JaVerT [59, 63], a semi-automatic JavaScript Verification Toolchain, based on separation logic and aimed at the specialist developer wanting rich, mechanically verified specifications of critical JavaScript code. JaVerT is the first verification tool based on separation logic that targets dynamic languages. To specify JavaScript code, we provide abstractions that capture its key heap structures (for example, prototype chains and function closures), allowing the developer to write clear and succinct specifications with minimal knowledge of the JavaScript internals. To verify JavaScript programs, we develop a trusted verification pipeline consisting of: JS-2-JSIL, a well-tested compiler from JavaScript to JSIL, an intermediate goto language capturing the fundamental dynamic features of JavaScript; JSIL Verify, a semi-automatic verification tool based on a sound JSIL separation logic; and verified axiomatic specifications of the JavaScript internal functions. JaVerT has so far been used to verify functional correctness properties of: data-structure libraries (key-value map, priority queue) written in an object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible.

This work has been presented in keynotes at CADE 2017 and FSEN 2017, and in an invited talk at Emerging Technologies, the 2017 New York ECMAScript meeting.


JavaScript semantics web page: http://psvg.doc.ic.ac.uk/research/javascript.html
JavaScript semantics (jscert) repo: https://github.com/jscert/jscert
3.8 WebAssembly

WebAssembly is a new low-level language currently being implemented in all major web browsers. It is designed to become the universal compilation target for the web, obsoleting previous solutions such as asm.js and Native Client. We have developed a mechanised Isabelle specification for WebAssembly, including a verified executable interpreter and type-checker, and a mechanised proof of type soundness [68]. This exposed several issues with the official WebAssembly specification and influenced its development. We have also worked on constant-time computation in WebAssembly [69].


WebAssembly AFP Isabelle development: https://www.isa-afp.org/entries/WebAssembly.html
3.9 OCaml concurrency and infrastructure

Working with our OCaml Labs colleagues, we have proposed a concurrency semantics for OCaml [70] and contributed to various pieces of OCaml infrastructure: a generic approach to defining partially-static data and corresponding operations [72]; conex [73], a tool that allows developers to cryptographically sign their released software, and users to verify it, ensuring integrity and authenticity without a centralised trust authority; and a modular foreign function interface for OCaml [71].


4 Task 4a: System APIs and Protocols
4.1 POSIX Filesystem Specification and Reasoning

We have pursued two directions here: operational modelling, validated by extensive testing against production filesystem implementations, and program-logic work oriented towards reasoning about filesystems.
Operational Modelling

SibylFS is a semantics and filesystem test oracle for POSIX filesystems, developed in a collaboration between semantics and systems researchers led by Ridge at Leicester [81]. SibylFS has been used to validate and identify bugs in existing production filesystems and in two verified filesystems (FSCQ and Flashix) from academia. The SibylFS infrastructure was used by Docker, following its acquisition of Unikernel Systems (a spin-out from Cambridge). The SibylFS model has been integrated into our Task 3 Cerberus C semantics, providing an integrated semantic model for C programs that use a filesystem. SibylFS is available under a permissive open-source licence.


SibylFS web page: http://sibylfs.github.io/
SibylFS github repo: https://github.com/sibylfs/
Program-logic reasoning

In parallel, we have developed a separation-logic style program logic for a sequential specification of a core subset of POSIX file systems and for modular reasoning about client programs using file systems.

We developed a separation logic style program logic for path-addressable structured data, and applied this logic to develop a sequential specification of a core subset of POSIX file systems. This work demonstrated the modularity and scalability of reasoning about file-system client programs using simplified paths that allow updates to appear local [86]. We then developed fusion logic, a file-system specific program logic based on separation logic, for reasoning about non-local properties of arbitrary paths using symbolic links or dot-dots. We extended the previous sequential specification of core POSIX file systems to arbitrary paths. The extended specification and fusion logic were then used to reason about implementations of the rm utility, discovering bugs in popular implementations due to mishandling of symbolic links and dot-dots in paths [84].
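A concrete instance of the non-local path behaviour involved (a sketch using standard POSIX calls; the paths are illustrative): POSIX resolves ".." against the symlink's target, not against the link's own parent directory, which is precisely the behaviour that tripped up the rm implementations.

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        mkdir("/tmp/d", 0755);
        symlink("/tmp/d", "link");

        /* "link/../f" resolves link -> /tmp/d, then ".." -> /tmp, so this
           creates /tmp/f, not ./f next to the symlink. */
        int fd = open("link/../f", O_CREAT | O_WRONLY, 0644);
        if (fd >= 0) close(fd);
        return 0;
    }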

We have extended concurrent program logics with reasoning about how programs are affected by, and recover from, host failures (crashes), as in file systems and databases [85]. This extends the Views framework, which acts as a basis for many concurrent program logics, with host failure and recovery semantics; we instantiated this with a Fault-tolerant Concurrent Separation Logic and used it to verify properties of an ARIES recovery algorithm used in databases.
4.2 TLS Security

In another collaboration between our systems and semantics groups, we have developed a clean-slate specification and implementation, nqsb-TLS, of the Transport Layer Security (TLS) protocol. TLS is fundamental to internet use, underlying https secure web access among much else, but its implementations have a history of security flaws [77, 76]. nqsb-TLS is available under a permissive open-source licence. We engaged in the TLS 1.3 design process [75], providing extensive testing facilities for other implementations; and we presented an outreach talk at 31C3 [78].

We developed libnqsb-tls [74], a drop-in replacement for the C libtls library that wraps our pure OCaml TLS implementation, ocaml-tls. We have also demonstrated unmodified OpenBSD system software running against our replacement library.


nqsbTLS web page: https://nqsb.io/
nqsbTLS github repo: https://github.com/mirleft/ocaml-tls
libnqsb-tls github repo: https://github.com/mirleft/libnqsb-tls/.
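Since libnqsb-tls mirrors the libtls API, client code is unchanged; a minimal sketch of that API (error handling elided; the host name is illustrative):

    #include <tls.h>

    int main(void) {
        tls_init();
        struct tls_config *cfg = tls_config_new();
        struct tls *ctx = tls_client();
        tls_configure(ctx, cfg);

        /* The same calls work whether the implementation behind them is
           LibreSSL's libtls or the OCaml-backed libnqsb-tls. */
        tls_connect(ctx, "example.org", "443");
        tls_write(ctx, "ping", 4);
        tls_close(ctx);
        tls_free(ctx);
        tls_config_free(cfg);
        return 0;
    }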
4.3 TCP/IP

We have resurrected the NetSem network semantics from 2000-2009, which defined an executable-as-test-oracle specification for the TCP and UDP network protocols and their Sockets API, using HOL4. In collaboration with the FreeBSD project, and with the DARPA CADETS project at BAE/Cambridge/Memorial, we now collect traces from a current FreeBSD system using DTrace; CADETS is extending DTrace to support this work. Exploiting DTrace lets us substantially simplify the instrumentation and instrument in more places, allowing nondeterminism to be resolved early; together with Moore's-law advances in checking speed, this has let us produce a more usable test oracle. We have published a J.ACM journal paper on the entire NetSem project, including this work [80].

The model and infrastructure are available under a permissive open-source licence.


NetSem web page: https://www.cl.cam.ac.uk/users/pes20/Netsem/
NetSem github repo: https://github.com/PeterSewell/netsem.


5 Task 4b: Concurrency Reasoning

Here we focus on challenging problems of concurrent systems, including correctness properties such as atomicity and termination, abstract specifications for concurrent modules, modular reasoning, and memory models. We have approached these problems from varying yet complementary directions, developing new program logics, models, and type systems, algebraic reasoning, linearisability correctness conditions and proof techniques, reduction theories, and automated verification tools.
5.1 Program logics, models, type systems

We have proposed a logic, iCAP-TSO, for reasoning about programs running on a TSO memory model. The logic supports direct reasoning about the buffering observable on TSO machines. It also supports a fiction of sequential consistency that hides this buffering behaviour from clients for libraries that contain sufficient synchronisation. This allows us to recover standard separation logic reasoning for well-synchronised clients and still support linking with code that requires explicit reasoning about buffering [105].

We have proposed a refinement of the mathematical technique of step-indexing [102]. Step-indexing is used to break circularities that arise when constructing models of advanced program logics and type systems. Our refinement provides a more flexible way of breaking these circularities that increases compositionality when reasoning about program equivalence. We are currently investigating whether the same is true for program logics.

We have defined a relational semantics for an expressive type-and-effect system to validate effect-based program transformations. The type-and-effect system combines effect tracking with a lightweight mechanism for tracking state ownership. This combination is intended to simplify concurrent programming by allowing a compiler to automatically parallelise sub-computations based on inferred effect types. We use the semantics to prove that such automatic parallelisation is sound [93].

We have studied abstract local reasoning for concurrent libraries [108]. There are two main approaches: provide a specification of a library by abstracting from concrete reasoning about an implementation; or provide a direct abstract library specification, justified by refining to an implementation. Both approaches have a significant gap in their reasoning, due to a mismatch between the abstract connectivity of the abstract data structures and the concrete connectivity of the concrete heap representations. We demonstrate this gap using structural separation logic (SSL) for specifying a concurrent tree library and concurrent abstract predicates (CAP) for reasoning about a concrete tree implementation. The gap between the abstract and concrete connectivity emerges as a mismatch between the SSL tree predicates and CAP heap predicates. This gap is closed by an interface function I which links the abstract and concrete connectivity. We generalise our SSL reasoning and results to arbitrary concurrent data libraries.

We show how to verify four challenging concurrent fine-grained graph-manipulating algorithms, including graph copy, a speculatively parallel Dijkstra, graph marking and spanning tree. We develop a reasoning method for such algorithms that dynamically tracks the contributions and responsibilities of each thread operating on a graph, even in cases of arbitrary recursive thread creation [99]. We demonstrate how to use a logic without abstraction (CoLoSL) to carry out abstract reasoning in the style of iCAP, by building the abstraction into the proof structure rather than incorporating it into the semantic model of the logic.

We have developed TaDA, a separation logic for fine-grained concurrency which introduces atomic triples for specifying the observable atomicity exhibited by operations of concurrent modules. The atomic triples can be used to specify the atomicity of operations at the granularity of a particular data abstraction, such as an abstract queue, and under restricted interference from the concurrent environment [109]. This approach leads to strong specifications of concurrent modules, such as locks, double-compare-and-swap, and concurrent queues, that do not depend on particular usage patterns.
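Schematically, and as a simplified rendering of TaDA's notation (following the spirit of its lock example, not its exact syntax), an atomic triple for acquiring a lock quantifies over the abstract lock state l, which the environment may keep changing until the operation's linearisation point:

    \forall l.\ \langle\, L(x, l) \,\rangle\ \mathit{lock}(x)\ \langle\, L(x, 1) \wedge l = 0 \,\rangle

Read: whatever the lock state l is at the linearisation point, lock(x) atomically moves it to locked, and the postcondition records that the state it took effect from was unlocked (l = 0).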

We have demonstrated the scalability of TaDA reasoning by studying concurrent maps. In particular, we defined two abstract specifications for a concurrent map module, one focusing on the entire abstract map and the other on individual key-value pairs, which we proved to be equivalent by applying TaDA. We have applied the TaDA program logic to produce the first partial correctness proof of core operations of ConcurrentSkipListMap in java.util.concurrent [96].

We have extended TaDA with well-founded termination reasoning. This led to the development of Total-TaDA, the first program logic for reasoning about total correctness of fine-grained concurrent programs [101].

We have surveyed a range of verification techniques to specify concurrent programs, from Owicki-Gries to modern logics. We show how auxiliary state, interference abstraction, resource ownership, and atomicity are combined to provide powerful approaches to specify concurrent modules in CAP and TaDA [103].

A key difficulty in verifying shared-memory concurrent programs is reasoning compositionally about each thread in isolation. Existing verification techniques for fine-grained concurrency typically require reasoning about either the entire shared state or disjoint parts of the shared state, impeding compositionality. We have introduced the program logic CoLoSL, where each thread is verified with respect to its subjective view of the global shared state [106]. This subjective view describes only that part of the state accessed by the thread. Subjective views may arbitrarily overlap with each other, and expand and contract depending on the resource required by the thread. This flexibility gives rise to small specifications and, hence, more compositional reasoning for concurrent programs. We demonstrate our reasoning on a range of examples, including a concurrent computation of a spanning tree of a graph.
5.2 Algebraic reasoning

We formalise a modular hierarchy of algebras with domain and antidomain (domain complement) operations in Isabelle/HOL that ranges from modal semigroups to modal Kleene algebras and divergence Kleene algebras. We link these algebras with models of binary relations and program traces. We include some examples from modal logics, termination and program analysis [111].

We describe the application of Modal Kleene algebra in program correctness. Starting from a weakest precondition based component, we show how variants for Hoare logic, strongest postconditions and program refinement can be built in a principled way. Modularity of the approach is demonstrated by variants that capture program termination and recursion, memory models for programs with pointers, and trace semantics relevant to concurrency verification [112, 113].

We have summarised the progress in the research towards the construction of links between algebraic presentations of the principles of programming and the exploitation of concurrency in modern programming practice [110]. The research concentrates on the construction of a realistic family of partial order models for Concurrent Kleene Algebra (aka, the Laws of Programming). The main elements of the model are objects and the events in which they engage. Further primitive concepts are traces, errors and failures, and transferrable ownership. In terms of these we can define other concepts which have proved useful in reasoning about concurrent programs, for example causal dependency and independence, sequentiality and concurrency, allocation and disposal, synchrony and asynchrony, sharing and locality, input and output.
5.3 Linearizability and reduction theory

We generalized Lipton's reduction theory for TSO programs. We developed a methodology with which one can check whether a given program allows non-SC behaviours and transform any given program P to another program P' such that P under TSO is observationally equivalent to P' under SC [104].

Linearisability of concurrent data structures is usually proved by monolithic simulation arguments relying on the identification of the so-called linearization points. Regrettably, such proofs, whether manual or automatic, are often complicated and scale poorly to advanced non-blocking concurrency patterns, such as helping and optimistic updates. In response, we propose a more modular way of checking linearisability of concurrent queue algorithms that does not involve identifying linearization points [107]. We reduce the task of proving linearisability with respect to the queue specification to establishing four basic properties, each of which can be proved independently by simpler arguments. As a demonstration of our approach, we verify the Herlihy and Wing queue, an algorithm that is challenging to verify by a simulation proof.

We developed local linearizability, a relaxed consistency condition that is applicable to container-type concurrent data structures like pools, queues, and stacks. While linearizability requires that the effect of each operation is observed by all threads at the same time, local linearizability only requires that for each thread T, the effects of its local insertion operations and the effects of those removal operations that remove values inserted by T are observed by all threads at the same time. We investigate theoretical and practical properties of local linearizability and its relationship to many existing consistency conditions. We present a generic implementation method for locally linearizable data structures that uses existing linearizable data structures as building blocks. Our implementations show performance and scalability improvements over the original building blocks and outperform the fastest existing container-type implementations [100].
5.4 Automated verification

We have developed Caper, a tool for automating proofs in concurrent separation logic with shared regions, aimed at proving functional correctness for fine-grained concurrent algorithms. The tool supports a logic in the spirit of concurrent abstract predicates, taking inspiration from recent developments such as TaDA [109]. Caper provides a foundation for exploring the possibilities for automation with such a logic [95].

We have created an axiomatic specification of a key fragment of the Document Object Model (DOM) API using structural separation logic. This specification allows us to develop modular reasoning about client programs that call the DOM [62].
5.5 Recent concurrency reasoning

More recently, we have published several papers on reasoning about different aspects of concurrency:

Separation logic for Promising Semantics [90] [ESOP 2018]
Algebraic laws for weak consistency [91] [CONCUR 2017]
Verifying strong eventual consistency in distributed systems [92] [OOPSLA 2017, Distinguished Paper Award]
Transactions with relaxed concurrency (largely non-REMS work) [89] [VMCAI 2018]
Modular concurrent verification [88] [J.LAMP 2018]
Concurrent specification of POSIX file systems [82] [ECOOP 2018]

Other

We have also published a few papers that do not fit exactly into the above task structure:
Verified trustworthy systems

We have published an opinion piece [87] with the goal of raising awareness of the pressing issues surrounding the reliability of complex modern software systems and the importance of specification and verification in the industrial design process. It forms part of a broader journal article on Verified Trustworthy Software Systems, aimed at the general public.
Confidentiality verification

We have published a journal paper on earlier (largely non-REMS) work on formal verification of confidentiality in a social media system [114].
Programming and proving with classical types

We have published a paper on programming and proving with classical types, building on the first author's MPhil dissertation, supervised by REMS staff [115].
Exploitation Route Please see URL above
Sectors Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Electronics,Security and Diplomacy

URL http://www.cl.cam.ac.uk/users/pes20/rems
 
Description Clarification of the behaviour of multiprocessors, programming languages, and security protocols Please see the project web page https://www.cl.cam.ac.uk/~pes20/rems/index.html and summary http://www.cl.cam.ac.uk/users/pes20/rems/report-2019-web.pdf
Sector Digital/Communication/Information Technologies (including Software),Electronics
Impact Types Economic

 
Title Research data supporting "Dynamically Diagnosing Type Errors in Unsafe Code" 
Description Buildable source code of the libcrunch system and related codebases, within a Debian 8.5 (Jessie) distribution, packaged as a VirtualBox virtual machine image. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes