REMS: Rigorous Engineering for Mainstream Systems

Lead Research Organisation: University of Cambridge
Department Name: Computer Laboratory

Abstract

Computer systems are critical to modern society, from embedded devices and mobile phones through to internet services, but traditional computer engineering cannot produce reliable and secure systems, as we see in the continual stream of errors and security flaws impacting industry and consumers everywhere. Industry relies on informal prose specifications and ad hoc testing, but the complexity and scale of today's systems make it effectively impossible to exclude errors using only these methods (even though perhaps 70% of development staff in major vendors are devoted to testing), and adversarial exploits all too often turn errors into major security concerns, with huge financial and social costs. Compounding the problem, the industry-wide abstractions around which real systems are built, including the architectural interfaces between multiprocessor hardware and software, and programming languages such as C and C++, are ill-defined, giving rise to ambiguity and error.

This is a long-standing problem, motivating extensive research on mathematically rigorous alternatives. Until now, it has been impractical to address mainstream widely-deployed systems with rigorous techniques; academic research has focussed instead on modest-scale safety-critical systems and academic verifications of idealised systems. There has been a deep disconnect between practical industrial engineering and systems research on the one hand, and theoretical research in semantics, verification, and foundations on the other. Now, however, new approaches, with a pragmatic emphasis on using mathematically rigorous models where they can be most effective, finally make it feasible to address industrial-scale systems, and the commercial and social cost of not doing so is becoming ever greater. It is time for rigorous engineering of mainstream systems, unifying practice and theory.

REMS brings together a team of leading theory and systems researchers, with key industrial partners, to establish a new synthesis. We will establish a true engineering mathematics for the construction of more robust and secure computer systems: mathematically rigorous models, verification techniques, and engineering tools applied to the construction of full-scale mainstream computer systems, including key infrastructure in widespread use today (multiprocessors, programming languages, and operating systems). We will couple this with novel programming-language and verification support for future system-building. Developing our recent work with processor vendors and language standards bodies, we will address the technical, engineering and social challenges necessary to bring these techniques into widespread use, enabling a radical improvement in computer engineering as a whole.

Planned Impact

Computer systems are pervasive, from a host of embedded devices (in everything from heating controllers and power distribution networks to cars and mobile phones), through the Internet giving us email and the web, up to the supercomputers used for weather forecasting and scientific research. We rely on them in almost everything we do, but they are fundamentally unreliable: they are developed by testing-based methods that cannot handle the complexity of modern systems. The result is a high direct cost, to build adequate systems by these inadequate methods, and a much higher indirect cost, on all of society, to handle the inevitable bugs and inexplicable behaviours that occur and to deal with the security problems that arise when the remaining errors are exploited.


The ambitious aim of the REMS project is to enable and provoke a profound change in how computer systems are engineered, placing *mathematically rigorous* engineering methods at the heart of the development of mainstream systems. This will be achieved by building on precise mathematical models of some of the key abstractions around which all computer systems are constructed: multiprocessors and the C programming language. It will not remove all problems, but it will provide a significant step towards making the products of computer engineering as reliable as the other engineering infrastructure on which society depends.

Alongside this broad societal impact, there are many specific opportunities for economic impact, including:

- On the architectural interface between hardware and software, our processor-vendor partners ARM and IBM will benefit directly from improved specifications, the testing tools that they enable, and discussions of future architectural choices; all users of this interface (compiler and operating system vendors) will benefit from the increased clarity.

- On the definition of the C language, system developers will benefit directly from more accurate ISO standards and improved core infrastructure and, more concretely, from our static and dynamic analysis tools.

- Several start-up possibilities may arise from particular subtasks, e.g. from some of our tool developments; we will pursue these as appropriate, building on our XenSource experience.



The most important economic impact will be via the eventual effects of deploying rigorous methods across the computing industry, improving quality and reducing time- and cost-to-market for its products. By improving the quality of these products, we will reduce the hidden cost of unreliability and insecurity in computer systems, which affect us all.
 
Description Please see the project web pages http://www.cl.cam.ac.uk/users/pes20/rems
and the summary document: http://www.cl.cam.ac.uk/users/pes20/rems/report-2019-web.pdf


There is a long-standing disconnect between the practical industrial engineering of computer systems, reliant on test-and-debug development and (at best) prose specifications, and theoretical research on more rigorous methods, which has rarely been applicable to mainstream engineering. Rigorous Engineering for Mainstream Systems (REMS) has shown how important aspects of two key industrial interfaces - architectures and programming languages - can be clarified (especially their highly subtle concurrency models, and also instruction-set architecture and language specifications), bringing mathematical models into the industrial specifications of ARM and other processor architectures, and of the C, C++, JavaScript, and WebAssembly programming languages. The entire industry depends on these specifications, yet historically they have proved remarkably intractable. REMS has developed:

1) rigorous and usable engineering mathematics for the construction of robust and secure many-core computer systems, and

2) the intellectual and software tools to apply semantics and verification at the heart of mainstream system engineering, for these central specifications.

These are radical goals, which have to be approached via a great deal of detailed work in specific contexts - we cannot revolutionise the entire industry overnight - but, in multiple specific and important areas, they have been achieved.

The main industrial impact of the Cambridge REMS work to date has been on the understanding and specification of concurrency, in mainstream architectures and programming languages. These are two key interfaces: architecture specifications (definitions of the allowed behaviour of processors) let the industry decouple hardware and software development, and programming language specifications decouple high-level software development from machine-implementation concerns. However, both have historically been non-rigorous prose documents. This is problematic in many ways: the ambiguity of prose means they fail to serve as clear documentation; their lack of executability means one cannot test hardware or software against them, disconnecting them from mainstream test-and-debug development; and their informality makes formal verification impossible.

These problems have been especially acute for the concurrent multicore systems that have become ubiquitous in the last decade. The hardware and compiler optimisations that have made computers fast also introduce very subtle behaviour that programmers have to deal with, but exactly what the behaviour might be, and hence what programmers can rely on, has been very unclear. In 2008, major architectures, including Intel/AMD x86, ARMv7, and IBM POWER, all lacked clear definitions -- even the architects of these regarded their own prose specifications as inscrutable -- and languages such as C and C++ did not define concurrent behaviour at all, despite the fact that it was common in practice.

In a series of high-profile papers, working closely with ARM, IBM, and the RISC-V community, and combining mathematical and experimental techniques, we have established rigorous models for the concurrency architecture of x86, ARM, IBM POWER, and RISC-V. For ARM and RISC-V, these are now incorporated into the official standards; for x86 and IBM POWER, they are widely used de facto standards.

We have moreover shown how complete sequential instruction-set specifications of the ARMv8-A and RISC-V architectures, sufficient to boot Linux, can be made mathematically rigorous to support mechanised proof; our RISC-V instruction-set semantics has been adopted by RISC-V International as their official formal model.

At the programming-language level, working closely with the ISO C and C++, JavaScript, and WebAssembly standards bodies, in another series of papers and ISO working notes, we established rigorous mathematical models for the concurrency behaviour of each of these, all now incorporated into the corresponding standards. We have moreover made the sequential specification of WebAssembly mathematically precise, and are working with the ISO C and C++ committees to make the C and C++ memory object models precise; there is work towards a Technical Specification embodying this in the ISO WG14 C committee.



Details of impact
-----------------

Our research has directly impacted the mainstream computing industry by establishing rigorous semantics, in place of the previous unusable prose specifications, for architecture and programming-language concurrency and for their sequential behaviour.

The ARM architecture specifies the behaviour of the processors, designed by multiple vendors including Arm themselves, Apple, Qualcomm, Samsung, and others, used in mobile phones, tablets, and many other devices. The x86 architecture specifies the behaviour of the processors, designed by Intel, AMD, and Centaur, used in the majority of servers, desktops, and laptops. The IBM POWER architecture specifies the behaviour of IBM POWER servers. The RISC-V architecture is an emerging open-source architecture with considerable momentum across the industry.



0: Digital Security by Design
-----------------------------

REMS has played a substantial role in enabling the ISCF Wave 3 Digital Security by Design (DSbD) programme, a £70m Government + £114m Industry programme centred around a prototype Arm processor with radically improved security mechanisms, based on the Cambridge/SRI CHERI research project. In collaboration with the rest of the CHERI group, REMS has enabled both lightweight rigorous engineering of CHERI (including formal specification, testing, and test generation of CHERI instruction-set architectures) and formal statements and machine-checked proofs of key security properties. This has led to ongoing work in DSbD to scale such results up to the prototype Arm CHERI Morello architecture.

1: Rigorous (user-mode) Concurrency Architecture
------------------------------------------------

Our research on concurrency architecture, working closely with Arm, IBM, the RISC-V Foundation, and others, has led to (in REMS and previously):

1A: A rigorous model for ARMv8-A concurrency, incorporated into Arm's architecture reference manual. Our work also contributed to a substantial simplification of the architecture in 2017/18 (the shift to a 'multicopy-atomic' model).

1B: A rigorous model for RISC-V concurrency, incorporated into the RISC-V specification and ratified by the RISC-V Foundation.

1C: A rigorous model for IBM POWER concurrency, widely used as the de facto standard.

1D: A rigorous model for x86 concurrency, widely used as the de facto standard.

These models are supported by extensive validation: discussion with the respective architects, experimental testing against the observable behaviour of hardware implementations (which has also found errata in multiple processor implementations from multiple vendors), and proofs of theorems about the models.

They provide an essential clear reference for hardware designers, compiler writers, tool builders, and programmers.



2: Rigorous instruction-set architecture
----------------------------------------

An architecture specification comprises three main parts: the "user-mode" concurrency model, as above, which is typically subtle but not large; the instruction-set architecture (ISA), describing the sequential semantics of instructions, which is conceptually simpler but typically involves a mass of detail; and the "system-mode" concurrency architecture, which adds even more subtlety. For instruction-set architecture:

2A: Working closely with Arm, we have established a publicly available ISA specification for the entire ARMv8.5-A architecture, of sufficient scope and detail to boot the Linux operating system, and accurate enough to pass the vast majority of the Arm-internal architecture validation suite. Our Sail infrastructure lets us make this available as definitions for multiple theorem provers (Coq, HOL4, Isabelle) to support formal verification.

2B: Working closely with the RISC-V community, we have likewise established an ISA specification for much of RISC-V. This is the only RISC-V ISA specification integrated with the RISC-V concurrency model. It has been chosen as the canonical reference by the RISC-V Foundation.


3: C/C++ Concurrency
--------------------

At the programming-language level, the behaviour of concurrent code is affected not just by the hardware optimisations that make architectural concurrency so involved, but also by compiler optimisations. Before 2011, the International Organization for Standardization (ISO) language standards for C and C++ did not cover concurrent behaviour. Our (pre-REMS) research contributed substantially to a major exercise to address this in the 2011 revisions of those standards, and they and the later C++14, C++17, and C18 versions all contain prose renderings of a precise mathematical model that we developed, working closely with other members of the C++ concurrency group. By making these precise, we have also enabled others to engage with the problem and make further improvements (e.g. P0668R2: Revising the C++ memory model, Hans-J. Boehm, Olivier Giroux, Viktor Vafeiadis, 2018-01-12, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r2.html).


4: C/C++ Memory Object Model
----------------------------

Another subtlety that C and C++ have to deal with is the *memory object model*: the definition of what operations on pointers are allowed, and how that relates to compiler analysis and optimisation. Here too we are engaging with the ISO C and C++ committees to clarify the situation, and the C committee has set up a formal Study Group (chaired by Sewell) to do so.


5: JavaScript and WebAssembly Concurrency Semantics
---------------------------------------------------

JavaScript and WebAssembly are ubiquitous languages (the former long-established, the latter new) implemented principally by web browsers to support local computation. Here we are working with ECMA TC39 (the JavaScript standards body) and with the WebAssembly working group to design and improve both specifications.

6: REMS concurrency reasoning
-----------------------------

REMS has developed a separation-logic-style program logic for a sequential specification of a core subset of POSIX file systems, and for modular reasoning about client programs that use file systems.

SibylFS is a semantics and filesystem test oracle for POSIX filesystems, developed in a collaboration between semantics and system researchers led by Ridge at Leicester. SibylFS has been used to validate and identify bugs in existing production filesystems and in two verified filesystems (FSCQ and Flashix) from academia. The SibylFS infrastructure was used by Docker, following its acquisition of Unikernel Systems (a spin-out from Cambridge).


7 Program logics, models, type systems
---------------------------------------

Here we focus on challenging problems of concurrent systems, including correctness properties such as atomicity and termination, abstract specifications for concurrent modules, modular reasoning, and memory models. We have approached these problems from varying yet complementary directions, developing new program logics, models, and type systems, algebraic reasoning, linearisability correctness conditions and proof techniques, reduction theories, and automated verification tools.

We have proposed a logic, iCAP-TSO, for reasoning about programs running on a TSO memory model.

We have studied abstract local reasoning for concurrent libraries.

We have proposed a refinement of the mathematical technique of step-indexing.


We have shown how to verify four challenging fine-grained concurrent graph-manipulating algorithms: graph copy, a speculatively parallel Dijkstra's algorithm, graph marking, and spanning-tree construction.

We have developed TaDA, a separation logic for fine-grained concurrency, which introduces atomic triples for specifying the observable atomicity exhibited by operations of concurrent modules. We have demonstrated the scalability of TaDA reasoning by studying concurrent maps. We have extended TaDA with well-founded termination reasoning.
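Schematically (this is our simplified rendering, not verbatim from the TaDA papers), an atomic triple says that a command appears to take effect in a single atomic step. For a spin lock whose abstract state l is 0 (unlocked) or 1 (locked), the lock operation can be specified roughly as:

```latex
\forall l \in \{0,1\}.\quad
\big\langle\, L(x, l) \,\big\rangle
\;\; \mathtt{lock}(x) \;\;
\big\langle\, L(x, 1) \ast l = 0 \,\big\rangle
```

Read: starting from any abstract lock state l, lock(x) atomically moves the lock to the locked state, and at the instant it takes effect the lock was unlocked. Clients can reason against this abstract specification without knowing the lock's implementation.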

A key difficulty in verifying shared-memory concurrent programs is reasoning compositionally about each thread in isolation. Existing verification techniques for fine-grained concurrency typically require reasoning about either the entire shared state or disjoint parts of the shared state, impeding compositionality. We have introduced the program logic CoLoSL, where each thread is verified with respect to its subjective view of the global shared state.
Exploitation Route Please see text above.
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Electronics; Security and Diplomacy

URL http://www.cl.cam.ac.uk/users/pes20/rems
 
Description (ELVER) - Engineering with Logic and Verification: Mathematically Rigorous Engineering for Safe and Secure Computer Systems
Amount € 2,473,844 (EUR)
Funding ID 789108 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 10/2018 
End 03/2024
 
Description Digital Security by Design (DSbD) Technology Platform Prototype
Amount £36,000,000 (GBP)
Funding ID 105694 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 11/2019 
End 06/2024
 
Title Research data supporting "Dynamically Diagnosing Type Errors in Unsafe Code" 
Description Buildable source code of the libcrunch system and related codebases, within a Debian 8.5 (Jessie) distribution, packaged as a VirtualBox virtual machine image. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
 
Title Gillian 
Description Gillian is a language-independent platform for the development of compositional symbolic analysis tools. It supports three flavours of analysis: whole-program symbolic testing, full verification, and bi-abduction. It comes with fully parametric meta-theoretical results and a modular implementation, designed to minimise the instantiation effort required of the user. Gillian has been instantiated to JavaScript and C and evaluated on both real-world code and custom-made data-structure libraries, obtaining results that indicate that Gillian is robust enough to reason about real-world programming languages and programs. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Gillian has garnered interest from academia (e.g., Cambridge), industry (e.g., Facebook, Amazon, Arm), and government (the National Cyber Security Centre). It was used to verify JavaScript and C implementations of the AWS Message Header deserialisation procedure; this is the first full verification of industry-grade JavaScript code. The initial success of Gillian led to a Facebook gift of £500K to Prof. Gardner's group, for the purpose of further development. 
URL https://gillianplatform.github.io/
 
Title JaVerT 
Description JaVerT is a cutting-edge verification tool chain for JavaScript based on separation logic. JaVerT works by first translating JavaScript programs and specifications to a small intermediate goto language, called JSIL, and then performing verification on JSIL programs. JaVerT comprises: a compiler from JavaScript to JSIL; a semi-automatic verification tool, called JSILVerify, based on a JSIL program logic; and a logic translator from JavaScript assertions to JSIL assertions. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The JaVerT team is actively collaborating with Amazon and IBM. We are working with Julian Dolby, at IBM New York, on using the JaVerT infrastructure to build an automatic bug-finding tool for JavaScript. The PhD student who started the project did an internship at Amazon, designing a taint analysis for JavaScript using JSIL as a front-end for CBMC. We supervised four Master's students (two MEng, one MSc, and one conversion MSc) in 2016. 
URL https://dl.acm.org/doi/abs/10.1145/3158138
 
Title JaVerT 2.0: Compositional Symbolic Execution for JavaScript 
Description JaVerT 2.0 introduces a novel, unified approach to the development of compositional symbolic execution tools, bridging the gap between classical symbolic execution and compositional program reasoning based on separation logic. It supports whole-program symbolic testing, verification, and, for the first time, automatic compositional testing based on bi-abduction. The meta-theory underpinning JaVerT 2.0 is developed modularly, streamlining the proofs and informing the implementation. Our explicit treatment of symbolic execution errors allows us to give meaningful feedback to the developer during whole-program symbolic testing and guides the inference of missing resource during bi-abductive execution. JaVerT 2.0 has been evaluated on a number of JavaScript data-structure libraries, demonstrating: the scalability of our whole-program symbolic testing; an improvement over the state of the art in JavaScript verification; and the feasibility of automatic compositional testing for JavaScript. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact JaVerT 2.0 has won a Facebook Research award ($50K) and was part of a project to verify parts of the Amazon AWS Message Format management system, which then migrated to Gillian. JaVerT 2.0 was the starting point for JaVerT.click, an extension which has the ability to reason about event-driven programs that use the DOM and JavaScript constructs for asynchronous programming. 
URL https://dl.acm.org/doi/abs/10.1145/3290379
 
Title JaVerT.click 
Description We introduce a trusted infrastructure for symbolic analysis of modern event-driven Web applications. This infrastructure consists of reference implementations of the DOM Core Level 1 and UI Events, JavaScript Promises, and the JavaScript async/await APIs, all underpinned by a simple Core Event Semantics that is sufficiently expressive to describe the event models underlying all these APIs. Our reference implementations are trustworthy in that they follow the respective API standards line-by-line, and they are thoroughly tested against the appropriate official test suites, passing all applicable tests. Using the Core Event Semantics and the reference implementations, we develop JaVerT.Click, a symbolic execution tool for JavaScript that, for the first time, supports reasoning about JavaScript programs that use some (possibly all) of these APIs. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact JaVerT.Click was used to perform comprehensive symbolic testing of the events module of Cash, a widely-used jQuery alternative, creating a symbolic test suite with 100% line coverage, establishing bounded correctness of several essential properties of the module, and discovering two subtle, previously unknown bugs. As part of a project protected under an NDA, it was also used to analyse parts of the Amazon Prime Video codebase. 
URL https://drops.dagstuhl.de/opus/volltexte/2020/13202/