Verifiably correct high-performance concurrency libraries for multi-core computing systems

Lead Research Organisation: Brunel University
Department Name: Computer Science

Abstract

The benefits of fast computer systems are clear - improving performance allows more sophisticated applications to be developed. Speed-up via higher processor clock speeds, however, reached a physical upper limit in the early 2000s, with power dissipation becoming a major issue. Hardware designers have therefore switched to multicore processors with fixed clock speeds, where speed-up is achieved by including several cores in a single processor. Multicore processors are now pervasive in all aspects of computing, from cluster servers and desktops to low-power mobile devices.

A multicore processor efficiently executes concurrent programs comprising multiple threads by allowing several computations to take place simultaneously in the different cores. These threads communicate via synchronised access and modification of shared resources stored in shared memory. While some problems (termed embarrassingly parallel problems) are naturally able to take advantage of multicore computing, the majority require careful management of thread synchronisation, which is difficult to achieve. Moreover, modern architectures have forced programmers to understand intricate details of the hardware/software interface in addition to concurrency issues within the programs they develop.

Within a multicore processor, each processing core has access to a local store buffer that acts as a temporary cache - any writes held in the buffer are not visible to other cores until the buffer is flushed to shared memory. For efficiency reasons, multicore architectures implement so-called relaxed-memory models, where read and write events within a single thread may take effect in shared memory out of order, leading to behaviours that a programmer may not expect. Restoring correctness requires the manual introduction of "fence" instructions into a program's code, which is a non-trivial task - under-fencing leads to incorrect behaviours, while over-fencing leads to inefficient programs. Therefore, as Shavit states, "The advent of multicore processors as the standard computing platform will force major changes in software design." There is a large international effort in both academia and industry to make better use of multicore processing, ranging from new foundational and theoretical underpinnings to the development of novel abstractions, tool support, engineering paradigms, etc.

In this project, we aim to simplify system development via practical solutions that are highly performant (to increase efficiency), verifiably correct (to ensure dependability) and able to cope with relaxed-memory models (as used by modern hardware). In particular, we develop efficient "concurrent objects" that abstract from the low-level hardware interface and manage thread synchronisation on behalf of a programmer. Concurrent objects are ultimately intended to become part of a programming language's standard library - some languages, e.g., Java, already offer basic concurrent objects in this way - and are hence highly applicable.

The objects we deliver will take advantage of relaxed memory models. Here, it is well known that correctness conditions that are too strict (e.g., linearizability) are themselves becoming a barrier to efficiency.
We therefore begin by developing new theoretical foundations in the form of relaxed correctness conditions that match relaxed memory models. A hierarchy of different conditions will be developed, enabling one to easily determine the relative strengths of each condition. These conditions will be linked to a contextual notion of refinement to provide a basis for substitutability in client programs. This provides a basis for more efficient, verifiably correct, concurrent objects for relaxed memory models. We will focus on developing concurrent data structures (e.g., stacks, queues, deques) for the TSO memory model. Their verification will be mechanised in the KIV theorem prover to eliminate human error in verification, which in turn improves dependability.
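To make the notion of a concurrent object concrete, here is a minimal illustrative example in Rust (stacks and queues additionally need memory-reclamation machinery, so we use the simplest object, a counter). This is our own sketch, not one of the project's data structures: the object manages synchronisation internally via a compare-and-swap retry loop, and the successful CAS is the operation's linearization point.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// A minimal linearizable concurrent object: clients call increment()/get()
// without any locking of their own; synchronisation is the object's job.
pub struct Counter(AtomicU64);

impl Counter {
    pub fn new() -> Self {
        Counter(AtomicU64::new(0))
    }

    /// Atomically add one; returns the new value. The successful
    /// compare_exchange is the linearization point.
    pub fn increment(&self) -> u64 {
        loop {
            let cur = self.0.load(Ordering::Relaxed);
            // Another thread may race ahead of us; if so the CAS fails
            // and we retry with a freshly observed value.
            if self
                .0
                .compare_exchange(cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                return cur + 1;
            }
        }
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }
}

// Hypothetical driver: 4 threads perform 1000 increments each.
pub fn demo() -> u64 {
    let c = Counter::new();
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    c.increment();
                }
            });
        }
    });
    c.get()
}
```

Proving that every interleaving of such retry loops behaves like an atomic increment is exactly the kind of linearizability argument that, for realistic data structures under TSO, warrants mechanisation in a theorem prover such as KIV.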

Planned Impact

This work will have impact on three key areas.

Academic. This work covers the areas of concurrency theory, programming languages, algorithm development and verification, and hence, researchers from these areas will be the primary beneficiaries. Academically, we will produce scientific innovations that present new ways of thinking about correctness of a concurrent object, develop new concurrent object designs as proofs of concept, and produce new verifications of these objects. Correctness notions for concurrent objects are adapted from distributed computing and databases, and hence the insights we gain in the multicore context can be fed back into both of these areas. On a wider scale, this project will enable us to deepen our understanding of concurrent systems by learning from the abstractions and logics we use to verify the concurrent objects we develop.

Industrial. The project covers foundational aspects of concurrent systems, and hence, is industrially relevant to many different types of companies.
- Hardware manufacturers, e.g., IBM, Intel, AMD, and ARM, who require better-performing software to justify introduction of additional cores in the processors they produce. Without software-side improvements, end-users will see minimal benefit from processors with more cores.
- Programming language developers, e.g., Java (Oracle), C++, Scala, Python, who use predefined libraries to provide high-end functionality to programmers in a layered re-usable manner.
- Application developers, e.g., Valve Software (video gaming), Codeplay Software (SoC tools), etc., who require reliable high-performance libraries to take advantage of the available multi-core hardware.
- Formal verification companies, e.g., http://formalmethods.wikia.com/wiki/Companies, who will benefit from the techniques we use to verify novel concurrent data structures for relaxed memory. Of particular interest will be the mechanisations we develop in KIV.
- Large-scale developers, e.g., Google, Facebook and Amazon, who are interested in correctness conditions such as causal and eventual consistency in the databases they use. There is a direct link between the correctness conditions studied in WP1 and such notions of consistency.

Societal. There is no doubt that faster computing is a cornerstone of improving modern life, with an endless list of examples spanning a wide range of real-world scenarios - improving personal interactions with end-user devices, connecting ever more devices to both centralised and distributed servers, processing vast quantities of data for commercial and scientific use, enhancing security for commerce, etc. There is an increasing dependence on computerised systems that control an ever-larger proportion of security-, mission- and safety-critical systems, e.g., railways, power distribution systems and spacecraft, all of which are inherently concurrent and for which reliability is of utmost importance.

Publications

 
Description Our primary set of findings is as follows.
(1) Correctness conditions for concurrent objects are intrinsically difficult to prove, and weakening these conditions does not necessarily make them simpler to verify. We have established decidability and complexity results for different variations of a weak condition known as quiescent consistency.
(2) Correctness conditions weaker than linearizability (including sequential consistency) do not guarantee contextual trace refinement (trace refinement of client programs). However, linearizability together with a progress condition (minimal progress) does guarantee contextual trace refinement. This potentially forms a basis for understanding relaxed correctness under relaxed memory by relaxing the (totally ordered) trace models to partial orders.
(3) The potential for impact from transactional memory as a concurrency abstraction seems greater than that of concurrent objects alone, especially in light of new research that describes how concurrent objects can be constructed using transactional memory. Transactional objects are inherently more generic than concurrent objects - a language/compiler need only implement transactional memory algorithms; the objects implemented on top of the transactional memory then come for free.
(4) It is feasible to verify transactional memory using simulation against the TMS2 specification; we have demonstrated this via a series of verifications: (a) the verification of a complex pessimistic algorithm, (b) a method that links transactional correctness to linearizability (which includes a weak-memory verification), and (c) a modular method for verifying hybrid transactional memory.
Exploitation Route
- Those interested in verification could use our simulation-based Isabelle proofs to verify other types of concurrent algorithms.
- Those interested in implementation could implement our verified algorithms; others could in turn use them in their application programs.
- Transactional memory provides an easy-to-use concurrency abstraction; it is worth exploring as an alternative to mutex locks in concurrent software, especially in the presence of weak memory.
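To illustrate the transactional-mutex-lock idea mentioned above, here is a hedged Rust sketch of a TML-style writer path: a single global version/lock word glb is even when no writer is active; a writer acquires it by CAS to an odd value, performs its writes, and commits by bumping it to the next even value. The function names and driver below are our own illustration, not the linked tra-mut-loc implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// TML-style writer: glb is the global version/lock word. An even value
// means no writer is active; a writer CASes glb from even k to k + 1,
// performs its writes, then stores k + 2 to commit and release.
fn tml_increment(glb: &AtomicUsize, data: &AtomicUsize) {
    loop {
        let loc = glb.load(Ordering::Acquire);
        if loc % 2 == 0
            && glb
                .compare_exchange(loc, loc + 1, Ordering::Acquire, Ordering::Acquire)
                .is_ok()
        {
            // Writer section: at most one thread can be here at a time,
            // so plain (Relaxed) accesses to the data are race-free.
            let v = data.load(Ordering::Relaxed);
            data.store(v + 1, Ordering::Relaxed);
            glb.store(loc + 2, Ordering::Release); // commit
            return;
        }
        // glb was odd or the CAS failed: another writer is active; retry.
    }
}

// Hypothetical driver: several threads each perform transactional increments.
fn run_demo(threads: usize, increments: usize) -> usize {
    let glb = AtomicUsize::new(0);
    let data = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..increments {
                    tml_increment(&glb, &data);
                }
            });
        }
    });
    data.load(Ordering::Relaxed)
}
```

A full TML additionally gives readers an invisible fast path (read the location, then re-check that glb is unchanged and even), which is where the design wins over a plain mutex; verifying that fast path against TMS2 under weak memory is precisely the kind of proof obligation described in the findings above.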
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Electronics; Energy; Financial Services and Management Consultancy; Manufacturing, including Industrial Biotechnology; Security and Diplomacy; Transport

 
Description With the continued importance of multi-core systems, work on verifiably correct concurrency abstractions continues to play an important role in current computer science research. Our work on transactional memory (TM) and weak-memory verification has gained the most traction; initial investigations in this area have attracted important industrial collaborators: ARM and Mozilla.
- ARM are interested in our hybrid TM implementations, which would motivate HTM implementations within ARM processors. HTM is currently available commercially within Intel's Haswell and IBM's BlueGene architectures.
- Mozilla are interested in using TM within their parallel Servo browser engine. We have conducted some preliminary experiments using a transactional mutex lock (https://github.com/asajeffrey/tra-mut-loc/blob/master/src/lib.rs) in Rust. We are currently developing verification methods for the implementation in Rust's LLVM-based memory model.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Verifiably Correct Transactional Memory
Amount £397,680 (GBP)
Funding ID EP/R032556/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 09/2018 
End 09/2021
 
Description Verifiably correct concurrency abstractions
Amount £14,370 (GBP)
Funding ID EP/R019045/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 12/2017 
End 07/2018
 
Description Verifiably correct concurrency abstractions
Amount £9,040 (GBP)
Funding ID EP/R019045/2 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 11/2018 
End 03/2020
 
Description ARM 
Organisation ARM Holdings
Country United Kingdom 
Sector Private 
PI Contribution Dongol, together with Prof John Derrick and Dr Simon Doherty, travelled to ARM (Cambridge) to discuss the potential for collaborative work with ARM. Dongol gave a presentation (via a lunch-and-learn session), followed by a discussion about the most pressing verification problems at ARM.
Collaborator Contribution ARM have provided some feedback on our work and discussed the methods that would have most impact.
Impact Dr Stephan Diestelhorst at ARM is an industrial collaborator on my new EPSRC grant (under review).
Start Year 2016
 
Description Mozilla 
Organisation Mozilla Research
Country Global 
Sector Charity/Non Profit 
PI Contribution Jointly investigated transactional memory implementations for Servo using the Rust programming language. Dongol travelled to Chicago to meet with Dr Alan Jeffrey to investigate the use of transactional memory within Mozilla's next-generation Servo browser engine (https://servo.org/). For the issues at hand, it was decided that a variation of the Transactional Mutex Lock (TML) would be most suitable.
Collaborator Contribution Alan has implemented a modified version of the TML in Rust (https://github.com/asajeffrey/tra-mut-loc). For efficiency, the implementation uses C11 relaxed-memory bindings. Early investigations show that using TML in place of global locks achieves a 10x performance increase across realistic benchmarks.
Impact We are developing a logic for C11 to verify correctness of the modified TML using the theorem prover Isabelle. Mozilla Research (Dr Alan Jeffrey) is an industrial partner for my new EPSRC grant (Verifiably Correct Transactional Memory for Reliable High Performance Concurrency), which is currently under review. Here, we plan to extend our initial investigations outlined above.
Start Year 2016
 
Description Presentation at ARM 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Around 15 ARM personnel in Cambridge and 10 additional personnel in Texas attended my lunch-and-learn session on verifying concurrent objects and transactional memory (including hybrid transactional memory). This has led to further discussions and collaboration on my new research grant.
Year(s) Of Engagement Activity 2016