Multiprocessors: From Microarchitecture to Semantic Theory

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

Modern computers have become predominantly multicore, providing programmers with many simultaneously-executing threads of execution sharing the system memory. Concurrent programming is challenging, and this trend in hardware should have been an ideal application target for previous academic work on software verification. Concurrent programming has been extensively studied, with the development of areas such as model checking, process calculi, separation logic, and rely-guarantee reasoning. Unfortunately, realistic architectures violate a key assumption of most such work, that all threads have a consistent view of memory, an assumption often referred to as Sequential Consistency. Instead, programmers are actually faced with a Relaxed Memory Model , with only approximate consistency, making programming and verification significantly more challenging.Relaxed memory models arise, ultimately, from hardware (and compiler) implementation features designed for performance. Unfortunately, there is no clear understanding of how the microarchitectural choices made in hardware implementations lead to programmer-observable consequences. This is an instance of a more general problem, that modern hardware is too complex to understand without formal proofs, as witnessed by a succession of errata in deployed processors. Hardware verification is itself a field that has received a great deal of study. However, the field has still not reached the point of verifying global semantic properties, such as relaxed memory models, that emerge from the interaction of multiple disparate subsystems, each of which suffers from imprecise specifications.I propose to create a general theory of the programmer-observable behaviour which emerges from the microarchitectural choices and optimizations made in multiprocessor implementations. I will develop abstract mathematical models of the implementations, and explain with proof the observable consequences of those implementations. With specifications derived from that theory, I will go on to study the hardware verification problem, establishing techniques and methods for verifying global programmer-visible properties. Throughout, I will address not just safety properties, but also progress and liveness properties that sophisticated programmers rely on.Such a general theory is critical for a better understanding of multiprocessor semantics, by both programmers and hardware implementers, and for enabling these different communities to communicate their knowledge to each other. It is also an essential prerequisite for sound theories of concurrent programming. An improved understanding and communication of these issues is imperative for reliable multiprocessor usage.

Planned Impact

The beneficiaries of this research in the larger world include practitioners of concurrent programming, and of computer architecture, and of the verification of both, within both academia and industry. This will in the medium term benefit the users of shared memory multiprocessors, who will find their systems becoming more reliable. Such users constitute a huge and still growing section of the general public, since desktops, mobile devices, and servers are steadily becoming multiprocessor. As an indicative point for the time required for benefits to become available to the general public, hardware designs take on the order of five years to become publicly available. The academic beneficiaries are listed in a separate section, and I note that there already exists very strong links between the academic and industrial community in each of these fields, leading to an easy exchange of knowledge within each field. I note some additional points relevant to industrial practitioners here. Modern hardware architectures are extremely complex, and also extremely difficult to change if found not to be correct. The major processor vendors, including Intel, AMD, IBM, and ARM, have already deployed a huge number of multiprocessors, in high-performance systems, in commodity consumer computers, and in the embedded market, with even mobile phones starting to become multiprocessor. This trend will intensify as low-power and high-performance devices enjoy greater demand, and it becomes critical to remove bugs and inconsistencies at an early stage, and give precise guarantees about multiprocessors to the programmers that develop systems above them. Mistakes can be very expensive, in terms of both money and reputation. Key improvements can be made by using mathematical theory at early stages of the design process, for example, by helping cache-protocol designers understand the consequences of the choices they make on programmer-observable behaviour. I will engage with the hardware-design community, particularly with my project partners in IBM, as also the large expert community in the Austin industrial area. I also have a nascent relationships with ARM in Cambridge, which I hope to build upon. The real-world expertise and experience provided by my industrial contacts will complement my expertise in semantics and mechanized mathematics. The proposed work will provide methods and techniques of verification, and example formal proofs of concrete implementations as case studies. I will make these publicly available in the scientific literature, and via the web.
 
Description Precise description of the programmer model on IBM POWER and ARM Multiprocessors, important for correct programming;

Precise description of concurrent programming on C and C++ (ISO C11/C++11);

Correctness techniques for compiling concurrent code from C/C++ to hardware;

Ways of hardware design to take advantage of programmer-oriented models
Exploitation Route The Linux Weekly News article targeted kernel programmers, who are beginning to use the ppcmem and cppmem tool to understand hardware behaviours;

Architects in IBM and ARM informally discuss future architecture behaviour parameters (eg IBM's transaction extensions, under a CDA, now lifted);
Academic papers detailing techniques and proofs;

Contributed to ISO C and C++ standards (C11 and C++11), making the standards more robust and reliable;

Released web-based tools to explore subtle memory model behaviour (ppcmem tool was featured in a Linux Weekly News article targeting kernel programmers);

Gave an invited tutorial at POPL'13, a premier programming language conference
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Material has fed into the development of new ISO C/C++ concurrency standards (C11/C++11/C++14). Findings have been used internally in industrial organisations to make precise the "relaxed concurrency model" of hardware processors; and of new "transactional memory" extensions of IBM Power8 and ARMv8 architectures Developed tools (PPCMEM/CPPMEM/ARMMEM) have been used by compiler developers and OS developers within the open-source community (GCC,CLang, Linux)
First Year Of Impact 2012
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description C3: Scalable & Verified Shared Memory via Consistency-directed Cache Coherence
Amount £668,897 (GBP)
Funding ID EP/M027317/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2015 
End 10/2018
 
Description SICSA Early Career Industrial Fellowship
Amount £20,000 (GBP)
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start 04/2014 
End 04/2016
 
Description Standard Research
Amount £850,000 (GBP)
Funding ID EP/P020631/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2017 
End 06/2020
 
Description Shared Memory Concurrency in the Real World: Working with Relaxed Memory Consistency 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation keynote/invited speaker
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A tutorial on shared-memory concurrency, given at POPL'2013 at Rome.

POPL is the premier programming language conference, organized by ACM. They hold a "TutorialFest" with invited tutorials on research frontiers, for the benefit of graduate and post-graduate researchers, and other interested people. I was invited to give a tutorial on Shared Memory Concurrency, with an audience of about 30.

After my talk, I had several PhD students contact me to use my tools, and cite my and other papers I mentioned in their dissertations.
Year(s) Of Engagement Activity 2013