Multiprocessors: From Microarchitecture to Semantic Theory

Lead Research Organisation: University of Cambridge

Department Name: Computer Science and Technology

Abstract

Modern computers have become predominantly multicore, providing programmers with many simultaneously-executing threads of execution sharing the system memory. Concurrent programming is challenging, and this trend in hardware should have been an ideal application target for previous academic work on software verification. Concurrent programming has been extensively studied, with the development of areas such as model checking, process calculi, separation logic, and rely-guarantee reasoning. Unfortunately, realistic architectures violate a key assumption of most such work, that all threads have a consistent view of memory, an assumption often referred to as Sequential Consistency. Instead, programmers are actually faced with a Relaxed Memory Model , with only approximate consistency, making programming and verification significantly more challenging.Relaxed memory models arise, ultimately, from hardware (and compiler) implementation features designed for performance. Unfortunately, there is no clear understanding of how the microarchitectural choices made in hardware implementations lead to programmer-observable consequences. This is an instance of a more general problem, that modern hardware is too complex to understand without formal proofs, as witnessed by a succession of errata in deployed processors. Hardware verification is itself a field that has received a great deal of study. However, the field has still not reached the point of verifying global semantic properties, such as relaxed memory models, that emerge from the interaction of multiple disparate subsystems, each of which suffers from imprecise specifications.I propose to create a general theory of the programmer-observable behaviour which emerges from the microarchitectural choices and optimizations made in multiprocessor implementations. I will develop abstract mathematical models of the implementations, and explain with proof the observable consequences of those implementations. With specifications derived from that theory, I will go on to study the hardware verification problem, establishing techniques and methods for verifying global programmer-visible properties. Throughout, I will address not just safety properties, but also progress and liveness properties that sophisticated programmers rely on.Such a general theory is critical for a better understanding of multiprocessor semantics, by both programmers and hardware implementers, and for enabling these different communities to communicate their knowledge to each other. It is also an essential prerequisite for sound theories of concurrent programming. An improved understanding and communication of these issues is imperative for reliable multiprocessor usage.

Planned Impact

The beneficiaries of this research in the larger world include practitioners of concurrent programming, and of computer architecture, and of the verification of both, within both academia and industry. This will in the medium term benefit the users of shared memory multiprocessors, who will find their systems becoming more reliable. Such users constitute a huge and still growing section of the general public, since desktops, mobile devices, and servers are steadily becoming multiprocessor. As an indicative point for the time required for benefits to become available to the general public, hardware designs take on the order of five years to become publicly available. The academic beneficiaries are listed in a separate section, and I note that there already exists very strong links between the academic and industrial community in each of these fields, leading to an easy exchange of knowledge within each field. I note some additional points relevant to industrial practitioners here. Modern hardware architectures are extremely complex, and also extremely difficult to change if found not to be correct. The major processor vendors, including Intel, AMD, IBM, and ARM, have already deployed a huge number of multiprocessors, in high-performance systems, in commodity consumer computers, and in the embedded market, with even mobile phones starting to become multiprocessor. This trend will intensify as low-power and high-performance devices enjoy greater demand, and it becomes critical to remove bugs and inconsistencies at an early stage, and give precise guarantees about multiprocessors to the programmers that develop systems above them. Mistakes can be very expensive, in terms of both money and reputation. Key improvements can be made by using mathematical theory at early stages of the design process, for example, by helping cache-protocol designers understand the consequences of the choices they make on programmer-observable behaviour. I will engage with the hardware-design community, particularly with my project partners in IBM, as also the large expert community in the Austin industrial area. I also have a nascent relationships with ARM in Cambridge, which I hope to build upon. The real-world expertise and experience provided by my industrial contacts will complement my expertise in semantics and mechanized mathematics. The proposed work will provide methods and techniques of verification, and example formal proofs of concrete implementations as case studies. I will make these publicly available in the scientific literature, and via the web.

Funded Value:

£269,820

Funded Period:

Jun 10 - Jul 13

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/H027351/1

Principal Investigator:

Susmit Sarkar

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (100%)

Organisations

People	ORCID iD
Susmit Sarkar (Principal Investigator)	http://orcid.org/0000-0002-4259-9213

Publications

Author Name

Title Publication Date Published

|< < 1 2 > >|

10 25 50

Sarkar S (2011) Understanding POWER multiprocessors in ACM SIGPLAN Notices

Sarkar S (2012) Synchronising C/C++ and POWER

Pulte C (2017) Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8 in Proceedings of the ACM on Programming Languages

Mcpherson A (2015) Fence Placement for Legacy Data-Race-Free Programs via Synchronization Read Detection in ACM Transactions on Architecture and Code Optimization

McPherson A (2015) Fence placement for legacy data-race-free programs via synchronization read detection

Mark Batty (Co-Author) (2011) Mathematizing C++ concurrency

Mador-Haim S (2012) Computer Aided Verification - 24th International Conference, CAV 2012, Berkeley, CA, USA, July 7-13, 2012 Proceedings

Gray K (2015) An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors

Flur S (2016) Modelling the ARMv8 architecture, operationally: concurrency and ISA

Flur S (2017) Mixed-size concurrency: ARM, POWER, C/C++11, and SC

Key Findings
Impact Summary
Further Funding
Engagement Activities


Description	Precise description of the programmer model on IBM POWER and ARM Multiprocessors, important for correct programming; Precise description of concurrent programming on C and C++ (ISO C11/C++11); Correctness techniques for compiling concurrent code from C/C++ to hardware; Ways of hardware design to take advantage of programmer-oriented models
Exploitation Route	The Linux Weekly News article targeted kernel programmers, who are beginning to use the ppcmem and cppmem tool to understand hardware behaviours; Architects in IBM and ARM informally discuss future architecture behaviour parameters (eg IBM's transaction extensions, under a CDA, now lifted); Academic papers detailing techniques and proofs; Contributed to ISO C and C++ standards (C11 and C++11), making the standards more robust and reliable; Released web-based tools to explore subtle memory model behaviour (ppcmem tool was featured in a Linux Weekly News article targeting kernel programmers); Gave an invited tutorial at POPL'13, a premier programming language conference
Sectors	Digital/Communication/Information Technologies (including Software)


Description	Material has fed into the development of new ISO C/C++ concurrency standards (C11/C++11/C++14). Findings have been used internally in industrial organisations to make precise the "relaxed concurrency model" of hardware processors; and of new "transactional memory" extensions of IBM Power8 and ARMv8 architectures Developed tools (PPCMEM/CPPMEM/ARMMEM) have been used by compiler developers and OS developers within the open-source community (GCC,CLang, Linux)
First Year Of Impact	2012
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	C3: Scalable & Verified Shared Memory via Consistency-directed Cache Coherence
Amount	£668,897 (GBP)
Funding ID	EP/M027317/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	11/2015
End	10/2018


Description	SICSA Early Career Industrial Fellowship
Amount	£20,000 (GBP)
Organisation	SICSA Scottish Informatics and Computer Science Alliance
Sector	Academic/University
Country	United Kingdom
Start	04/2014
End	04/2016


Description	Standard Research
Amount	£850,000 (GBP)
Funding ID	EP/P020631/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	07/2017
End	06/2020


Description	Shared Memory Concurrency in the Real World: Working with Relaxed Memory Consistency
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Type Of Presentation	keynote/invited speaker
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	A tutorial on shared-memory concurrency, given at POPL'2013 at Rome. POPL is the premier programming language conference, organized by ACM. They hold a "TutorialFest" with invited tutorials on research frontiers, for the benefit of graduate and post-graduate researchers, and other interested people. I was invited to give a tutorial on Shared Memory Concurrency, with an audience of about 30. After my talk, I had several PhD students contact me to use my tools, and cite my and other papers I mentioned in their dissertations.
Year(s) Of Engagement Activity	2013

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications