C3: Scalable & Verified Shared Memory via Consistency-directed Cache Coherence
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Informatics
Abstract
Shared-memory multi-core processors are ubiquitous, but programming them
remains challenging. The programming model exposed by such multi-core
processors depends crucially on a "memory consistency model" (MCM), a contract
between the hardware and the programmer, which essentially specifies what
value a read can return. On the hardware side, one key mechanism to implement
the memory consistency model is the "cache-coherence protocol" (CCP), which
essentially communicates memory operations between processors. However, the
connection between the CCP and the MCM remains unclear. This is especially
true for modern CCPs and MCMs, in which CCP design has been divorced from the
requirements of the MCM. We argue that this has negatively impacted the
scalability and the verifiability of CCPs.
On the scalability front, there are serious question marks about sustaining
cache coherence as the number of cores continue to scale. On the verification
front, the application of existing verification techniques, which do not
verify the CCP against the MCM, are arguably broken.
In the C3 proposal, we propose a family of CCPs that are "aware" of, and
verified against the MCM. Our approach is motivated by the fact that both
hardware and programming languages are converging to various relaxed MCMs for
performance oriented reasons. We use such relaxed MCMs as inspiration to
research CCPs that can take advantage of them. Specifically, we will research
"lazy" CCPs where memory operations are batched, and the cost of communicating
a memory operation can be amortised. We will also, for the first time,
formally verify the relationship between the hardware CCPs and the
programmer-oriented MCM they provide. We will investigate rigorously the
gains to be had from such lazy CCPs. We will do this by creating a multi-core
silicon prototype of our proposed CCP, leveraging our experience in the design
of industrial-strength micro-architectures and their implementations.
remains challenging. The programming model exposed by such multi-core
processors depends crucially on a "memory consistency model" (MCM), a contract
between the hardware and the programmer, which essentially specifies what
value a read can return. On the hardware side, one key mechanism to implement
the memory consistency model is the "cache-coherence protocol" (CCP), which
essentially communicates memory operations between processors. However, the
connection between the CCP and the MCM remains unclear. This is especially
true for modern CCPs and MCMs, in which CCP design has been divorced from the
requirements of the MCM. We argue that this has negatively impacted the
scalability and the verifiability of CCPs.
On the scalability front, there are serious question marks about sustaining
cache coherence as the number of cores continue to scale. On the verification
front, the application of existing verification techniques, which do not
verify the CCP against the MCM, are arguably broken.
In the C3 proposal, we propose a family of CCPs that are "aware" of, and
verified against the MCM. Our approach is motivated by the fact that both
hardware and programming languages are converging to various relaxed MCMs for
performance oriented reasons. We use such relaxed MCMs as inspiration to
research CCPs that can take advantage of them. Specifically, we will research
"lazy" CCPs where memory operations are batched, and the cost of communicating
a memory operation can be amortised. We will also, for the first time,
formally verify the relationship between the hardware CCPs and the
programmer-oriented MCM they provide. We will investigate rigorously the
gains to be had from such lazy CCPs. We will do this by creating a multi-core
silicon prototype of our proposed CCP, leveraging our experience in the design
of industrial-strength micro-architectures and their implementations.
Planned Impact
This project will potentially have major impact, primarily on processor
architecture companies, and thereby on UK economy and indeed society.
The potential industrial impact is massive. UK and international companies
such as Intel, ARM, Synopsys and IBM are all directly engaged in scaling up
heterogeneous many-core shared memory systems. Our project is potentially a
key enabler in this direction, and this is reflected in the various supporting
letters from our industrial partners. We have several key industrial contacts,
and a track record of transferring technologies from academia to the
commercial sphere. Also note that verification is a key concern in all
semiconductor companies, and cache coherence is recognised to be a
particularly hard target. Our techniques, if successful, will open a new and
potentially far cheaper avenue of verification of the hard problems in this
area.
Processor companies such as the above are by themselves a significant sector
of the UK economy. However, since our techniques take full account of the
needs of the programmer, and in fact are designed to bolster such
important-to-programmer models as C11/C++11 and OpenCL, the potential
beneficiaries extend to compiler companies, and indeed concurrent and system
software companies. These mechanisms can potentially transmit the gains we
obtain, in verifiability, in performance, and in reliability, to all of
society.
architecture companies, and thereby on UK economy and indeed society.
The potential industrial impact is massive. UK and international companies
such as Intel, ARM, Synopsys and IBM are all directly engaged in scaling up
heterogeneous many-core shared memory systems. Our project is potentially a
key enabler in this direction, and this is reflected in the various supporting
letters from our industrial partners. We have several key industrial contacts,
and a track record of transferring technologies from academia to the
commercial sphere. Also note that verification is a key concern in all
semiconductor companies, and cache coherence is recognised to be a
particularly hard target. Our techniques, if successful, will open a new and
potentially far cheaper avenue of verification of the hard problems in this
area.
Processor companies such as the above are by themselves a significant sector
of the UK economy. However, since our techniques take full account of the
needs of the programmer, and in fact are designed to bolster such
important-to-programmer models as C11/C++11 and OpenCL, the potential
beneficiaries extend to compiler companies, and indeed concurrent and system
software companies. These mechanisms can potentially transmit the gains we
obtain, in verifiability, in performance, and in reliability, to all of
society.
Publications
Castro D
(2018)
Automatically deriving cost models for structured parallel processes using hylomorphisms
in Future Generation Computer Systems
Castro D
(2016)
Farms, pipes, streams and reforestation: reasoning about structured parallel processes using types and hylomorphisms
in ACM SIGPLAN Notices
Castro D
(2018)
Automatically deriving cost models for structured parallel processes using hylomorphisms
in Future Generation Computer Systems
Dublish S
(2017)
Evaluating the GPU Memory Hierarchy for General-Purpose Application
Description | The cache coherence protocol is a mechanism to ensure that values updated in one processor core are visible to others. The consistency model is the specification (exposed to programmers) that defines how a memory system should behave. We have discovered that, if the coherence protocol is made aware and designed against the specifications of the memory consistency model, then the protocol can be made to scale to larger number of processors. We have demonstrated this by developing a scalable coherence protocol that is designed targeting the x86TSO memory consistency model (supported by today's Intel, AMD and Sparc processors). We have also developed a novel technique to verify the coherence protocol against the memory consistency model. We have built on this result to produce a tool that can automatically synthesize correct-by-construction cache coherence protocols, solving a decades-old problem. In addition to these, we have shown how these techniques can also apply to the network of computers forming part of a data centre to keep these computers coherent at the software level. |
Exploitation Route | The protocols and verification techniques developed as part of the project makes multicore processors more scalable and reliable. Processor manufacturers can use the protocols developed to develop more scalable multicore processors. The verification techniques developed can be used to ensure that the coherence protocols to not suffer from costly bugs. Moreover, this work has started a new line of enquiry on correct-by-construction protocols. |
Sectors | Digital/Communication/Information Technologies (including Software) Electronics |
Description | 1) Cache coherence protocols are hardware communication mechanisms that are notoriously hard to get right: Samsung had to disable coherence in the popular Galaxy mobile phones because of a bug in the coherence protocol that were not caught. As part of the C3 project, we developed a tool to discover such hard to find bugs. The tool created broad impact by discovering real bugs in the most popular simulator that is used by industry and academia. 2) Because of the work we did as part of the C3 project we were able to attract one of the world leading experts in cache coherence protocols, Professor Daniel Sorin from Duke University, to the University of Edinburgh for a sabbatical. The sabbatical is already providing high impact. In collaboration with Professor Sorin, our team has developed a tool to automatically produce correct-by-construction cache coherence protocols. People from two processor manufacturing companies, AMD and Huawei, have evinced interest in this tool. The work which appeared at ISCA 2018. Importantly, the work has been designated as an honourable mention by the computer architecture community. Every year the computer architecture community selects "significant and insightful papers that have the potential to influence the work of computer architects for years to come". This paper has been selected for this honour. 3) Huawei have since funded us for a follow-up project for producing correct-by-construction coherence protocols. Out tool is also starting to used by processor startup called Incore seminconductors for producing companies for producing coherence protocols for fault-tolerant processor used in avionics. 4) Our findings have been distilled into a book that has been published by Morgan and Claypool publishers (A Primer on Memory Consistency and Cache Coherence, Second Edition). |
First Year Of Impact | 2019 |
Sector | Digital/Communication/Information Technologies (including Software),Electronics |
Impact Types | Societal Economic |
Description | Memory of RISC-V Memory model task group |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Membership of a guideline committee |
URL | https://riscv.org/2017/04/risc-v-memory-consistency-model/ |
Description | Primer Second Edition |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | https://www.morganclaypool.com/doi/abs/10.2200/S00962ED2V01Y201910CAC049 |
Description | Use by Processor Startup for safety critical processors |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | https://www.youtube.com/watch?v=zbTEFfSU5ro&t=1125s |
Description | ARM Studentship |
Amount | £63,584 (GBP) |
Organisation | Arm Limited |
Sector | Private |
Country | United Kingdom |
Start | 08/2016 |
End | 08/2020 |
Description | C6: Correct-by-Construction Heterogeneous Coherence |
Amount | £494,698 (GBP) |
Funding ID | EP/V028154/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2021 |
End | 11/2024 |
Description | Dijkstra's Pipe: Timing-Secure Processors by Design |
Amount | £535,238 (GBP) |
Funding ID | EP/V038699/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2021 |
End | 11/2024 |
Description | HeteroGen: Automatically Generating Heterogeneous Cache Coherence Protocols |
Amount | £175,000 (GBP) |
Organisation | Huawei Technologies |
Sector | Private |
Country | China |
Start | 04/2019 |
End | 10/2022 |
Description | SICSA Distinguished visiting fellow |
Amount | £6,080 (GBP) |
Organisation | SICSA Scottish Informatics and Computer Science Alliance |
Sector | Academic/University |
Country | United Kingdom |
Start | 07/2017 |
End | 08/2017 |
Description | Standard Research |
Amount | £850,000 (GBP) |
Funding ID | EP/P020631/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2017 |
End | 06/2020 |
Description | Visiting Professorship |
Amount | £24,395 (GBP) |
Funding ID | VP1-2017-006 |
Organisation | The Leverhulme Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 12/2017 |
End | 06/2018 |
Title | Memory consistency model checking and test generation library. |
Description | This a library to verify processors simulation models against the memory consistency model they are supposed to adhere to. Verifying processors against memory consistency models is an extremely complex task, and this tool enables the catching of bugs early in the design cycle. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | This tool caught two bugs in the interaction of coherence protocols and the processor pipeline. We also produced patches to fix the bugs to the gem5 simulator used by ARM and AMD. |
URL | https://www.mail-archive.com/gem5-dev@gem5.org/msg18940.html |
Title | VerC3 toolkit |
Description | This toolkit contains a library for synthesizing distributed systems such as cache coherence protocols. These systems are notoriously hard to get right and the purpose of the tool is to auto-complete such systems while ensuring correctness. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | We have used the tool to auto-complete cache coherence protocols while satisfying correctness as mentioned in the above paper. The tool was released this year, so it is too early to comment on impacts. |
URL | http://homepages.inf.ed.ac.uk/vnagaraj/papers/date18.pdf |
Description | ARM Research Summit (2016) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Dr. Vijay Nagarajan was an invited speaker at the ARM Research Summit which attracted in excess of 100 delegates. The talk sparked questions and discussions and led to an ARM studentship in a related area. |
Year(s) Of Engagement Activity | 2016 |
URL | https://developer.arm.com/research/summit/previous-summits/2016 |
Description | ARM Research summit 2018 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | More than 200 people attended this research summit where I presented an invited talk which sparked questions and discussions afterwards, and led contact with a company (HiSilicon) for potential collaboration. |
Year(s) Of Engagement Activity | 2018 |
URL | https://developer.arm.com/research/summit/previous-summits |
Description | ARM research summit 2017 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | About 200 people attended the talk I gave at ARM which sparked off discussion. Now collaboration on the topic with ARM through a funded PhD student. The talk was picked by media insideHPC: https://insidehpc.com/2017/10/scaling-arm-architectures/ |
Year(s) Of Engagement Activity | 2017 |
URL | https://insidehpc.com/2017/10/scaling-arm-architectures/ |
Description | DemoFest 2018 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Industry/Business |
Results and Impact | DemoFest is the annual showcase event organised by the Scottish Informatics and Computer Science Alliance (SICSA), aimed at presenting research to industry and non-profit partners within Scotland. DemoFest 2018 had about 300 attendees. We had a poster presentation here, which led to requests for more information from PhD students and Scottish SMEs. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.sicsa.ac.uk/knowledge-exchange/industry-collaboration/demofest/ |
Description | HP visit (Palo Alto) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | About 15 HP Labs employees attended the research talk. The talk lead to the start of a collaboration. |
Year(s) Of Engagement Activity | 2016 |
Description | HPCA 2017(Austin) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | We presented two papers at HPCA 2017 (co-located with CGO and PPoPP), a top-tier conference in computer architecture attend by around 500 people. |
Year(s) Of Engagement Activity | 2017 |
URL | https://hpca2017.org/ |
Description | Micro 2016 conference (Taiwan) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Cheng-Chieh Huang presented our paper at this top-tier architecture conference, attended by around 250 delegates. This led to eventually Cheng-Chieh landing a job at Mentor Graphics. |
Year(s) Of Engagement Activity | 2016 |
URL | https://www.microarch.org/micro49/ |
Description | School Visit (Princeton) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Dr. Vijay Nagarajan delivered an invited presentation at Princeton university. About 20 academic staff and post-graduate students attended this talk, the talk sparked questions and discussions which lead to the exchange of papers and ideas between the research groups involved. |
Year(s) Of Engagement Activity | 2016 |
Description | School visit (Michigan) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | 20 post-graduate students and faculty attended for a research visit to University of Michigan which sparked questions and discussion afterwards. |
Year(s) Of Engagement Activity | 2016 |
Description | School visit (Rutgers) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | 25 students and faculty attended the research visit to University of Rutgers, which sparked questions and discussions afterwards. |
Year(s) Of Engagement Activity | 2016 |
Description | School visit (U Penn) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Dr. Vijay Nagarajan delivered an invited lectuer at U Penn. This talk sparked questions and discussion with academic staff and post-graduate students afterwards. |
Year(s) Of Engagement Activity | 2016 |