Scalable Automatic Verification of GPU Kernels
Lead Research Organisation:
Imperial College London
Department Name: Computing
Abstract
Until relatively recently the processing speed of computer systems increased at an exponential rate. Each year it was possible to use computers to solve computational problems that the previous year were out of reach. Around 2005, physical limits stopped this trend: it became infeasible to increase the clock rate of a processor without consuming an exorbitant amount of energy. To counter this, processor manufacturers have since aimed to provide increased performance by designing "multicore" processors, which consist of two or more processing units on a chip. Many computational tasks are parallelisable, in which case they can be distributed across the cores of a multicore processor.
Recently there has been a trend towards "many-core" processors, with hundreds or thousands of processing elements. For highly parallel applications, many-core processors can offer a massive speedup. The most readily available many-core processors are graphics processing units (GPUs) from companies such as NVIDIA, AMD, Intel, ARM and Imagination Technologies. GPUs were originally designed to accelerate graphical computations, but have become sufficiently general purpose for accelerating computational tasks in a variety of domains, including financial analysis, medical imaging, media processing and simulation.
GPUs are programmed by writing a "kernel" function, a program which will be executed by hundreds or thousands of threads running in parallel on the GPU. Parallel threads can communicate using shared memory, and synchronise using barrier statements. Writing correct GPU kernels is more challenging than writing sequential software due to two main problems: data races and barrier divergence. A data race occurs when two GPU threads access a shared memory location, at least one of the threads modifies this location, and no barrier statement separates the accesses. Data races almost always signify bugs in the kernel and lead to nondeterministic behaviour. Barrier divergence occurs when distinct threads reach different barrier statements in the kernel, and leads to deadlock.
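The interplay between shared memory and barriers described above can be illustrated in miniature. The following is a purely illustrative Python sketch (not OpenCL, and not a model of real GPU scheduling): simulated "threads" rotate a shared array, and the barrier ensures all reads complete before any thread writes. Removing the barrier introduces exactly the kind of data race the paragraph describes, since a thread may overwrite an element its neighbour has not yet read.

```python
import threading

# Hypothetical miniature model of a GPU kernel: N "threads" rotate a
# shared array left by one position. The barrier between the read and
# the write ensures every thread reads its neighbour's original value
# before any thread overwrites its own slot. Without the barrier, the
# read of a[(tid + 1) % N] races with the neighbour's write -- the
# result becomes nondeterministic.
N = 8
a = list(range(N))
barrier = threading.Barrier(N)

def kernel(tid):
    read = a[(tid + 1) % N]  # read the neighbour's element
    barrier.wait()           # all reads complete before any write
    a[tid] = read            # write own element

def launch():
    global a
    a = list(range(N))
    threads = [threading.Thread(target=kernel, args=(t,)) for t in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(a)

# With the barrier the result is deterministic: a left rotation.
print(launch())  # [1, 2, 3, 4, 5, 6, 7, 0]
```

Deleting the `barrier.wait()` call makes the outcome depend on the thread schedule, which is why races of this kind almost always indicate a bug.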
Because GPUs are becoming widely used for general purpose software development, there is an urgent need for analysis techniques to help GPU programmers write correct code. Techniques to analyse GPU kernels with respect to data races and barrier divergence would significantly speed up the GPU software development process, leading to shorter time-to-market for GPU-accelerated applications.
In this project we plan to design formal techniques for verifying race- and divergence-freedom for GPU kernels. To be adopted and trusted by industrial practitioners our techniques must be highly automatic, scalable, and based on rigorous semantic foundations.
We plan to achieve these aims by developing a rigorous GPU memory model specification, and a formal semantics for GPU kernel execution that makes no assumptions about the structure of the kernel to be analysed. Based on these semantic foundations, we will design a verification technique that aims to prove absence of data races and barrier divergence by generating a "contract" for the kernel: a machine-checkable proof that kernel execution cannot lead to these defects. Contract-based verification is modular - each kernel procedure is analysed separately - and thus scalable. We will design a template-based contract generation method that captures domain-specific knowledge about common GPU programming idioms. This will allow efficient verification of GPU kernels that use typical data access patterns. For more intricate kernels that implement highly optimised algorithms, we will design a method based on Craig interpolation. This method will construct a proof of race- and divergence-freedom up to a bounded execution depth, and then attempt to extract a general contract from this proof.
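To give a flavour of the kind of obligation a race-freedom contract discharges (this is a toy sketch, not the project's actual method), consider a kernel in which thread `tid` writes only `A[base + tid*stride]`: race-freedom on writes reduces to showing that two arbitrary distinct thread IDs never compute the same index. The check below is brute force over a bounded thread count; a real verifier would discharge the same obligation symbolically, for unbounded thread counts.

```python
# Hypothetical sketch of the "two arbitrary threads" obligation behind
# race-freedom checking: a kernel's writes are race-free if no two
# distinct thread IDs can compute the same written index. Here we check
# exhaustively up to a bound; an actual tool would use an SMT solver.

def write_index(tid, stride=1, base=0):
    # Access pattern under test: thread `tid` writes A[base + tid*stride].
    return base + tid * stride

def writes_disjoint(index_fn, num_threads):
    """Return True if distinct threads never write the same location."""
    seen = {}
    for tid in range(num_threads):
        idx = index_fn(tid)
        if idx in seen:
            return False  # threads seen[idx] and tid would race
        seen[idx] = tid
    return True

print(writes_disjoint(write_index, 1024))       # True: indices are disjoint
print(writes_disjoint(lambda t: t // 2, 1024))  # False: pairs of threads collide
```

Template-based contract generation, as proposed above, amounts to recognising common index expressions like `base + tid*stride` and instantiating a proof of disjointness for them automatically.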
Throughout, we will evaluate our methods using open source and industrial GPU kernels, including kernels provided by our industrial collaborators.
Planned Impact
Full achievement of our proposed research objectives has the potential for wide industrial, economic, academic and societal impact.
Major GPU vendors including NVIDIA, AMD, ARM, Imagination Technologies and Intel support the OpenCL programming model which we target in our proposed research. Each vendor supplies an OpenCL distribution consisting of software development tools geared towards their specific devices. Our proposed approach aims for highly automatic and scalable verification techniques suitable for analysing OpenCL kernels, thus GPU vendor tools could be significantly enhanced with formal analysis components based on our novel research. As well as GPU vendors, our research has potential for high impact on companies specialising in software development tools, e.g. Codeplay Software: these companies could exploit the semantic foundations and verification algorithms we will publish, allowing them to build high value GPU kernel analysers. Because ARM, Imagination Technologies and Codeplay Software are all based in the UK, this industrial impact could lead to significant economic benefit nationally.
Our methods have the potential for impact on software developers who write code for GPUs. In particular, GPUs are widely used in the video games industry. Our advanced analysis techniques allow early identification of GPU kernel defects, which has the potential to speed up video game development, reducing time to market. This is significant economically: video games were highlighted in the last UK budget as a key area for economic growth.
To be successful, our research will require breakthroughs in formal semantics for concurrent programs, and in automatic techniques for contract generation. These are both active research areas nationally and internationally, so these breakthroughs will lead to new methods that can be exploited and extended by a large body of researchers. We elaborate more on this in the Academic Beneficiaries section.
GPU-accelerated software is becoming widely used by society: at one extreme GPUs are embedded in modern mobile phones, at the other, GPUs are being employed to accelerate algorithms in safety critical domains such as medical imaging. Society's increasing dependence on many-core accelerated software means that robustness of such software is essential. The techniques we propose here directly address robustness concerns: our methods will assist in the development of highly reliable GPU-accelerated software, and thus has the potential for significant societal impact.
Organisations
- Imperial College London (Lead Research Organisation)
- Arm Limited (Collaboration)
- Advanced Micro Devices (AMD) (Collaboration)
- NVIDIA (Collaboration)
- Microsoft Research (Collaboration)
- Intel (United States) (Collaboration)
- University of Cambridge (Project Partner)
- University of Utah (Project Partner)
- Microsoft (United States) (Project Partner)
People
- Alastair Donaldson (Principal Investigator)
Publications
- Alglave J (2015) GPU Concurrency: Weak Behaviours and Programming Assumptions, in ACM SIGPLAN Notices
- Amani S (2014) Automatic verification of active device drivers, in ACM SIGOPS Operating Systems Review
- Bardsley E (2014) in NASA Formal Methods
- Bardsley E (2014) in Computer Aided Verification
- Batty M (2016) Overhauling SC atomics in C11 and OpenCL, in ACM SIGPLAN Notices
- Betts A (2015) The Design and Implementation of a Verification Technique for GPU Kernels, in ACM Transactions on Programming Languages and Systems
- Chong N (2013) Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels, in ACM SIGPLAN Notices
- Collingbourne P (2013) in Programming Languages and Systems
- Donaldson A.F. (2014) The GPUVerify method: A tutorial overview, in Electronic Communications of the EASST
- Lidbury C (2015) Many-core compiler fuzzing, in ACM SIGPLAN Notices
- Thomson P (2014) Concurrency testing using schedule bounding: an empirical study, in ACM SIGPLAN Notices
- Wickerson J (2015) Remote-scope promotion: clarified, rectified, and verified
Description | Through this grant we have developed new methods for automatically finding defects in software that has been designed to be accelerated on Graphics Processing Units (GPUs). This is valuable because GPUs are now routinely used to accelerate computation in applications used by society in general. The project has led to advances in the theory underlying parallel program analysis, as well as to practical tools that can help developers of GPU software. |
Exploitation Route | The software we have developed is open source and can be used by industrial practitioners to improve their development processes. We have published papers about our work at high profile venues, making the results available to the research community for future exploitation. We have built strong links with specific companies, ARM and Imagination Technologies, who will be able to exploit our research in their commercial products. |
Sectors | Digital/Communication/Information Technologies (including Software), Education |
URL | http://multicore.doc.ic.ac.uk/tools/GPUVerify/ |
Description | The results from this project have led to practical methods for analysing GPU kernels. This has had impact in the following ways:
- Two UK companies, ARM and Imagination Technologies, are using the results of the project in their commercial tool chains as third-party plug-ins
- The software arising from the project has been used to improve the reliability of computational benchmarks from Rightware, a Finnish company
- The software arising from the project has been used to find defects in published software from Nvidia, AMD, and in a number of open source benchmarks, improving the quality of these suites
- The techniques arising from the project have been incorporated in the material taught in the Software Reliability course at Imperial College London
- Videos about the work on YouTube have received upwards of 2000 hits since April 2013 |
First Year Of Impact | 2012 |
Sector | Digital/Communication/Information Technologies (including Software),Education |
Impact Types | Societal, Economic |
Description | EPSRC Industrial CASE |
Amount | £30,000 (GBP) |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2013 |
End | 06/2016 |
Description | Intel Early Career Award |
Amount | $35,000 (USD) |
Organisation | Intel Corporation |
Sector | Private |
Country | United States |
Start | 09/2013 |
Description | Intel Research Office |
Amount | $271,400 (USD) |
Organisation | Intel Corporation |
Sector | Private |
Country | United States |
Start | 04/2013 |
End | 04/2016 |
Description | AMD |
Organisation | Advanced Micro Devices (AMD) |
Country | United States |
Sector | Private |
PI Contribution | We have worked with AMD on formalising a next generation GPU architecture, in the process identifying bugs in a published design from AMD, allowing them to be fixed long before production. |
Collaborator Contribution | AMD provided details of their design and participated in numerous interviews with our team, as well as aiding in writing a paper about the work. |
Impact | Paper at the OOPSLA 2015 conference. |
Start Year | 2014 |
Description | ARM |
Organisation | Arm Limited |
Country | United Kingdom |
Sector | Private |
PI Contribution | We have contributed methods for analysing GPU kernels, and methods for testing GPU compilers. |
Collaborator Contribution | ARM have funded a PhD studentship (the direct contribution) and provided a great deal of staff time for expertise on our work. |
Impact | The project has led to technology transfer from Imperial College London to ARM. This is evidenced by a Technology Transfer Award from the HiPEAC network of excellence. |
Start Year | 2013 |
Description | Intel |
Organisation | Intel Corporation |
Country | United States |
Sector | Private |
PI Contribution | We have been investigating formal verification techniques for high assurance device drivers. |
Collaborator Contribution | As well as funding, Intel have provided expertise and access to software. |
Impact | The partnership has led to submitted papers, but not yet to any published outputs. |
Start Year | 2013 |
Description | NVIDIA |
Organisation | NVIDIA |
Country | Global |
Sector | Private |
PI Contribution | We have contributed our compiler testing technology, which NVIDIA have successfully used to identify bugs in OpenCL compilers. |
Collaborator Contribution | The partner has confirmed a number of compiler bugs that we reported, and is now deploying a version of our testing software internally. |
Impact | A paper at PLDI 2015. |
Start Year | 2015 |
Description | Qadeer |
Organisation | Microsoft Research |
Country | Global |
Sector | Private |
PI Contribution | We have been working with Dr Shaz Qadeer at Microsoft Research, Redmond, on static analysis method for GPU kernels. We have contributed open source software, benchmarks and expertise. |
Collaborator Contribution | Microsoft have supported this research through travel funds for members of my research group, and through a large portion of Dr Shaz Qadeer's time spent advising us. They have also contributed open source software. |
Impact | The collaboration has led to many high quality research papers and to the GPUVerify software tool. The work has also been used in teaching. |
Start Year | 2011 |
Title | GPUVerify |
Description | A static verification tool for GPU kernels |
Type Of Technology | Software |
Year Produced | 2013 |
Open Source License? | Yes |
Impact | The tool has been used by ARM and Imagination Technologies, and has received interest from other partners. |
URL | http://multicore.doc.ic.ac.uk/tools/GPUVerify/ |
Description | YouTube |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | I recorded and broadcast three research videos on YouTube. They have collectively been viewed more than 2000 times. I have had many positive comments about the videos. |
Year(s) Of Engagement Activity | 2013 |
URL | https://www.youtube.com/watch?v=l8ysBPV8OvA |