Scalable Automatic Verification of GPU Kernels

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Until relatively recently the processing speed of computer systems increased at an exponential rate. Each year it was possible to use computers to solve computational problems that the previous year were out of reach. Around 2005, physical limits stopped this trend: it became infeasible to increase the clock rate of a processor without consuming an exorbitant amount of energy. To counter this, processor manufacturers have since aimed to provide increased performance by designing "multicore" processors, which consist of two or more processing units on a chip. Many computational tasks are parallelisable, in which case they can be distributed across the cores of a multicore processor.

Recently there has been a trend towards "many-core" processors, with hundreds or thousands of processing elements. For highly parallel applications, many-core processors can offer a massive speedup. The most readily available many-core processors are graphics processing units (GPUs) from companies such as NVIDIA, AMD, Intel, ARM and Imagination Technologies. GPUs were originally designed to accelerate graphical computations, but have become sufficiently general purpose for accelerating computational tasks in a variety of domains, including financial analysis, medical imaging, media processing and simulation.

GPUs are programmed by writing a "kernel" function, a program which will be executed by hundreds or thousands of threads running in parallel on the GPU. Parallel threads can communicate using shared memory, and synchronise using barrier statements. Writing correct GPU kernels is more challenging than writing sequential software due to two main problems: data races and barrier divergence. A data race occurs when two GPU threads access a shared memory location, at least one of the threads modifies this location, and no barrier statement separates the accesses. Data races almost always signify bugs in the kernel and lead to nondeterministic behaviour. Barrier divergence occurs when distinct threads reach different barrier statements in the kernel, and leads to deadlock.

Because GPUs are becoming widely used for general purpose software development, there is an urgent need for analysis techniques to help GPU programmers write correct code. Techniques to analyse GPU kernels with respect to data races and barrier divergence would significantly speed up the GPU software development process, leading to shorter time-to-market for GPU-accelerated applications.

In this project we plan to design formal techniques for verifying race- and divergence-freedom for GPU kernels. To be adopted and trusted by industrial practitioners our techniques must be highly automatic, scalable, and based on rigorous semantic foundations.

We plan to achieve these aims by developing a rigorous GPU memory model specification, and a formal semantics for GPU kernel execution that makes no assumptions about the structure of the kernel to be analysed. Based on these semantic foundations, we will design a verification technique that aims to prove absence of data races and barrier divergence by generating a "contract" for the kernel: a machine-checkable proof that kernel execution cannot lead to these defects. Contract-based verification is modular - each kernel procedure is analysed separately - and thus scalable. We will design a template-based contract generation method that captures domain-specific knowledge about common GPU programming idioms. This will allow efficient verification of GPU kernels that use typical data access patterns. For more intricate kernels that implement highly optimised algorithms, we will design a method based on Craig interpolation. This method will construct a proof of race- and divergence-freedom up to a bounded execution depth, and then attempt to extract a general contract from this proof.

Throughout, we will evaluate our methods using open source and industrial GPU kernels, including kernels provided by our industrial collaborators.

Planned Impact

Full achievement of our proposed research objectives has the potential for wide industrial, economic, academic and societal impact.

Major GPU vendors including NVIDIA, AMD, ARM, Imagination Technologies and Intel support the OpenCL programming model which we target in our proposed research. Each vendor supplies an OpenCL distribution consisting of software development tools geared towards their specific devices. Our proposed approach aims for highly automatic and scalable verification techniques suitable for analysing OpenCL kernels, thus GPU vendor tools could be significantly enhanced with formal analysis components based on our novel research. As well as GPU vendors, our research has potential for high impact on companies specialising in software development tools, e.g. Codeplay Software: these companies could exploit the semantic foundations and verification algorithms we will publish, allowing them to build high value GPU kernel analysers. Because ARM, Imagination Technologies and Codeplay Software are all based in the UK, this industrial impact could lead to significant economic benefit nationally.

Our methods have the potential for impact on software developers who write code for GPUs. In particular, GPUs are widely used in the video games industry. Our advanced analysis techniques allow early identification of GPU kernel defects, which has the potential to speed up video game development, reducing time to market. This is significant economically: video games were highlighted in the last UK budget as a key area for economic growth.

To be successful, our research will require breakthroughs in formal semantics for concurrent programs, and in automatic techniques for contract generation. These are both active research areas nationally and internationally, so these breakthroughs will lead to new methods that can be exploited and extended by a large body of researchers. We elaborate more on this in the Academic Beneficiaries section.

GPU-accelerated software is becoming widely used by society: at one extreme GPUs are embedded in modern mobile phones, at the other, GPUs are being employed to accelerate algorithms in safety critical domains such as medical imaging. Society's increasing dependence on many-core accelerated software means that robustness of such software is essential. The techniques we propose here directly address robustness concerns: our methods will assist in the development of highly reliable GPU-accelerated software, and thus has the potential for significant societal impact.

Publications

10 25 50
publication icon
Collingbourne P (2013) Programming Languages and Systems

publication icon
Chong N (2013) Barrier invariants

publication icon
Bardsley E (2014) Computer Aided Verification

publication icon
Bardsley E (2014) NASA Formal Methods

publication icon
Thomson P (2014) Concurrency testing using schedule bounding an empirical study in ACM SIGPLAN Notices

publication icon
Amani S (2014) Automatic verification of active device drivers in ACM SIGOPS Operating Systems Review

publication icon
Lidbury C (2015) Many-core compiler fuzzing

publication icon
Alglave J (2015) GPU Concurrency Weak Behaviours and Programming Assumptions in ACM SIGPLAN Notices

publication icon
Alglave J (2015) GPU Concurrency

publication icon
Lidbury C (2015) Many-core compiler fuzzing in ACM SIGPLAN Notices

publication icon
Betts A (2015) The Design and Implementation of a Verification Technique for GPU Kernels in ACM Transactions on Programming Languages and Systems

publication icon
Batty M (2016) Overhauling SC atomics in C11 and OpenCL in ACM SIGPLAN Notices

 
Description Through this grant we have developed new methods for automatically finiding defects in software that has been designed to be accelerated on Graphics Processing Units (GPUs). This is valuable because GPUs are now routinely used to accelerate computation in applications used by society in general. The project has led to advances in the theory underlying parallel program analysis, as well as to practical tools that can help developers of GPU software.
Exploitation Route The software we have developed is open source and can be used by industrial practitioners to improve their development processes.

We have published papers about our work at high profile venues, making the results available to the research community for future exploitation.

We have build strong links with specific companies: ARM and Imagination Technologies, who will be able to exploit our research in their commercial products.
Sectors Digital/Communication/Information Technologies (including Software),Education

URL http://multicore.doc.ic.ac.uk/tools/GPUVerify/
 
Description The results from this project have led to practical methods for analysing GPU kernels. This has had impact in the following ways: - Two UK companies, ARM and Imagination Technologies, are using the results of the project in their commercial tool chains as 3rd party plug-ins - The software arising from the project has been used to improve the reliability of computational benchmarks from Rightware, a Finnish company - The software arising from the project has been used to find defects in published software from Nvidia, AMD, and in a number of open source benchmarks, improving the quality of these suites - The techniques arising from the project have been incorporated in the material taught in the Software Reliability course at Imperial College London - Videos about the work on YouTube have received upwards of 2000 hits since April 2013
First Year Of Impact 2012
Sector Digital/Communication/Information Technologies (including Software),Education
Impact Types Societal,Economic

 
Description EPSRC Industiral CASE
Amount £30,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2013 
End 06/2016
 
Description Intel Early Career Award
Amount $35,000 (USD)
Organisation Intel Corporation 
Sector Private
Country United States
Start 10/2013 
 
Description Intel Research Office
Amount $271,400 (USD)
Organisation Intel Corporation 
Sector Private
Country United States
Start 05/2013 
End 04/2016
 
Description AMD 
Organisation Advanced Micro Devices (AMD)
Country United States 
Sector Private 
PI Contribution We have worked with AMD on formalising a next generation GPU architecture, in the process identifying bugs in a published design from AMD, allowing them to be fixed long before production.
Collaborator Contribution AMD provided details of their design and participated in numerous interviews with our team, as well as aiding in writing a paper about the work.
Impact Paper at OOPLSA 2015 conference.
Start Year 2014
 
Description ARM 
Organisation Arm Limited
Country United Kingdom 
Sector Private 
PI Contribution We have contributed methods for analysing GPU kernels, and methods for testing GPU compilers.
Collaborator Contribution ARM have funded a PhD studentship (the direct contribution) and provided a great deal of staff time for expertise on our work.
Impact The project has led to technology transfer from Imperial College London to ARM. This is evidenced by a Technology Transfer Award from the HiPEAC network of excellence.
Start Year 2013
 
Description Intel 
Organisation Intel Corporation
Country United States 
Sector Private 
PI Contribution We have been investigating formal verification techniques for high assurance device drivers.
Collaborator Contribution As well as funding, Intel have provided expertise and access to software.
Impact The partnership has led to submitted papers, but not yet to any published outputs.
Start Year 2013
 
Description NVIDIA 
Organisation NVIDIA
Country Global 
Sector Private 
PI Contribution NVIDIA have successfully used our compiler testing technology to identify bugs in OpenCL compilers.
Collaborator Contribution The partner have confirmed a number of compiler bugs that we reported, and are now deploying a version of our testing software internally.
Impact A paper at PLDI 2015.
Start Year 2015
 
Description Qadeer 
Organisation Microsoft Research
Country Global 
Sector Private 
PI Contribution We have been working with Dr Shaz Qadeer at Microsoft Research, Redmond, on static analysis method for GPU kernels. We have contributed open source software, benchmarks and expertise.
Collaborator Contribution Microsoft have supported this resarch through travel funds for members of my research group, and by a large portion of Dr Shaz Qadeer's time to advise us. They have contributed open source software.
Impact The collaboration has led to many high quality research papers and to the GPUVerify software tool. The work has also been used inteaching.
Start Year 2011
 
Title GPUVerify 
Description A static verification tool for GPU kernels 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact The tool has been used by ARM, Imagination Technologies, and has received interest from other partners. 
URL http://multicore.doc.ic.ac.uk/tools/GPUVerify/
 
Description YouTube 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact I recorded and broadcast three research videos on YouTube. They have collectively been viewed more than 2000 times.

I have had many positive comments about the video.
Year(s) Of Engagement Activity 2013
URL https://www.youtube.com/watch?v=l8ysBPV8OvA