Scalable Automatic Verification of GPU Kernels

Lead Research Organisation: Imperial College London

Department Name: Computing

Abstract

Until relatively recently the processing speed of computer systems increased at an exponential rate. Each year it was possible to use computers to solve computational problems that the previous year were out of reach. Around 2005, physical limits stopped this trend: it became infeasible to increase the clock rate of a processor without consuming an exorbitant amount of energy. To counter this, processor manufacturers have since aimed to provide increased performance by designing "multicore" processors, which consist of two or more processing units on a chip. Many computational tasks are parallelisable, in which case they can be distributed across the cores of a multicore processor.

Recently there has been a trend towards "many-core" processors, with hundreds or thousands of processing elements. For highly parallel applications, many-core processors can offer a massive speedup. The most readily available many-core processors are graphics processing units (GPUs) from companies such as NVIDIA, AMD, Intel, ARM and Imagination Technologies. GPUs were originally designed to accelerate graphical computations, but have become sufficiently general purpose for accelerating computational tasks in a variety of domains, including financial analysis, medical imaging, media processing and simulation.

GPUs are programmed by writing a "kernel" function, a program which will be executed by hundreds or thousands of threads running in parallel on the GPU. Parallel threads can communicate using shared memory, and synchronise using barrier statements. Writing correct GPU kernels is more challenging than writing sequential software due to two main problems: data races and barrier divergence. A data race occurs when two GPU threads access a shared memory location, at least one of the threads modifies this location, and no barrier statement separates the accesses. Data races almost always signify bugs in the kernel and lead to nondeterministic behaviour. Barrier divergence occurs when distinct threads reach different barrier statements in the kernel, and leads to deadlock.

Because GPUs are becoming widely used for general purpose software development, there is an urgent need for analysis techniques to help GPU programmers write correct code. Techniques to analyse GPU kernels with respect to data races and barrier divergence would significantly speed up the GPU software development process, leading to shorter time-to-market for GPU-accelerated applications.

In this project we plan to design formal techniques for verifying race- and divergence-freedom for GPU kernels. To be adopted and trusted by industrial practitioners our techniques must be highly automatic, scalable, and based on rigorous semantic foundations.

We plan to achieve these aims by developing a rigorous GPU memory model specification, and a formal semantics for GPU kernel execution that makes no assumptions about the structure of the kernel to be analysed. Based on these semantic foundations, we will design a verification technique that aims to prove absence of data races and barrier divergence by generating a "contract" for the kernel: a machine-checkable proof that kernel execution cannot lead to these defects. Contract-based verification is modular - each kernel procedure is analysed separately - and thus scalable. We will design a template-based contract generation method that captures domain-specific knowledge about common GPU programming idioms. This will allow efficient verification of GPU kernels that use typical data access patterns. For more intricate kernels that implement highly optimised algorithms, we will design a method based on Craig interpolation. This method will construct a proof of race- and divergence-freedom up to a bounded execution depth, and then attempt to extract a general contract from this proof.

Throughout, we will evaluate our methods using open source and industrial GPU kernels, including kernels provided by our industrial collaborators.

Planned Impact

Full achievement of our proposed research objectives has the potential for wide industrial, economic, academic and societal impact.

Major GPU vendors including NVIDIA, AMD, ARM, Imagination Technologies and Intel support the OpenCL programming model which we target in our proposed research. Each vendor supplies an OpenCL distribution consisting of software development tools geared towards their specific devices. Our proposed approach aims for highly automatic and scalable verification techniques suitable for analysing OpenCL kernels, thus GPU vendor tools could be significantly enhanced with formal analysis components based on our novel research. As well as GPU vendors, our research has potential for high impact on companies specialising in software development tools, e.g. Codeplay Software: these companies could exploit the semantic foundations and verification algorithms we will publish, allowing them to build high value GPU kernel analysers. Because ARM, Imagination Technologies and Codeplay Software are all based in the UK, this industrial impact could lead to significant economic benefit nationally.

Our methods have the potential for impact on software developers who write code for GPUs. In particular, GPUs are widely used in the video games industry. Our advanced analysis techniques allow early identification of GPU kernel defects, which has the potential to speed up video game development, reducing time to market. This is significant economically: video games were highlighted in the last UK budget as a key area for economic growth.

To be successful, our research will require breakthroughs in formal semantics for concurrent programs, and in automatic techniques for contract generation. These are both active research areas nationally and internationally, so these breakthroughs will lead to new methods that can be exploited and extended by a large body of researchers. We elaborate more on this in the Academic Beneficiaries section.

GPU-accelerated software is becoming widely used by society: at one extreme GPUs are embedded in modern mobile phones, at the other, GPUs are being employed to accelerate algorithms in safety critical domains such as medical imaging. Society's increasing dependence on many-core accelerated software means that robustness of such software is essential. The techniques we propose here directly address robustness concerns: our methods will assist in the development of highly reliable GPU-accelerated software, and thus has the potential for significant societal impact.

Funded Value:

£100,057

Funded Period:

Jun 13 - Jun 14

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/K011499/1

Principal Investigator:

Alastair Donaldson

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (60%)

Software Engineering (40%)

Organisations

People	ORCID iD
Alastair Donaldson (Principal Investigator)

Publications

Author Name Title Publication Date Published

10 25 50

Collingbourne P (2013) Programming Languages and Systems

Chong N (2013) Barrier invariants

Chong N (2013) Barrier invariants a shared state abstraction for the analysis of data-dependent GPU kernels in ACM SIGPLAN Notices

Bardsley E (2014) Computer Aided Verification

Bardsley E (2014) NASA Formal Methods

Chong N (2014) A sound and complete abstraction for reasoning about parallel prefix sums

Thomson P (2014) Concurrency testing using schedule bounding an empirical study in ACM SIGPLAN Notices

Thomson P (2014) Concurrency testing using schedule bounding

Amani S (2014) Automatic verification of active device drivers in ACM SIGOPS Operating Systems Review

Lidbury C (2015) Many-core compiler fuzzing

Wickerson J (2015) Remote-scope promotion: clarified, rectified, and verified

Alglave J (2015) GPU Concurrency Weak Behaviours and Programming Assumptions in ACM SIGPLAN Notices

Alglave J (2015) GPU Concurrency

Lidbury C (2015) Many-core compiler fuzzing in ACM SIGPLAN Notices

Betts A (2015) The Design and Implementation of a Verification Technique for GPU Kernels in ACM Transactions on Programming Languages and Systems

Batty M (2016) Overhauling SC atomics in C11 and OpenCL

Batty M (2016) Overhauling SC atomics in C11 and OpenCL in ACM SIGPLAN Notices

Key Findings
Impact Summary
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	Through this grant we have developed new methods for automatically finiding defects in software that has been designed to be accelerated on Graphics Processing Units (GPUs). This is valuable because GPUs are now routinely used to accelerate computation in applications used by society in general. The project has led to advances in the theory underlying parallel program analysis, as well as to practical tools that can help developers of GPU software.
Exploitation Route	The software we have developed is open source and can be used by industrial practitioners to improve their development processes. We have published papers about our work at high profile venues, making the results available to the research community for future exploitation. We have build strong links with specific companies: ARM and Imagination Technologies, who will be able to exploit our research in their commercial products.
Sectors	Digital/Communication/Information Technologies (including Software),Education
URL	http://multicore.doc.ic.ac.uk/tools/GPUVerify/


Description	The results from this project have led to practical methods for analysing GPU kernels. This has had impact in the following ways: - Two UK companies, ARM and Imagination Technologies, are using the results of the project in their commercial tool chains as 3rd party plug-ins - The software arising from the project has been used to improve the reliability of computational benchmarks from Rightware, a Finnish company - The software arising from the project has been used to find defects in published software from Nvidia, AMD, and in a number of open source benchmarks, improving the quality of these suites - The techniques arising from the project have been incorporated in the material taught in the Software Reliability course at Imperial College London - Videos about the work on YouTube have received upwards of 2000 hits since April 2013
First Year Of Impact	2012
Sector	Digital/Communication/Information Technologies (including Software),Education
Impact Types	Societal,Economic


Description	EPSRC Industiral CASE
Amount	£30,000 (GBP)
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2013
End	06/2016


Description	Intel Early Career Award
Amount	$35,000 (USD)
Organisation	Intel Corporation
Sector	Private
Country	United States
Start	10/2013


Description	Intel Research Office
Amount	$271,400 (USD)
Organisation	Intel Corporation
Sector	Private
Country	United States
Start	05/2013
End	04/2016


Description	AMD
Organisation	Advanced Micro Devices (AMD)
Country	United States
Sector	Private
PI Contribution	We have worked with AMD on formalising a next generation GPU architecture, in the process identifying bugs in a published design from AMD, allowing them to be fixed long before production.
Collaborator Contribution	AMD provided details of their design and participated in numerous interviews with our team, as well as aiding in writing a paper about the work.
Impact	Paper at OOPLSA 2015 conference.
Start Year	2014


Description	ARM
Organisation	Arm Limited
Country	United Kingdom
Sector	Private
PI Contribution	We have contributed methods for analysing GPU kernels, and methods for testing GPU compilers.
Collaborator Contribution	ARM have funded a PhD studentship (the direct contribution) and provided a great deal of staff time for expertise on our work.
Impact	The project has led to technology transfer from Imperial College London to ARM. This is evidenced by a Technology Transfer Award from the HiPEAC network of excellence.
Start Year	2013


Description	Intel
Organisation	Intel Corporation
Country	United States
Sector	Private
PI Contribution	We have been investigating formal verification techniques for high assurance device drivers.
Collaborator Contribution	As well as funding, Intel have provided expertise and access to software.
Impact	The partnership has led to submitted papers, but not yet to any published outputs.
Start Year	2013


Description	NVIDIA
Organisation	NVIDIA
Country	Global
Sector	Private
PI Contribution	NVIDIA have successfully used our compiler testing technology to identify bugs in OpenCL compilers.
Collaborator Contribution	The partner have confirmed a number of compiler bugs that we reported, and are now deploying a version of our testing software internally.
Impact	A paper at PLDI 2015.
Start Year	2015


Description	Qadeer
Organisation	Microsoft Research
Country	Global
Sector	Private
PI Contribution	We have been working with Dr Shaz Qadeer at Microsoft Research, Redmond, on static analysis method for GPU kernels. We have contributed open source software, benchmarks and expertise.
Collaborator Contribution	Microsoft have supported this resarch through travel funds for members of my research group, and by a large portion of Dr Shaz Qadeer's time to advise us. They have contributed open source software.
Impact	The collaboration has led to many high quality research papers and to the GPUVerify software tool. The work has also been used inteaching.
Start Year	2011


Title	GPUVerify
Description	A static verification tool for GPU kernels
Type Of Technology	Software
Year Produced	2013
Open Source License?	Yes
Impact	The tool has been used by ARM, Imagination Technologies, and has received interest from other partners.
URL	http://multicore.doc.ic.ac.uk/tools/GPUVerify/


Description	YouTube
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	I recorded and broadcast three research videos on YouTube. They have collectively been viewed more than 2000 times. I have had many positive comments about the video.
Year(s) Of Engagement Activity	2013
URL	https://www.youtube.com/watch?v=l8ysBPV8OvA

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications