Reliable Many-Core Programming

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

The computational demands of modern computer applications make the
pursuit of high performance more critical than ever, and mobile,
battery-powered devices, as well as concerns related to climate
change, require high performance to co-exist with energy-efficiency.
Due to physical limits, the traditional means for improving hardware
performance by increasing processor frequency now carries an
unacceptably high energy cost. Advances in processor fabrication
technology instead allow the construction of many-core processors,
where hundreds or thousands of processing elements are placed on a
single chip, promising high performance and energy-efficiency through
sheer volume of processing elements.

Many-core devices are present in practically all consumer devices,
including smartphones and tablets. As a result, the general public in
developed countries interact with many-core software daily. Many-core
technology is also used to accelerate safety-critical software in
domains such as medical imaging and autonomous vehicle navigation.

It is thus important that many-core software should be reliable. This
requires reliable software from programmers, but also a reliable
"stack" to support this software, including compilers that allow
software to execute on many-core devices, and the many-core devices
themselves. Recent work on formal verification and testing by myself
and other researchers has identified serious technical problems
spanning the many-core stack. These problems undermine confidence in
applications of many-core technology: defective many-core software
could risk fatal accidents in critical domains, and impact negatively
on users in other important application areas.

My long-term vision is that the reliability of many-core programming
can be transformed through breakthroughs in programming language
specification, formal verification and test case generation, enabling
automated tools to assist programmers and platform vendors in
constructing reliable many-core applications and language
implementations. The aim of this five-year Fellowship is to undertake
foundational research to investigate a number of open problems whose
solution is key to enabling this long-term vision.

First, I seek to investigate whether it is possible to precisely
express the intricacies of many-core programming language using formal
mathematics, providing a rigorous basis on which software and language
implementations can be constructed.

Second, I aim to tackle several open problems that stand in the way of
effective formal verification of many-core software, which would allow
developers to obtain strong guarantees that such software will operate
as required.

Third, I will investigate raising this level of rigour beyond
many-core languages. A growing trend is for applications to be written
in relatively simple, high-level representations, and then
automatically translated into high-performance many-core code. This
translation process must preserve the meaning of programs; I will
investigate methods for formally certifying that it does.

Fourth, I will formulate new methods for testing many-core language
implementations, exploiting the rigorous language definitions brought
by my approach to enable high test coverage of subtle language
features.

Collectively, progress on these problems promises to enable a
*high-assurance* many-core stack. I will demonstrate one instance of
such a stack for the industry-standard OpenCL language and the PENCIL
high-level language, showing that high-level PENCIL programs can be
reliably compiled into rigorously-defined OpenCL, integrated with
verified library components, and deployed on thoroughly tested
implementations from many-core vendors.

Partnership with four leading many-core technology vendors, AMD, ARM,
Imagination Technologies and NVIDIA, provides excellent opportunities
for the advances the Fellowship makes to have broad industrial impact.

Planned Impact

The project has a large set of potential beneficiaries. The ultimate
beneficiaries of my long-term vision for reliable many-core
programming are end-users of software that employs many-core
acceleration. Because many-core hardware now ships with practically
all desktop, laptop and mobile devices, society is starting to
interact with many-core software on a daily basis. By leading to
improvements in the robustness of this software, and the platforms on
which it operates, my proposed project offers significant societal
benefit. Many-core technology is increasingly employed in
safety-critical contexts, for example it has found application in
medical imaging, pedestrian detection, and autonomous navigation for
self-driving cars. Reliability improvements in these areas offer to
make society safer. The rise of many-core technology also presents
new attack vectors that pose cyber-security and cyber-terrorism risks;
the advanced reasoning techniques that my research will bring will
allow many-core software to be rigorously engineered to defend against
these threats.

The fundamental advances made during the project will have long-term
academic impact, yielding results that researchers in several fields
can exploit. See the "Academic Beneficiaries" form for more details.

My Fellowship has very high potential for industrial impact.
Many-core platform vendors, including my project partners AMD, ARM,
Imagination Technologies and NVIDIA and several other companies, will
be able to exploit the Fellowship research to improve many-core
language implementations, increasing their market value. There is
potential for commercial exploitation of the research outcomes by tool
providers, and the wide range of software developers who apply
many-core acceleration in diverse domains will be able to build more
reliable software using the advanced analyses the Fellowship will
bring. All these stakeholders will benefit from the transformation in
the rigour of many-core programming language design offered by my
approach. Specifically, the 80+ industrial contributing members of
the Khronos Group will be beneficiaries of improvements to OpenCL and
related APIs that arise from the project. The large and growing
market associated with many-core technology means that uptake by
industry has the potential to lead to corresponding economic impact.

Close collaboration with industry during the Fellowship will give rise
to industrial impact on the Fellowship partners early during the
five-year project, and my strong impact plans will help to broaden
this impact to other industrial beneficiaries during the project
lifetime, leading to widespread industrial adoption over a 5-10-year
period. This widespread adoption has the potential to transform
many-core programming, impacting on industrial practice in the long
term and benefiting society for decades to come.

Publications

10 25 50
 
Description The main finding is that device drivers for many-core processors can be effectively tested through the use of transformations that preserve the meanings of programs. This led to the successful research behind the GraphicsFuzz spin-out company that was acquired by Google in 2018.
Exploitation Route The open source GraphicsFuzz project (http://github.com/google/graphicsfuzz) can be used by the GPU industry.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description GraphicsFuzz, a tool that automatically finds bugs in compilers for graphics processing units (GPUs), is based on fundamental advances in metamorphic testing pioneered by Donaldson's Multicore Programming research group at Imperial College London, with funding from the Reliable Many-Core Programming Fellowship. GraphicsFuzz has found serious, security-critical defects in compilers from all major GPU vendors, leading to the spin-out of GraphicsFuzz Ltd., to commercialise the research. GraphicsFuzz Ltd. was acquired by Google in 2018 for an undisclosed sum, establishing a new London-based team at Google, led by Donaldson, to focus on Android graphics driver quality. Google have open-sourced the GraphicsFuzz tool, and it is now being used routinely to find serious defects in graphics drivers that affect the Android operating system (estimated 2.5 billion active devices worldwide) and Chrome web browser (used by an estimated 66% of all internet users). The global GPU market is predicted to grow to grow from $20 bn in 2019 to $200 bn in 2027, so the industrial uptake of this tools - based on fundamental research from Imperial College London - is having major impact on industry, making GPU software safer and more secure for end users, and reducing development costs of GPU software and drivers. Impact of the GraphicsFuzz tool: The original GLFuzz tool was highly successful in applying metamorphic testing to finding bugs in GLSL compilers from all seven major GPU providers. Donaldson disseminated this to a wide audience by writing a series of blog posts describing these findings (AMD post, Apple post, ARM post, Imagination Technologies post, Intel post, NVIDIA post, Qualcomm post). These bugs included three serious security vulnerabilities for which the Imperial team received public recognition - an Apple iOS information leak, an NVIDIA kernel mode vulnerability, and data-capture exploit caused by a defect in an ARM graphics driver. Donaldson and team promoted GraphicsFuzz on social media, reaching the front page of HackerNews and being featured by sites such as Phoronix. Acquisition of GraphicsFuzz Ltd. by Google, and open sourcing. Encouraged by this media interest, Donaldson and his post-docs Hugues Evrard and Paul Thomson secured funding from the DCMS/Innovate UK ICURe programme to investigate commercialisation opportunities for the work, leading to them founding GraphicsFuzz Ltd., incorporated in December 2017 and spun-out from Imperial in February 2018. The team promoted the technology via an innovative web-app that would run a set of GraphicsFuzz tests on a user's device, allowing them to tweet a summary of the test results. This raised the profile of GraphicsFuzz quickly, leading to the acquisition of the company by Google in August 2018 for an undisclosed sum. Google open-sourced the GraphicsFuzz tool in September 2018. Ongoing impact of GraphicsFuzz on graphics driver quality: Donaldson, Evrard and Thomson joined Google UK, forming a team specialising in methods to improve the quality of Android graphics device drivers, deployed on billions of devices worldwide. GraphicsFuzz is being used to find defects in Android graphics drivers from ARM, Qualcomm, Imagination Technologies and NVIDIA, who collectively produce GPUs for more than 79% of the smartphone and tablet GPU market. Tests that expose these bugs are being integrated with the Android Compatibility Test Suite, meaning that devices can only ship future versions of Android if these bugs are fixed in their drivers. Impact on Chrome: The WebGL technology brings graphics rendering capabilities to web applications but exposes widely-used browsers such as Chrome - used by most of the world's internet users - to graphics-related security vulnerabilities. GraphicsFuzz has been integrated with the ClusterFuzz project for continuous fuzz testing of Chrome, and since January 2019 has found more than a dozen security vulnerabilities in Chrome's WebGL implementation that had been missed by other testing methods. These vulnerabilities have now been fixed. Impact on open source graphics drivers: Google has used GraphicsFuzz to rigorously test AMD's open-source driver for Vulkan, finding a large number of driver bugs, which have been now fixed, including bugs that turned out to affect the widely-used LLVM compiler framework. GraphicsFuzz has also been used to find defects in the open-source Mesa drivers for Intel and NVIDIA GPUs. Tools for the Vulkan ecosystem: GraphicsFuzz has found more than 100 bugs collectively in a suite of tools that underpin the Vulkan graphics programming model: the glslang shader translator, the spirv-opt and spirv-val optimizer and validator that ship as part of the SPIRV-Tools framework, and the spirv-cross cross-compiler. Integration with Vulkan CTS: Every Vulkan driver must pass the Vulkan Conformance Test Suite (CTS), which now features a new graphicsfuzz test group, comprising tests generated by GraphicsFuzz that exposed bugs in previously-conformant Vulkan drivers. Adding these tests to the Vulkan CTS requires fixes to any defective drivers in order for them to remain conformant. The Vulkan CTS now features more than 300 GraphicsFuzz tests.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal,Economic

 
Description Chrome University Research Program
Amount $172,000 (USD)
Organisation Google 
Sector Private
Country United States
Start 01/2018 
End 12/2018
 
Description EPSRC Progamme Grant
Amount £6,100,000 (GBP)
Funding ID EP/R006865/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2018 
End 12/2023
 
Description EPSRC Standard Mode
Amount £672,083 (GBP)
Funding ID EP/R011605/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2018 
End 01/2021
 
Description ICURe Program, Phase 1
Amount £62,165 (GBP)
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 01/2017 
End 10/2017
 
Description UK Research Institute in Verified Trustworthy Software Systems
Amount £102,847 (GBP)
Organisation National Cyber Security Centre 
Sector Public
Country United Kingdom
Start 08/2017 
End 03/2018
 
Title Framework for compiler testing 
Description A framework for applying metamorphic testing to graphics compilers. 
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact I have founded a spinout company, GraphicsFuzz, based on this innovation. 
URL http://www.graphicsfuzz.com/
 
Description Google 
Organisation Google
Country United States 
Sector Private 
PI Contribution We are designing methods for automated testing of the graphics rendering components of web browsers, including Chrome.
Collaborator Contribution Google are funding the work, and are providing us with feedback on the effectiveness of our techniques.
Impact The work has led to numerous Chrome bug reports, e.g.: https://bugs.chromium.org/p/chromium/issues/detail?id=806201
Start Year 2018
 
Description RTWH Aachen 
Organisation RWTH Aachen University
Country Germany 
Sector Academic/University 
PI Contribution We have had a fruitful collaboration with RTWH Aachen University in relation to floating point support for sybmbolic execution.
Collaborator Contribution Our partners implemented a tool, and we were able to write a paper presenting a detailed comparison of their tool with a competing tool developed at Imperial.
Impact We published a joint paper (https://doi.org/10.1109/ASE.2017.8115670), which won Best Experience Report award at the ASE 2017 conference.
Start Year 2016
 
Title GLFuzz: Automated testing for graphics drivers 
Description An automated method for testing graphics shader compilers. 
IP Reference  
Protection Protection not required
Year Protection Granted 2017
Licensed Yes
Impact None yet.
 
Title GraphicsFuzz 
Description An automated testing tool for graphics drivers. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The software has led to the formation of GraphicsFuzz, a spinout company, which was subsequently bought by Google. 
URL http://github.com/google/graphicsfuzz
 
Company Name GraphicsFuzz Ltd. 
Description GraphicsFuzz finds graphics driver bugs and helps to quickly fix them. 
Year Established 2017 
Impact The company's technology has found bugs in numerous GPU drivers, which has led to conversations about licensing of the technology. GraphicsFuzz Ltd. was acquired by Google LLC in 2018. The directors of GraphicsFuzz now work as Google UK employees.
Website http://www.graphicsfuzz.com/
 
Description A series of widely-read blog posts 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact A series of technical blog posts designed to promote the activities of my fellowship related to graphics driver fuzzing to a broad industrial audience.
Year(s) Of Engagement Activity 2016,2017,2018
URL https://medium.com/@afd_icl/crashes-hangs-and-crazy-images-by-adding-zero-689d15ce922b