SANDeRS: Smart, Adaptive Compilation for Dark Silicon

Lead Research Organisation: Lancaster University
Department Name: Computing & Communications

Abstract

We live in an era of multi-cores: computing processors are no longer marketed by their clock speeds; they are marketed by the number of cores. The fundamental limits of energy and power density of processors will soon push us further into an age of dark silicon, where only a small portion of the chip can be powered at any time. In such a setting, putting more of the same processing cores on a chip (i.e. homogeneity) gives no advantage. This has forced computer architects to introduce heterogeneous many-core systems built around distinct processors -- which have different energy and performance characteristics, each specialised for a certain class of applications. Computer architects now hope that software will find ways to unlock the potential of heterogeneous many-cores. Software developers, however, are struggling to cope with this dramatic increase in complexity, and the current compiler tools, whose role is to enable software to make effective use of the underlying hardware, are simply inadequate to the task.

It is already a daunting task to build optimising compilers for homogeneous multi-cores consisting of identical cores, even when targeting performance alone (i.e. to make programs faster). It typically takes several generations of a compiler to start to exploit the processor's potential effectively, by which time a new processor appears and the process starts again. It will be a fundamentally more difficult task to design efficient compiler heuristics for optimising energy (i.e. to reduce energy consumption) and performance on heterogeneous many-cores, especially given the subtle interactions of different cores and inter-connections. Even if this is achieved, the task of compiler design will likely have to be started again when moving to a newly released processor. This never-ending game of catch-up inevitably delays time to market, meaning that we rarely fully exploit the hardware in its lifetime. If no solution is found, we will be faced with software stagnation and will be unable to offer scalable computing performance -- a driving force that has dramatically changed our society over the past 50 years.

What is needed is an approach that evolves and adapts to future hardware architectural change and delivers scalable performance over hardware generations. This project offers precisely that. It will achieve this by bringing together two distinct areas of computer science, parallel compiler design and machine learning, to develop a new paradigm for energy and performance optimisation. Our key insight is that the best optimisation strategies can be learned from similar software/hardware settings, and that the learnt knowledge can be constantly refreshed without human involvement. This project will deliver such a smart, adaptive compilation system. We will use machine learning to acquire knowledge of workloads, applications and the underlying hardware, testing new compilation strategies, learning how each individual program should be optimised for each specific computing environment, and constantly improving the optimisation heuristics over time.

As knowledge of the application environment grows, our system will make programs faster and more energy efficient; for example, software will respond more quickly and batteries will last longer on mobile phones. It will reduce time to market for software products and deliver scalable performance as hardware advances. If successful, such a programme of work will help to address the looming software crisis of dark silicon, which will be of benefit to academics and UK industry, and to system software researchers and developers worldwide.

Planned Impact

The immediate beneficiaries of this work will be computing systems and systems software providers, users of data analytics applications and smartphones. The academic beneficiaries will be researchers in the areas of programming languages, compilers, operating systems and computer architecture. We will also train postgraduate and undergraduate students.

We identify 11 activities through which the potential industrial, academic, economic and societal impact of this work will be realised.

[A. Industrial Impact]

*1. Prototypes:
This work will develop system software tool-chains for heterogeneous computing, including workload profiling and program synthesis tools, heterogeneous many-core compiler heuristics and a continuous optimisation framework. These will be released under an open source license and used as demonstrators of ideas and potential.

*2. Industrial Engagement:
We will visit our industrial partners and encourage our PhD student to take up internships with our partners to deliver technology transfer.

*3. Industrial Workshop:
In the second year of this project, we will organise a workshop in conjunction with the annual HiPEAC conference to disseminate the results to the international computing systems industry. The PI has already successfully organised one such industrial workshop at HiPEAC.

*4. Technology Licensing:
IP for heterogeneous many-core software development tools is a viable path for commercial exploitation through technology licensing. The Business Partnerships & Enterprise Team (BPET) at Lancaster provides commercialisation services to university members. We will make full use of BPET for exploitation.

[B. Economic Impact]
We will work closely with our partners to realise the potential economic impact in two specific areas:

*5. Big Data Analytics:
The results of this work can help to improve the energy efficiency and performance of big data analytics technologies, which are currently a $16 bn market. We will work with Freescale, Herta Security and the Barcelona Supercomputing Center to exploit the results in this direction.

*6. Energy-efficient Mobile Devices:
Battery life is a major concern to billions of mobile users, who often find their phone has died at the most inconvenient times. The techniques developed in this work can improve energy efficiency and performance for each user's mobile device. We will collaborate with our partners, Movidius and CodePlay, to exploit the energy-aware compilation techniques created in this work for mobile computing.

[C. Academic Impact]

*7. Publications:
We aim to publish our results in the best conferences and journals in the areas of computing systems research, compilation, and parallel computing (PLDI, CGO, PPoPP, PACT, HiPEAC, LCTES, ASPLOS, ICS, ACM TACO, ACM TOPLAS, IEEE TPDS). Whenever possible, publications and research results will be made available on the project web site.

*8. Demos, Tutorials and Workshop:
Research prototypes will be disseminated to academic collaborators through tutorials and platform demonstrations at major technical conferences. We will continue the highly successful COSMIC (Code OptimiSation for MultI and many Cores) international workshop, where we will present the key results of our work.

*9. Academic Visits:
We will regularly visit other systems groups in the UK working on relevant topics to disseminate research findings and build up collaboration.

[D. Societal Impact]

*10. Public Engagement:
We will use the web, and social and news media for public engagement. The project will use CompuCast (the world's first and only podcast for computer scientists) that was co-founded by the PI to engage with a wider audience.

*11. Student Training:
We will design projects for postgraduate and undergraduate students in areas within the project's research agenda, providing students with much needed skills in software development for heterogene
 
Description We have shown that energy optimisation on heterogeneous many-core systems is non-trivial, but that the benefit is significant if the right choices are made. We have shown that the compiler plays a key role in power and performance optimisation on heterogeneous architectures. We have developed a tool based on the LLVM compiler infrastructure and shown that, by correctly optimising the program, we can achieve up to 3x speedup or 2x performance reduction over the standard compiler setting on a heterogeneous CPU-GPU mixed platform. We have shown that optimising and scheduling the code in different ways yields different performance and energy trade-offs on heterogeneous multi-core architectures. This demonstrates that compiler-based techniques can play a key role in energy and performance optimisation for heterogeneous multi- and many-core systems. We are among the first to show that deep learning can be used to replace compiler heuristics, leading to far better performance on parallel GPGPU programs.
Exploitation Route We have made our tool publicly available on GitHub: https://github.com/zwang4/dividend.

We have also published our results in over 10 papers, from which the research community can benefit.
Sectors Digital/Communication/Information Technologies (including Software),Energy,Other

 
Description EPSRC iCASE Studentship
Amount £35,000 (GBP)
Organisation ARM Holdings 
Sector Private
Country United Kingdom
Start 01/2016 
End 06/2019
 
Description Royal Society
Amount £12,000 (GBP)
Organisation The Royal Society 
Sector Academic/University
Country United Kingdom
Start 03/2017 
End 03/2019
 
Title HSA auto-tuning framework 
Description A compiler-based auto-tuning tool for HSA applications. It is the first automatic tool for tuning HSA applications.
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact Two research groups (the project partners), those of Albert Cohen at Inria, France, and Alexandru Amaricai at Politehnica University of Timișoara, Romania, are using our tool.
URL https://github.com/zwang4/dividend
 
Description Collaboration with Dionasys 
Organisation Peking University
Department School of Electronics Engineering and Computer Science
Country China 
Sector Academic/University 
PI Contribution We are collaborating on a joint project funded by the Royal Society. The project mines open-source repositories such as GitHub to automatically detect bugs and generate fixes. The Lancaster team contributes compiler and code analysis expertise to the project.
Collaborator Contribution The Peking University team contributes staff time and expertise on natural language processing to the project.
Impact The project has just started and no outcomes have been generated yet.
Start Year 2017
 
Description Collaboration with Peking University 
Organisation Peking University
Department School of Electronics Engineering and Computer Science
Country China 
Sector Academic/University 
PI Contribution We are working on a joint project to mine open-source projects on GitHub to detect and repair bugs. We contribute our expertise on code analysis to the project.
Collaborator Contribution The collaborative partner contributes their expertise on natural language processing to the project. The partner team involves two academics and three postgraduate students.
Impact This collaborative work has led to two joint publications: one with DOI 10.18653/v1/P17-1040, and "Scale Up Event Extraction Learning via Automatic Training Data Generation".
Start Year 2017
 
Description HSA collaboration with AMD 
Organisation Advanced Micro Devices (AMD)
Country United States 
Sector Private 
PI Contribution This work has led to a collaboration with AMD, a main contributor to the Heterogeneous System Architecture (HSA) Foundation. We are currently working on building a compiler-based HSA auto-tuner for the LLVM HSAIL compiler developed by AMD.
Collaborator Contribution AMD has given us access to their internal version of the HSA driver and provides technical support for their HSA architecture.
Impact This has led to a prototype HSA auto-tuner released on GitHub: https://github.com/zwang4/dividend
Start Year 2016
 
Title HSA Auto-tuning tool 
Description A compiler-based auto-tuning tool for HSA applications. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The first auto-tuning tool for HSA programs. 
URL https://github.com/zwang4/dividend
 
Description NDSS paper 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Our research into Android Pattern Lock security has received wide media coverage. The news appeared in most UK national newspapers and was reported by media outlets around the world to a potential audience of millions (as reported by the press office at Lancaster University).
Year(s) Of Engagement Activity 2016
URL http://www.thetimes.co.uk/edition/news/scientists-finger-security-flaw-on-smartphone-lock-dmql3hdp3