SANDeRS: Smart, Adaptive Compilation for Dark Silicon
Lead Research Organisation:
Lancaster University
Department Name: Computing & Communications
Abstract
We live in an era of multi-cores: computing processors are no longer marketed by their clock speeds, but by the number of cores. The fundamental limits of energy and power density of processors will soon push us further into an age of dark silicon where only a small portion of the chip can be powered at any time. In such a setting, putting more of the same processing cores on a chip (i.e. homogeneity) gives no advantage. This has forced computer architects to introduce heterogeneous many-core systems built around distinct processors -- each with different energy and performance characteristics and each specialised for a certain class of applications. Computer architects now hope that software will find ways to unlock the potential of heterogeneous many-cores. Software developers, however, are struggling to cope with this dramatic increase in complexity; and the current compiler tools, whose role is to enable software to make effective use of the underlying hardware, are simply inadequate to the task.
It is already a daunting task to build optimising compilers for homogeneous multi-cores consisting of identical cores, even just targeting performance (i.e. to make programs faster). It typically takes several generations of a compiler to start to effectively exploit the processor's potential, by which time a new processor appears and the process starts again. It will be a fundamentally more difficult task to design efficient compiler heuristics for optimising energy (i.e. to reduce energy consumption) and performance on heterogeneous many-cores, especially given the subtle interactions of different cores and inter-connections. Even if this is successfully achieved, the task of compiler design will likely have to start again when moving to a newly released processor. This never-ending game of catch-up inevitably delays time to market, meaning that we rarely fully exploit the hardware in its lifetime. If no solution is found, we will be faced with software stagnation and will be unable to offer scalable computing performance -- a driving force that has dramatically changed our society over the past 50 years.
What is needed is an approach that evolves and adapts to future changes in hardware architecture and delivers scalable performance over hardware generations. This project offers precisely that. It will achieve this by bringing together two distinct areas of computer science, parallel compiler design and machine learning, to develop a new paradigm for energy and performance optimisation. Our key insight is that the best optimisation strategies can be learned from similar software/hardware settings, and that the learnt knowledge can be constantly refreshed without human involvement. This project will deliver such a smart, adaptive compilation system. We will use machine learning to acquire knowledge of workloads, applications and the underlying hardware, testing new compilation strategies, learning how each individual program should be optimised for each specific computing environment, and constantly improving the optimisation heuristics over time.
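To make the idea of learning an optimisation strategy from similar settings concrete, the following is a minimal illustrative sketch and not the project's actual system. It assumes each program is summarised by two hypothetical features (fraction of parallel work, branch density) and uses a simple k-nearest-neighbour vote, a technique chosen here only for brevity, to pick a target device. The feature names, the training data and the functions `predict_device` and `observe` are all assumptions made for illustration.

```python
import math

# Hypothetical training data: (features, best_device) pairs, where the
# features are (fraction of parallel work, branch density). The values
# are invented for illustration and are not measurements.
TRAINING = [
    ((0.95, 0.05), "GPU"),
    ((0.90, 0.10), "GPU"),
    ((0.80, 0.20), "GPU"),
    ((0.20, 0.70), "CPU"),
    ((0.10, 0.90), "CPU"),
    ((0.30, 0.60), "CPU"),
]


def predict_device(features, training=TRAINING, k=3):
    """Pick a device by majority vote among the k most similar programs."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    votes = {}
    for _, device in nearest:
        votes[device] = votes.get(device, 0) + 1
    return max(votes, key=votes.get)


def observe(features, best_device, training=TRAINING):
    """Record a newly measured outcome so later predictions can use it."""
    training.append((features, best_device))


if __name__ == "__main__":
    print(predict_device((0.85, 0.15)))
    print(predict_device((0.15, 0.80)))
```

In this sketch, calling `observe` after each new measurement stands in for refreshing the learnt knowledge without human involvement; a real system would need far richer program features and models.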
As knowledge of the application environment grows, our system will make programs faster and more energy efficient; for example, software will respond more quickly and batteries will last longer on mobile phones. It will reduce time to market for software products and deliver scalable performance as hardware advances. If successful, such a programme of work will help to address the looming software crisis of dark silicon, which will be of benefit to academics and UK industry, and to system software researchers and developers worldwide.
Planned Impact
The immediate beneficiaries of this work will be computing systems and systems software providers, users of data analytics applications and smartphones. The academic beneficiaries will be researchers in the areas of programming languages, compilers, operating systems and computer architecture. We will also train postgraduate and undergraduate students.
We identify 11 activities through which the potential industrial, academic, economic and societal impact of this work will be realised.
[A. Industrial Impact]
*1. Prototypes:
This work will develop system software tool-chains for heterogeneous computing, including workload profiling and program synthesis tools, heterogeneous many-core compiler heuristics and a continuous optimisation framework. These will be released under an open source license and used as demonstrators of ideas and potential.
*2. Industrial Engagement:
We will visit our industrial partners and encourage our PhD student to take up internships with our partners to deliver technology transfer.
*3. Industrial Workshop:
In the second year of this project, we will organise a workshop in conjunction with the annual HiPEAC conference to disseminate the results to the international computing systems industry. The PI has already successfully organised one such industrial workshop at HiPEAC.
*4. Technology Licensing:
IP for heterogeneous many-core software development tools is a viable path for commercial exploitation through technology licensing. The Business Partnerships & Enterprise Team (BPET) in Lancaster provides commercialisation services to university members. We will make full use of BPET for exploitation.
[B. Economic Impact]
We will work closely with our partners to realise the potential economic impact in two specific areas:
*5. Big Data Analytics:
The results of this work can help to improve the energy efficiency and performance of big-data analytics technologies, which are currently a $16 bn market. We will work with Freescale, Herta Security and the Barcelona Supercomputing Center to exploit the results in this direction.
*6. Energy-efficient Mobile Devices:
Battery life is a major concern to billions of mobile users, who often find their phone has died at the most inconvenient times. The techniques developed in this work can improve energy efficiency and performance for each user's mobile device. We will collaborate with our partners, Movidius and Codeplay, to exploit the energy-aware compilation techniques created in this work for mobile computing.
[C. Academic Impact]
*7. Publications:
We aim to publish our results in the best conferences and journals in the areas of computing systems research, compilation, and parallel computing (PLDI, CGO, PPoPP, PACT, HiPEAC, LCTES, ASPLOS, ICS, ACM TACO, ACM TOPLAS, IEEE TPDS). Whenever possible, publications and research results will be made available on the project web site.
*8. Demos, Tutorials and Workshop:
Research prototypes will be disseminated to academic collaborators by giving tutorials and platform demonstrations in major technical conferences. We will continue the highly successful COSMIC (Code OptimiSation for MultI and many Cores) international workshop where we will present the key results of our work.
*9. Academic Visits:
We will regularly visit other systems groups in the UK working on relevant topics to disseminate research findings and build up collaboration.
[D. Societal Impact]
*10. Public Engagement:
We will use the web, and social and news media for public engagement. The project will use CompuCast (the world's first and only podcast for computer scientists) that was co-founded by the PI to engage with a wider audience.
*11. Student Training:
We will design projects for postgraduate and undergraduate students in areas within the project's research agenda, providing students with much needed skills in software development for heterogene
Organisations
- Lancaster University (Lead Research Organisation)
- Advanced Micro Devices (AMD) (Collaboration)
- Peking University (Collaboration)
- Intel (Ireland) (Project Partner)
- Freescale Semiconductor Uk Ltd (Project Partner)
- Codeplay (United Kingdom) (Project Partner)
- Barcelona Supercomputing Center (Project Partner)
- Herta Security (Project Partner)
- Critical Blue Ltd (Project Partner)
Publications
Tang Z (2018) VMGuards: A Novel Virtual Machine Based Code Protection System with VM Security as the First Class Design Concern, in Applied Sciences
Tang Z (2017) Exploiting Wireless Received Signal Strength Indicators to Detect Evil-Twin Attacks in Smart Homes, in Mobile Information Systems
Taylor B (2018) Adaptive deep learning model selection on embedded systems
Wang Z (2018) Machine Learning in Compiler Optimization, in Proceedings of the IEEE
Ye G (2018) A Video-based Attack for Android Pattern Lock, in ACM Transactions on Privacy and Security
Ye G (2018) Yet Another Text Captcha Solver
Description | We have shown that energy optimisation on heterogeneous many-core systems is non-trivial, but that if we make the right choices, the benefit will be significant. We have shown that the compiler plays a key role in power and performance optimisations on heterogeneous architectures. We have developed a tool based on the LLVM compiler infrastructure and shown that by correctly optimising the program, we can achieve up to a 3x speedup or a 2x reduction in energy consumption over the standard compiler setting on a heterogeneous CPU-GPU mixed platform. We have shown that by optimising and scheduling the code in different ways, different performance and energy trade-offs can be achieved on heterogeneous multi-core architectures. This demonstrates that compiler-based techniques can play a key role in performing energy and performance optimisations for heterogeneous multi- and many-core systems. We are among the first to show that deep learning can be used to replace compiler heuristics, leading to far better performance on parallel GPGPU programs. |
Exploitation Route | We have made our tool publicly available on GitHub: https://github.com/zwang4/dividend. We have also published our results in over 10 papers, from whose key findings the research community can benefit. |
Sectors | Digital/Communication/Information Technologies (including Software), Energy, Other |
Description | Our work on code size reduction was licensed to a RISC-V processor IP company and is being productised by a major IT company. |
First Year Of Impact | 2019 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Description | EPSRC iCASE Studentship |
Amount | £35,000 (GBP) |
Organisation | Arm Limited |
Sector | Private |
Country | United Kingdom |
Start | 01/2016 |
End | 06/2019 |
Description | Royal Society |
Amount | £12,000 (GBP) |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2017 |
End | 03/2019 |
Title | DeepTune - a deep-learning-based compiler optimisation tool |
Description | DeepTune is an open-source framework for building compiler optimisation heuristics using deep learning techniques. DeepTune uses a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | DeepTune is the world's first deep-learning-based autotuner for compiler heuristics. It opens up a new research field for using deep learning to model program structures for performance optimisation. A range of follow-up works have built upon DeepTune. It also helped to secure over £500K of follow-up industrial funding. |
URL | https://github.com/ChrisCummins/paper-end2end-dl |
Title | HSA auto-tuning framework |
Description | A compiler-based auto-tuning tool for HSA applications. It is the first automatic tool for tuning HSA applications. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Two research groups (the project partners), those of Albert Cohen at Inria, France, and Alexandru Amaricai at Politehnica University of Timișoara, Romania, are using our tool. |
URL | https://github.com/zwang4/dividend |
Description | Collaboration with Dionasys |
Organisation | Peking University |
Department | School of Electronics Engineering and Computer Science |
Country | China |
Sector | Academic/University |
PI Contribution | We are collaborating on a project funded by the Royal Society. The project mines open-source repositories such as GitHub to automatically detect bugs and generate fixes. The Lancaster team contributes compiler and code analysis expertise to the project. |
Collaborator Contribution | The Peking University team contributes staff time and expertise on natural language processing to the project. |
Impact | The project has just started and no outcomes have been generated yet. |
Start Year | 2017 |
Description | Collaboration with Peking University |
Organisation | Peking University |
Department | School of Electronics Engineering and Computer Science |
Country | China |
Sector | Academic/University |
PI Contribution | We are working on a joint project to mine open-source projects on GitHub to detect and repair bugs. We contribute our expertise on code analysis to the project. |
Collaborator Contribution | The collaborative partner contributes their expertise on natural language processing to the project. The partner team involves two academics and three postgraduate students. |
Impact | This collaborative work has led to two joint publications: DOI 10.18653/v1/P17-1040 and "Scale Up Event Extraction Learning via Automatic Training Data Generation". |
Start Year | 2017 |
Description | HSA collaboration with AMD |
Organisation | Advanced Micro Devices (AMD) |
Country | United States |
Sector | Private |
PI Contribution | This work has led to a collaboration with AMD, a main contributor to the Heterogeneous System Architecture (HSA) Foundation. We are currently working on building a compiler-based HSA auto-tuner for the LLVM HSAIL compiler developed by AMD. |
Collaborator Contribution | AMD has given us access to their internal version of the HSA driver and provides technical support for their HSA architecture. |
Impact | This has led to a prototype HSA auto-tuner released on GitHub: https://github.com/zwang4/dividend |
Start Year | 2016 |
Title | HSA Auto-tuning tool |
Description | A compiler-based auto-tuning tool for HSA applications. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | The first auto-tuning tool for HSA programs. |
URL | https://github.com/zwang4/dividend |
Description | NDSS paper |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Our research into Android Pattern Lock security has received wide media coverage. The news appeared in most UK national newspapers and was reported on by media outlets around the world to a potential audience of millions (as reported by the press office at Lancaster University) |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.thetimes.co.uk/edition/news/scientists-finger-security-flaw-on-smartphone-lock-dmql3hdp3 |