Energy Proportional Computing With Heterogeneous and Reconfigurable Processors (ENPOWER)

Lead Research Organisation: University of Bristol
Department Name: Electrical and Electronic Engineering


Energy efficiency is one of the primary design constraints for modern processing systems. Limited battery life and excessive internal power densities limit the number of transistors that can be active simultaneously in a silicon chip. Energy and power reduction in conventional computing is limited by the inability to modify the architecture or to adapt to changes in the fabrication process, temperature or application requirements after chip fabrication. Where such changes are possible, they are limited by the need for "margining", which introduces safety margins so that devices operate correctly under worst-case conditions. Worst-case conditions rarely occur, and significant energy and performance gains are possible if the technology can adapt to the real conditions of operation. This research addresses this challenge by investigating energy proportional computing with a novel voltage, frequency and logic scaling triplet that adapts to changes in applications, fabrication or operating conditions. The results from this research are expected to deliver new fundamental insights into the question: how can future computers obtain orders of magnitude higher performance with limited energy budgets?
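As background for the voltage, frequency and logic scaling triplet, the standard first-order CMOS power model below (a textbook approximation, not a model taken from this project) shows where each element of the triplet enters the power budget. Logic scaling changes the switched capacitance, while voltage and frequency scaling act on the supply voltage and clock frequency. The 0.8 V and 1.0 V values are illustrative only.

```latex
\begin{aligned}
P_{\text{total}} &= \underbrace{\alpha \, C_{\text{eff}} \, V_{dd}^{2} \, f}_{\text{dynamic}}
                  + \underbrace{V_{dd} \, I_{\text{leak}}}_{\text{static}} \\
E_{\text{op}} &\approx \alpha \, C_{\text{eff}} \, V_{dd}^{2} \\
\frac{E_{\text{op}}(V_{dd} = 0.8\,\text{V})}{E_{\text{op}}(V_{dd} = 1.0\,\text{V})}
  &= \left(\frac{0.8}{1.0}\right)^{2} = 0.64
\end{aligned}
```

Here $\alpha$ is the switching activity factor, $C_{\text{eff}}$ the switched capacitance (reduced by deactivating logic), $V_{dd}$ the supply voltage, $f$ the clock frequency and $I_{\text{leak}}$ the leakage current. Because dynamic energy per operation scales with $V_{dd}^{2}$, recovering part of a worst-case voltage margin gives a quadratic saving. In the illustrative case above, a 20% voltage reduction lowers dynamic energy per operation by about 36%, provided timing still closes at the chosen frequency.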

Planned Impact

The potential beneficiaries of this research are the electronics and semiconductor companies involved in the creation of the multi-core processing platforms that will be at the centre of future super-computing devices. A good indication of the challenges that this industry faces over the next 20 years is available in the International Technology Roadmap for Semiconductors (ITRS). This report identified energy as a fundamental challenge that future integrated circuits will face. It is expected that process variability in deep-submicron technologies will make designing chips for worst-case conditions wasteful and uneconomical.
The objective set by the ITRS in terms of energy requirements is to maintain static and dynamic power at current or decreasing levels despite the exponential growth in logic complexity and throughput. Self-regulating processing cores able to control their own voltage supplies, activation periods, clock frequencies, active logic, etc. will be needed, since the effects of scaling and variability will compound, increasing the problems of power density and leakage. The proposed technology is highly relevant to servers dealing with offline analytics, web applications and applications with large data sets in areas including meteorology, seismic analysis, genomics and complex physics simulations. These applications offer a high level of parallelism, so multiple cores can work on a problem at the same time, but the power requirements of these large collections of server processors become very high. Kernel acceleration using FPGA fabrics able to compute at the limit could offer power reductions of orders of magnitude. To make sure that the knowledge and know-how permeate adequately into the industrial environment, the following actions will be taken:
1. The hardware demonstrators will be presented during a series of visits to the industrial collaborators Qioptiq and Xilinx and to other organisations working in the areas of interest, such as Maxeler and ESA. These visits will be organized at key stages of the project to ensure that adequate feedback is received. Short secondments of academic staff to industry will be arranged together with the visits. Additional contacts will be made with companies developing server systems around ARM technology (e.g. Calxeda) to demonstrate the potential energy improvements of the approach. The technology could also extend the design flows of companies targeting high-performance computing with FPGAs, such as Maxeler. The authors already have a working relationship with Maxeler, which will be used to demonstrate the technology on a Maxeler acceleration system.
2. In the EACO workshops we intend to organise a series of seminars/demonstrations to bring the project into contact with industry and academia. We believe that an interactive demonstration built around a multimedia application in the area of video processing should attract wide general interest.
3. The traditional avenues of journal and conference publications will also be fully utilized. We have identified DATE (Design, Automation and Test in Europe), FPL (Field Programmable Logic) and DAC (Design Automation Conference) as high-quality conferences suitable for this work. Journals such as IEEE TVLSI, IEEE Computers and IET CDT will be targeted.
The expected impacts of this research can be summarised as:
1. To understand how available chips can compute to the limit by tracking the optimal hardware and software configuration.
2. To deliver an order-of-magnitude improvement in energy/performance by exploiting a scalability triplet formed by logic, voltage and frequency operating at the limit of reliability.
3. To show how this approach can be applied to high-performance and embedded computing systems used in industrial applications.
Description ENPOWER (Elastic and Nonstationary POWER) investigated energy proportional computing techniques for reconfigurable devices that include an FPGA fabric and embedded microprocessors. The main outcome was the deployment of automatic, fine-grain control of the FPGA parameters that affect power and energy requirements. The design paradigm includes full scalability (i.e. capacitance, voltage and frequency) in a variation-aware, closed-loop configuration that is exposed to, and can be changed by, the application software.
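The closed-loop idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `SimulatedFabric` is a hypothetical stand-in for an instrumented FPGA, its linear voltage/frequency model is invented, and the step, floor and guard-band values are arbitrary. The controller lowers the supply voltage one step at a time while the simulated monitor reports no timing errors and then adds a small guard band. As a result, the operating point follows the actual device and its conditions instead of a fixed worst-case margin.

```python
from dataclasses import dataclass


@dataclass
class SimulatedFabric:
    """Hypothetical stand-in for an instrumented FPGA fabric.

    critical_voltage_v is the assumed minimum safe voltage at 100 MHz for this
    particular device; process variation and temperature are folded into it.
    """
    critical_voltage_v: float

    def timing_errors(self, voltage_v: float, freq_mhz: float) -> bool:
        # Assumed linear model: each MHz above 100 MHz needs 1 mV more supply.
        required_v = self.critical_voltage_v + 0.001 * (freq_mhz - 100.0)
        return voltage_v < required_v


def adapt_voltage(fabric: SimulatedFabric, freq_mhz: float,
                  v_start: float = 1.0, v_min: float = 0.6,
                  step: float = 0.01, guard_steps: int = 1) -> float:
    """Closed-loop voltage search at a fixed clock frequency.

    Starts from a safe (margined) voltage, steps down while the monitor reports
    no errors, and returns the lowest error-free voltage plus a guard band of
    guard_steps * step, capped at v_start.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if fabric.timing_errors(v_start, freq_mhz):
        raise ValueError("starting voltage is already unsafe at this frequency")
    v = v_start
    while True:
        candidate = round(v - step, 4)  # rounding avoids floating-point drift
        if candidate < v_min or fabric.timing_errors(candidate, freq_mhz):
            break
        v = candidate
    return round(min(v_start, v + guard_steps * step), 4)


if __name__ == "__main__":
    device = SimulatedFabric(critical_voltage_v=0.805)
    for f in (100.0, 200.0):
        print(f"{f:.0f} MHz -> {adapt_voltage(device, f):.2f} V")
```

With these assumed parameters the script prints `100 MHz -> 0.82 V` and `200 MHz -> 0.92 V`. The same device settles at a higher voltage when asked to run faster, which is why voltage and frequency are treated as a coupled pair in the scaling triplet and not set independently.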
Exploitation Route The outputs of this grant are currently being applied in the European research grant Teamplay, and new applications with RISC-V processors and Intel accelerators are now being pursued. The outputs were also used in an Innovate UK/TSB research grant, "Energy-Efficient Image Processing for Man Portable Multi-Waveband Sensor Fusion", with industrial partner Qioptiq, with more information available at and . At present, the adaptive voltage scaling methodology is being applied to quantum computing as a way to reduce the power dissipation of a control electronics system embedded in a cryostat. Initial results are promising, and further funding is expected to be requested to support this line of research.
Sectors Aerospace, Defence and Marine, Electronics, Energy

Description The system has been demonstrated in a portable video fusion application that needs to operate at a low energy point. This is a defence application, and the results have been published in P. Sun, A. Achim, I. Hasler, P. Hill and J. Nunez-Yanez, "Energy efficient video fusion with heterogeneous CPU-FPGA devices," 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, 2016, pp. 1399-1404. The project partner is Qioptiq, a defence company based in the UK. At the moment we are investigating, together with the Quantum group at the University of Bristol, how the instrumented, reduced-energy FPGAs can be made part of a cryostat that controls a quantum photonic circuit. Initial results are promising, and we are in the process of requesting further funding to take this novel concept further. We are also working with the UK division of Sensata Ltd to analyse how these energy-efficient techniques can be applied to low-latency deep learning, supported by the Royal Society fellowship MINET.
First Year Of Impact 2016
Sector Aerospace, Defence and Marine
Impact Types Economic

Description ARM 
Organisation Arm Limited
Country United Kingdom 
Sector Private 
PI Contribution CASE award to develop power models for ARM big.LITTLE microprocessors
Collaborator Contribution internships, engineering support
Impact publications, PhD students.
Start Year 2012
Description DSTL 
Organisation Defence Science & Technology Laboratory (DSTL)
Country United Kingdom 
Sector Public 
PI Contribution Video super-resolution algorithms and hardware systems for low power and low cost sensors
Collaborator Contribution data sets
Impact publications, technology transfer to the company for further development
Start Year 2013
Description University of Malaga 
Organisation University of Malaga
Country Spain 
Sector Academic/University 
PI Contribution low-power FPGA technology
Collaborator Contribution Algorithms for video analysis and OpenCL implementation
Impact We are writing publications on the topic that will be available soon
Start Year 2015
Description Oxford
Organisation University College Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Evaluation of OpenCL for energy proportional FPGA targets
Collaborator Contribution benchmarks
Impact Outputs not yet available
Start Year 2014
Description qioptiq 
Organisation Qioptiq
Country Germany 
Sector Private 
PI Contribution Creation of a low-power video fusion system using FPGAs and ARM microprocessors
Collaborator Contribution Equipment, interface IP cores, cameras
Impact A product to be commercialized is under development.
Start Year 2012
Description Evaluation of Heterogeneous execution on an HPC-oriented CPU-FPGA System-on-Chip. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Current HPC systems require highly specialized hardware to continue improving performance because the gains offered by technology scaling have diminished significantly. One promising approach to specialization is the integration of CPUs and FPGAs in the same socket, so that programmers can write highly optimized kernels that deliver excellent performance. However, following this approach requires overcoming several obstacles. First, programming FPGAs with hardware description languages is very challenging and error-prone; second, making full use of the computing power of both the CPU and the FPGA requires high-level programming frameworks that ease the burden of scheduling the work and managing the data. This work presents an analysis of a C++ template-based framework that enables programmers to run OpenCL code on any heterogeneous platform, including Intel HARP, with ease. First, it reviews the hardware platform and the interconnection network between the devices, since performance largely depends on them. Second, the talk discusses how High-Level Synthesis (HLS) tools can serve as the substrate for the heterogeneous framework and briefly reviews the usual preference of FPGAs for very deep pipelines with single-task kernels over the single-instruction multiple-thread (SIMT) model of GPUs.
Year(s) Of Engagement Activity 2019
Description Heterogeneous FPGA+GPU Embedded Systems: Challenges and Opportunities 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The edge computing paradigm has emerged to handle cloud computing issues such as scalability, security and response time. This new computing trend relies heavily on ubiquitous embedded systems at the edge. Performance and energy consumption are two main factors to consider when designing such systems. Focusing on performance and energy consumption, this work studies the opportunities and challenges that a heterogeneous embedded system consisting of embedded FPGAs and GPUs (as accelerators) can provide for applications. We study three challenges, in design, modeling and scheduling, and propose three techniques to cope with them. Applying the proposed techniques to three applications (image histogram, dense matrix-vector multiplication and sparse matrix-vector multiplication) shows 1.79x and 2.29x improvements in performance and energy consumption, respectively, when both the FPGA and the GPU execute the corresponding application in parallel.
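Running one application on both accelerators at once requires deciding how much work each device receives. The Python sketch below shows one common approach: static partitioning in proportion to each device's measured throughput, so that both devices finish at about the same time. It is not necessarily the scheduling technique used in the work described above. The function names and throughput figures are hypothetical, and the sketch ignores data-transfer and synchronisation costs.

```python
def partition_work(total_items: int, fpga_items_per_s: float,
                   gpu_items_per_s: float) -> tuple[int, int, float]:
    """Split total_items between an FPGA and a GPU in proportion to throughput.

    Returns (fpga_items, gpu_items, makespan_s), where makespan_s is the time
    until the slower of the two devices finishes its share.
    """
    if total_items < 0 or fpga_items_per_s <= 0 or gpu_items_per_s <= 0:
        raise ValueError("need non-negative work and positive throughputs")
    fpga_share = fpga_items_per_s / (fpga_items_per_s + gpu_items_per_s)
    fpga_items = round(total_items * fpga_share)
    gpu_items = total_items - fpga_items
    makespan_s = max(fpga_items / fpga_items_per_s, gpu_items / gpu_items_per_s)
    return fpga_items, gpu_items, makespan_s


def speedup_over_best_single(total_items: int, fpga_items_per_s: float,
                             gpu_items_per_s: float) -> float:
    """Ratio of the best single-device time to the co-execution makespan."""
    if total_items == 0:
        return 1.0  # no work: co-execution gives no benefit
    best_single_s = total_items / max(fpga_items_per_s, gpu_items_per_s)
    _, _, makespan_s = partition_work(total_items, fpga_items_per_s, gpu_items_per_s)
    return best_single_s / makespan_s


if __name__ == "__main__":
    # Hypothetical throughputs (items per second), not measured values.
    print(partition_work(1000, 300.0, 700.0))
    print(f"{speedup_over_best_single(1000, 300.0, 700.0):.2f}x")
```

With these made-up numbers, co-execution finishes in 1.0 s, compared with about 1.43 s on the faster device alone. The benefit shrinks as the two devices become more unbalanced, and real gains also depend on the transfer overheads that this sketch omits.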
Year(s) Of Engagement Activity 2019
Description Special journal issue with the IET in energy efficient low power computing and workshop in the HIPEAC conference 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact High-quality papers published, interesting discussions with peers and colleagues, and plans for future grant applications.

Increased the presence of the Bristol group at an international level and highlighted the importance of the topic.
Year(s) Of Engagement Activity 2014