Abstraction-Level Energy Accounting and Optimisation in Many-core Programming Languages

Lead Research Organisation: Queen's University of Belfast
Department Name: Electronics Electrical Eng and Comp Sci

Abstract

Energy efficiency is becoming increasingly important in today's world of battery powered mobile devices and power limited servers. While
performance optimisation is a familiar topic for developers, few are even aware of the effects that source code changes will have on the
energy profiles of their programs. Without knowledge of these effects, compiler and operating system writers cannot create automatic energy
optimisers. To realise the needed energy savings, we require the capability to track energy consumption and associate it with code
and data at a fine granularity. Furthermore, compilers and operating systems must exploit this capability to optimise applications
automatically.

This proposal presents a novel approach to software-centric modelling, measurement, accounting and optimisation of energy-efficiency on
many-core systems. Energy consumption will be matched against programming language abstractions, from basic-blocks to functions,
loops, and parallel constructs, and from variables to data structures, providing developers with the information that they need. The project will use this fine-grained accounting to build novel compiler optimisations that target energy consumption. It will create low-energy runtime systems that adapt to environmental changes. It will develop energy-efficient operating system scheduling that manages multi-tasking for heterogeneous many-cores. The project aims to improve performance per Watt by at least 40%.

Planned Impact

We aim to achieve impact through excellence and collaboration.
We identify several activities that realise potential industrial,
economic and societal impact.

*Prototypes
The project will develop prototypes of energy-efficient implementations of
parallel programming languages and energy accounting tools, used as demonstrators
of ideas and potential. Prototyping and
demonstration will maximise our opportunities for industrial engagement
in technology testing and transfer.

*Industrial engagement
Partnerships with ARM, IBM, BluWireless Technology, Freescale Semiconductor, and Herta Security will
provide us with access to in-house software and hardware technology as well as
feedback on the potential exploitation paths of the research
conducted in the project. Through industrial engagement, we will explore options for
licensing technology to industrial partners.

*Spin-out companies
IP for software energy sensors, energy-aware optimising compilers and
energy-aware runtime systems are all viable paths for commercial exploitation by
spin-outs. Both Queen's and Edinburgh have dedicated commercial teams to facilitate and
manage all technology exploitation and transfer opportunities (detailed in the
Pathways to Impact Section).

*Industrial workshops
The project will organise workshops in conjunction with the annual
European conference on High Performance and Embedded Architectures and Compilers
(HiPEAC) and major technical conferences. The investigators have already organised a
number of successful international workshops, including a forthcoming
event in CGO'13, and will continue to do so to disseminate the results.

*Economic Impact:
The Computing Systems market underpins approximately 16% of the global economy. Although services generate most of this revenue, technological and product dominance lead to increased service market share. This project will specifically contribute to future UK economic success by transfer of knowledge and technology to system software, an
ICT sector which is strategically important to the UK. Research on energy-efficient servers and mobile systems presents further significant opportunities for UK companies.
Low-carbon datacentres, where the project can contribute technology to reduce server carbon emissions, and smartphone user productivity,
which the project can improve by extending the battery life, in light of emerging wireless
communication technologies with higher data bandwidth, are two examples with high potential economic
impact.

*Societal Impact:
[Web and Social Media Presence] The project will use the web, social media and news media to reach the academic
community and the general public. Dissemination of the project will be
achieved through distribution of technical papers, news articles, press releases,
and video presentations.

[Student training] The project raises energy awareness in computing systems research and
education and will provide postgraduate students with much needed
skills in understanding, diagnosing and resolving energy inefficiency in computing
systems.

*Collaboration:
The project partnerships provide outstanding opportunities for collaboration with industry and technology licensing.
Generous contributions of equipment, software and human resources from partners, valued at approximately
£190K, provide additional pathways for research and prototyping.

*Capability:
The investigators have an excellent record in
technology transfer and engaging beneficiaries.
In the recent past, the investigators have engaged in
technology transfer activities with IBM, ARM, CriticalBlue, CodePlay,
Intel, Samsung, Synopsys, SAP, Analytics Engines, and ACE.
Public awareness of key research results will also be stimulated through
reports in newspapers, web-based news media, an
 
Description ALEA develops tools that help the users or operators of a computing system understand how much energy the system consumes and how the observed energy consumption can be traced back to the software and applications running on it. ALEA specifically focuses on attributing power and energy consumption to fine-grain code blocks, with the objective of exposing more, and larger, energy-aware software optimisation opportunities.

The ALEA technology has a range of applications, from methods to extend battery life in mobile devices based on heterogeneous many-core embedded processors, to methods for reducing the Total Cost of Ownership (TCO) of compute and storage nodes in datacentres.

ALEA's main findings during the first two years of the project included a novel probabilistic model for attributing power and energy samples to software code blocks, which has enabled accurate measurement and accounting of energy consumption across code abstractions in software, including methods (functions), basic blocks and machine-level instructions. The project has validated the accuracy and correctness of the tool on two state-of-the-art many-core architectures based on Intel IvyBridge and Samsung Exynos substrates.
The ALEA tool was further enhanced with power models and power sampling methods for FPGAs, Xeon Phi accelerators, and ARM Mali GPUs.
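
As a rough illustration of sampling-based attribution of the kind described above, the following Python sketch assumes that a profiler records, at random points during a run, which code block is executing together with an instantaneous power reading. The function name, the sample format and the numbers are hypothetical and are not taken from the ALEA tool. The estimator treats the fraction of samples that fall in a block as an estimate of the share of execution time spent there.

```python
from collections import defaultdict

def attribute_energy(samples, total_time_s):
    """Estimate per-block time, mean power and energy from random samples.

    samples: iterable of (block_id, power_watts) pairs (hypothetical format).
    total_time_s: wall-clock duration of the profiled run in seconds.
    """
    counts = defaultdict(int)
    power_sums = defaultdict(float)
    n = 0
    for block_id, power_w in samples:
        counts[block_id] += 1
        power_sums[block_id] += power_w
        n += 1
    if n == 0:
        return {}
    profile = {}
    for block_id, count in counts.items():
        # Share of samples in the block approximates its share of run time.
        time_s = total_time_s * count / n
        mean_power_w = power_sums[block_id] / count
        profile[block_id] = {
            "time_s": time_s,
            "mean_power_w": mean_power_w,
            "energy_j": time_s * mean_power_w,
        }
    return profile

# Made-up samples for a 4-second run.
samples = [("loop_a", 10.0), ("loop_a", 12.0), ("loop_b", 20.0), ("loop_a", 11.0)]
profile = attribute_energy(samples, total_time_s=4.0)
```

The quality of such an estimate depends on the number of samples and on how uniformly they are spread over the run, so a real tool would also need to report confidence in each figure.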

ALEA has also explored methods to accurately apportion energy consumption between software activities that share hardware resources, such as secondary cache space, memory and network bandwidth.

The ALEA energy accounting tool has been deployed in several power and energy optimisation use cases, including a study of the energy-efficiency of microservers and scale-out techniques as an alternative to heavy-duty servers, power-capped code optimisation using both resource scaling and software transformations, analysis of the power-efficiency of binary rewriting and thread migration code on heterogeneous many-core architectures and a study of the energy-efficiency of a range of computational science and Big Data applications expressed in an approximate programming model.

In the context of energy-aware program optimisation, ALEA further pursued three directions of research:

The first line of research identified ways of controlling the power and energy consumption of applications through the optimisation options of the compiler. While these options by design have a significant effect on the performance of the application, little was known about the effects on the energy and power consumption. Our work showed that while energy consumption is highly correlated with performance and cannot be controlled independently, power consumption can be effectively and independently controlled through the compiler.

The second line of research developed a platform for creating benchmarks from real users interacting with real mobile applications. Having representative benchmarks is critical for the evaluation of almost all research done in the field of Computer Science and is a necessary requirement for methodologies like iterative compilation and machine learning. Unfortunately it's difficult to create benchmarks for mobile devices since the bulk of applications running on them are interactive applications. By their own nature, interactive applications might display significantly different behaviour from one execution to the next, which means that programmatically replaying one captured, or even worse an artificially created, set of interactions with the device is not enough. Furthermore, quantifying the performance of the application is difficult. Interactive applications are characterised by short bursts of activity followed by long pauses where the application waits for user input. What matters performance-wise is how much time it took to service these short bursts of activity, but identifying and measuring the runtime of these bursts has been done before only when explicit programmatic support was provided for this task by the application itself.

The ALEA benchmarking platform overcomes these problems by capturing both the user input and video from the device screen. By capturing the user input, we are able to fully replicate the way the user interacts with the application. By capturing the display, we can semi-automatically identify the system's visible response to each user input event. Combining the two, we can both replay any user interaction with the application and measure the time it takes for the application to complete its response to an input event, thus quantifying the performance of the application. This way, we achieve four targets: a) we create deterministic benchmarks, b) whose performance can be quantified, c) from any interactive application and d) with input coming from real users.
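
The measurement step can be sketched in Python under simplifying assumptions: the timestamps of the user input events are known, and the times at which the screen content visibly changes have already been extracted from the captured video. The function name and the data are hypothetical. The sketch takes the last screen change before the next input event as the point where the response completes, which is a simplification of the semi-automatic identification described above.

```python
from bisect import bisect_left, bisect_right

def response_latencies(input_times, change_times):
    """Return one latency (seconds) per input event, or None if no response.

    input_times, change_times: sorted lists of timestamps in seconds.
    """
    latencies = []
    for i, t_in in enumerate(input_times):
        # The response window ends when the next input event arrives.
        t_next = input_times[i + 1] if i + 1 < len(input_times) else float("inf")
        lo = bisect_right(change_times, t_in)   # first change after the input
        hi = bisect_left(change_times, t_next)  # first change at/after next input
        if lo < hi:
            latencies.append(change_times[hi - 1] - t_in)
        else:
            latencies.append(None)  # no visible response observed
    return latencies

# Made-up timestamps for three input events and four screen changes.
inputs = [1.0, 5.0, 9.0]
changes = [1.25, 1.5, 5.5, 12.0]
latencies = response_latencies(inputs, changes)
```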

The third line of research focused on the problem of accelerating the construction of the optimisation heuristics that we will use in the rest of our research. Typical approaches to heuristics construction require months of training data collection per platform, which is unacceptably long. During our research we discovered that this process can be sped up considerably. Most of the training data used for building a heuristic provide little additional information, covering parts of the optimisation space for which we already have sufficient knowledge. Our approach instead uses active learning to predict the amount of additional information provided by each point in the optimisation space and collects training data only for the points with the maximum information potential. This way we can collect the same amount of training information in only a fraction of the time, which translates into days or weeks instead of months of training data collection.
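
A minimal sketch of a pool-based active learning loop is given below in Python. It is not the project's method: it uses query-by-committee with bootstrap-trained nearest-neighbour predictors as one simple way to estimate how informative an unmeasured point is. The one-dimensional optimisation space, the function names and the synthetic cost function are all assumptions made for illustration.

```python
import random
import statistics

def nearest_prediction(train, x):
    """Predict with 1-nearest-neighbour over (x, y) training pairs."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def active_learning(candidates, measure, budget, committee_size=5, seed=0):
    """Measure at most `budget` points, always picking the one on which a
    committee of bootstrap-trained predictors disagrees the most."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    # Start from up to two randomly chosen measurements.
    n_init = min(2, budget)
    train = [(x, measure(x)) for x in pool[:n_init]]
    pool = pool[n_init:]
    while len(train) < budget and pool:
        # Each committee member sees a bootstrap resample of the data.
        committee = [[rng.choice(train) for _ in train]
                     for _ in range(committee_size)]

        def disagreement(x):
            preds = [nearest_prediction(member, x) for member in committee]
            return statistics.pvariance(preds)

        best = max(pool, key=disagreement)
        pool.remove(best)
        train.append((best, measure(best)))  # the expensive step
    return train

def synthetic_cost(x):
    # Stand-in for compiling and running a program at configuration x.
    return (x - 7) ** 2

candidates = list(range(20))
train = active_learning(candidates, synthetic_cost, budget=6)
```

Only the call to `measure` is expensive in practice, so the saving comes from spending the measurement budget where the committee is least certain rather than on a uniform sweep.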

In the third year of the project we developed new methods for extremely fine-grained energy profiling. These
methods make it possible to overcome hardware limitations and accurately estimate energy and power
consumption over periods as short as 10 µs. We implemented a profiling tool based
on these techniques for two distinct architectures: Intel, which is widely used in data centres, and ARM,
which dominates the mobile market. The experimental results show that the proposed techniques
are highly accurate, with an average estimation error of 6%.

ALEA opened up previously unexploited opportunities for power-aware optimisation in computing systems.
We explored several use cases of ALEA, which include energy and power optimisation of both
benchmarks and real applications. In particular, we discovered that ALEA could be effectively used to
evaluate the impact of performance optimisation on chip energy and power consumption. Using ALEA,
we identified compiler optimisations that inflate chip power consumption without a measurable
performance gain. We showed how ALEA enables energy savings through fine-grained DVFS in a sequence
alignment algorithm. Finally, our research revealed that ALEA supports targeted energy optimisations of
realistic applications, such as an energy reduction of 2.87× for an industrial-strength option pricing code
provided by Credit Suisse. Overall, our technology provides engineers with a tool to expose opportunities for
energy and power optimisation of a wide range of software.

In the final stages of the project we intend to extend ALEA and enable extremely fine-grained voltage profiling over periods with latency
down to a few nanoseconds. This technology will allow us to identify code regions which induce voltage
drops leading to hardware faults. The voltage drops give rise to the so-called "IR drop" problem, which is
critical since a running application could compromise hardware reliability and security. This technology is
of interest to UK-based semiconductor companies, such as ARM and Imagination Technologies.
Recent research indicates that hardware faults are becoming an inherent property of emerging processor
and memory technologies. Our research shows that the number and sources of these faults depend on
running applications and specific code patterns. We believe that cooperation between hardware and
system software could be used to tolerate and mitigate various hardware errors, including DRAM and chip
errors. ALEA techniques could be extended to profile a new domain - hardware faults. Such profiling will
shed light on the cause of the faults and what code patterns trigger these faults. We envision that the
profiling will help software and hardware engineers to mitigate hardware errors and find an optimal tradeoff
between performance, energy/power consumption and reliability. We predict that this technology will
be in high demand among leading software companies and data centre service providers in the near
future to improve fault resilience and energy efficiency of running software.

ALEA collaboratively produced over 30 research outputs.
Exploitation Route Since 2016 ALEA has been extensively used in studies of the energy-efficiency of experimental servers with hardware that operates outside nominal margins; in measurements of the energy-efficiency of mobile devices and Cloud servers in Edge Computing setups; in understanding the energy-efficiency of micro-servers for real-time data analytics; and in understanding the energy implications of resilience and error mitigation methods in commodity systems. The ALEA toolset has been deployed in several Horizon2020 projects (AllSCale, Vineyard, OPRECOMP, UniServer) and is now in use by more than 30 academic and industrial research groups worldwide. The ALEA tool has been deployed as a software energy metering substrate by four companies (IBM Research Zurich, Credit Suisse, Analytics Engines Ltd., Neueda Consulting), within the context of the NanoStreams EU project (FP7-610509), as well as by four international research groups (FORTH-ICS, IMEC, EPFL, INRIA) in the context of the NanoStreams and SCoRPiO EU projects (FP7-610509, FP7-323872). The tool has also been widely distributed among academic research groups in the UK and the US.

The aforementioned industry and research groups use the ALEA tool as a method to understand energy consumption patterns and correlate them with components running in their software stacks. The tool has the potential to uncover energy optimisation opportunities for datacentres via interventions in the software stack, such as task consolidation and throttling. This potential is currently under investigation.

ALEA further plans to exploit the results of our research on the effects of compiler optimisations on power consumption by implementing a fine-grained power capping system. This system will enforce power consumption limits by loading and executing alternative executable versions of the same program, each created with different compiler optimisation options and therefore having a different power consumption, with the version chosen according to the power cap that we have to enforce. The benefit of this approach over other techniques is its fine granularity. The power and performance levels that we can enforce through the compiler optimisation options are only a few milliwatts apart in terms of power and less than 1% apart in terms of performance. In contrast, techniques like DVFS-based power capping can only support a small number of different power/performance levels and therefore might reduce performance much more than needed in order to stay below a power consumption target.
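
The selection step of such a power capping system could look like the following Python sketch, which assumes that the power and runtime of each compiled variant have already been measured offline. The variant names, dictionary keys and all numbers are hypothetical and chosen only to show levels a few milliwatts apart. The policy shown, picking the fastest variant that fits under the cap, is one plausible choice rather than the planned design.

```python
def select_variant(variants, power_cap_w):
    """Return the fastest variant whose measured power fits under the cap.

    variants: list of dicts with hypothetical keys 'name', 'power_w'
              and 'runtime_s'. Returns None if no variant fits.
    """
    feasible = [v for v in variants if v["power_w"] <= power_cap_w]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v["runtime_s"])

# Made-up measurements for three builds of the same program.
variants = [
    {"name": "build_a", "power_w": 2.950, "runtime_s": 10.08},
    {"name": "build_b", "power_w": 2.955, "runtime_s": 10.04},
    {"name": "build_c", "power_w": 2.960, "runtime_s": 10.00},
]

chosen = select_variant(variants, power_cap_w=2.957)
```

With many closely spaced variants, a cap that falls between two levels costs only the small performance gap to the next variant down, which is the granularity argument made above.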

Regarding the ALEA benchmark creation platform, our next step will be to use the representative benchmarks of interactive applications created through this methodology to train and evaluate the previously mentioned power capping system, as well as complementary novel DVFS governors. The platform is also broadly applicable beyond our own work: practically every research and engineering project that targets mobile devices and incorporates benchmark-based evaluation into its workflow could benefit from using it.
Sectors Digital/Communication/Information Technologies (including Software),Electronics,Energy

URL http://www.eeecs.qub.ac.uk/ALEA/
 
Description The ALEA energy accounting toolset now underpins research and innovation in the field of energy-efficient computing paradigms of several international research consortia, including the European OPRECOMP consortium on transprecision computing, the European Vineyard consortium, which explores methods to deploy accelerators in heterogeneous datacenters, and the UniServer consortium, which explores energy-efficient micro-servers that leverage intrinsically extended operating margins in their hardware. These deployments expose ALEA to more than 30 academic and industrial research groups across Europe and the US. In particular, ALEA is actively used by researchers in ARM, Maxeler, MACOM (formerly Applied Micro), and several FinTech companies in the UK and Europe for exploring the energy-efficiency of their software-defined services on commodity and customised servers. ALEA has also been adopted by large research groups in two US universities (Virginia Tech and Old Dominion) to underpin research in energy-aware high-performance computing software. Recently the ALEA energy accounting toolset has been used by ARM to evaluate system power-efficiency and resilience when hardware is operated under extended margins. In earlier impact activity, the ALEA project team at Queen's University Belfast collaborated with researchers of Credit Suisse to explore the potential of using ALEA for improving the energy-efficiency of the bank's servers through interventions in their trading and risk analytics software. This exploratory activity has attracted an in-kind contribution in terms of personnel time (0.1 FTE for one year) to engage with the Queen's team.
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software),Energy,Healthcare
Impact Types Societal,Economic

 
Description EPSRC ICT Delivery Planning Workshops
Geographic Reach National 
Policy Influence Type Participation in a national consultation
 
Description Distributed Heterogeneous Vertically IntegrateD ENergy Efficient Data centres
Amount £140,710 (GBP)
Funding ID EP/M015742/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 01/2015 
End 12/2017
 
Description EU Horizon2020 Programme: UniServer Project
Amount € 663,625 (EUR)
Funding ID 687628 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2016 
End 01/2019
 
Description EU Horizon2020 Programme: Vineyard Project
Amount € 4,815,810 (EUR)
Funding ID 688540 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2016 
End 01/2019
 
Description Heterogeneous parallel and distributed computing with Java
Amount £221,592 (GBP)
Funding ID EP/M015750/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 01/2015 
End 12/2017
 
Description Horizon2020 Programme
Amount € 5,999,510 (EUR)
Funding ID H2020-732631 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2017 
End 12/2020
 
Title ALEA energy accounting toolset 
Description ALEA is a comprehensive toolset for fine-grain, software-defined energy accounting. The toolset is available for standalone applications or virtual machines, and for a variety of hardware targets, including Intel and ARM. 
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact The tool is used by more than 30 academic and industrial groups worldwide to study the energy-efficiency of commodity and experimental hardware under various workloads. 
 
Description Collaboration with ARM and Applied Micro on micro-servers operating with extended voltage/frequency margins 
Organisation ARM Holdings
Country United Kingdom 
Sector Private 
PI Contribution New resource management methods at the operating system & hypervisor levels for leveraging extended operating margins of 64-bit ARM processors to improve system-wide energy-efficiency in micro-servers.
Collaborator Contribution Donation of experimental boards from Applied Micro (XGene) and ARM (Juno).
Impact Ongoing development of variation-aware hypervisors with enhanced resilience, power management and performance management capabilities.
Start Year 2016
 
Description Collaboration with ARM on fine-grain energy accounting tools 
Organisation ARM Holdings
Country United Kingdom 
Sector Private 
PI Contribution Partnership with ARM on developing energy-efficient system software and methods for accurate and fine-grain energy accounting in the Linux operating system. Our research team has contributed new probabilistic models for energy accounting at time scales which are finer than the power sensing or sampling periods of on-chip or off-chip power sensing instruments.
Collaborator Contribution ARM has scoped this collaboration and contributed in an advisory role via a series of online and face-to-face meetings with the ALEA, ENPOWER and GEMSCLAIM project consortia.
Impact Thread-level energy accounting tools for ARM Big.Little platforms (e.g. Exynos) and their use in energy-aware scheduling and resource allocation on mobile devices are currently in implementation and preliminary evaluation stages.
Start Year 2016
 
Description Collaboration with IBM on disaggregated memory technologies and near-data computing 
Organisation IBM
Country United States 
Sector Private 
PI Contribution Our research team has contributed methods to manage data caching and placement on disaggregated memory architectures with near-data processing elements.
Collaborator Contribution IBM has contributed novel remote memory server infrastructures and near-data acceleration technologies.
Impact Materialised through an industrial placement of QUB research staff, this partnership is exploring designs to substantially improve the energy-efficiency of large memory systems, via the use of disaggregation of memory, RDMA-based networking to remote memory devices and near-data accelerators for in-situ, in-memory analytics.
Start Year 2015
 
Description Credit Suisse 
Organisation Credit Suisse Group
Country Switzerland 
Sector Private 
PI Contribution Partnership with Credit Suisse explores design methods and tools to reduce the carbon and space footprint of datacentres that serve real-time financial analytics applications in London.
Collaborator Contribution The partner contributed time equivalent to 0.1 FTE over a year to engage in meetings and a joint experimental campaign to evaluate tools developed in the context of the ALEA (EP/L000055/1), ENPOWER (EP/L004232/1) and GEMSCLAIM (EP/K017594/1) projects.
Impact The collaboration has resulted in dissemination of datacentre energy measurement, energy accounting, and energy optimisation methods explored within the ALEA (EP/L000055/1), ENPOWER (EP/L004232/1) and GEMSCLAIM (EP/L017594/1) projects among stakeholders in the capital markets, as well as a preliminary evaluation of energy-efficient micro-servers based on heterogeneous many-core architectures and the ARM ecosystem, as an alternative to heavily overprovisioned servers in financial datacentres.
Start Year 2014
 
Description DIVIDEND EPSRC/CHIST-ERA project 
Organisation Advanced Micro Devices (AMD)
Country United States 
Sector Private 
PI Contribution ALEA members will be collaborating with AMD and EPFL in the upcoming DIVIDEND project that came about as a result of the ALEA collaboration.
Collaborator Contribution Partners participate in the DIVIDEND project as fully funded entities with significant personnel effort funded from their respective national funding bodies.
Impact Collaboration resulted in the formulation of a successful CHIST-ERA proposal and a work plan to develop vertically integrated solutions for optimising the energy-efficiency of data centres. Research output is expected for the duration of the CHIST-ERA project (01/15-12/17).
Start Year 2014
 
Description DIVIDEND EPSRC/CHIST-ERA project 
Organisation Swiss Federal Institute of Technology in Lausanne (EPFL)
Country Switzerland 
Sector Public 
PI Contribution ALEA members will be collaborating with AMD and EPFL in the upcoming DIVIDEND project that came about as a result of the ALEA collaboration.
Collaborator Contribution Partners participate in the DIVIDEND project as fully funded entities with significant personnel effort funded from their respective national funding bodies.
Impact Collaboration resulted in the formulation of a successful CHIST-ERA proposal and a work plan to develop vertically integrated solutions for optimising the energy-efficiency of data centres. Research output is expected for the duration of the CHIST-ERA project (01/15-12/17).
Start Year 2014
 
Title ALEA Energy Accounting Tool 
Description The ALEA profiler is a cross-platform statistical profiling tool for Linux, which provides time and energy profiling at the basic block level on Intel and ARM architectures (32 and 64 bit). Energy profiling is available for platforms with energy or power meters. Currently, ALEA supports all Intel platforms with the RAPL interface enabled and ARM-based Odroid-XU/Odroid-XU3 platforms. The tool can be used for profiling both sequential and multi-threaded applications. Attribution of energy and execution time to source code is also supported for applications compiled with debugging information (DWARF). 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact ALEA is actively used by a number of groups in the UK and the US (e.g. Edinburgh, Lancaster, Virginia Tech, Old Dominion University) to perform targeted, energy-aware code optimisation on high-end computing systems. 
URL https://hpdc-gitlab.eeecs.qub.ac.uk/lmukhanov/alea-release.git
 
Description NVTV Interview on Supercomputing 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Interview in NVTV's Behind the Science program on Supercomputing as a technology with impact on our everyday lives.
Year(s) Of Engagement Activity 2015
URL http://www.nvtv.co.uk/shows/behind-the-science-dimitrios-nikolopoulos/