Automated Synthesis of High Performance Low Power Embedded Systems
Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics
Abstract
Embedded computers are used in a wide range of applications, from automotive control to portable media and communications devices. These are increasingly called upon to perform computationally intensive tasks which a few years ago would have been considered supercomputing. Many embedded systems are portable or battery-powered and must therefore rely on a limited energy source. Designers of embedded systems are hence called upon to optimise the hardware and software so that the system operates at high speed whilst consuming very little power.
The number of transistors on each square millimetre of silicon doubles every 18-24 months, leading to an exponential growth in the resources available to the designers of embedded systems. However, at the same time, the relative signal transmission delays through on-chip wiring are growing alarmingly. This now means that designers cannot predict on-chip delays without actually creating the design and measuring the wire delays. Hence, the true performance and power consumption are not known until late in the design process, at which point it is too late to revisit the design and perform high-level optimisations. In effect, the process of laying out the design of a system on silicon alters the characteristics of the system in ways that current tools are unable to predict. Manual optimisation then becomes slow, costly, and necessarily incomplete.
In this project we seek to automate the design and optimisation of embedded systems. We shall do this by creating tools that are able to learn about the physical characteristics of the underlying silicon technology and use that knowledge to synthesise the structure of an embedded processor. As the processor is now a flexible entity without a pre-defined instruction set, the compiler for that processor must also be synthesised. Furthermore, the code optimisations that the compiler performs when translating from source code to the synthetic architecture must also be synthesised.
The internal structure of a processor (known as the micro-architecture) is usually hidden from the compiler. For example, the micro-architecture could implement fundamental arithmetic operations such as multiplication and division in any one of several different ways, each of which yields the same result. There are many examples like this, each one of which presents the designer with trade-offs between performance, silicon die area and energy consumption. Our research proposal also addresses these micro-architectural design alternatives. Our research will investigate ways of automating these micro-architectural trade-offs to optimise for low power consumption, high performance, and reduced die area.
We therefore have three areas in which automated synthesis must be performed: compilers, architectures, and micro-architectures. However, the information on which to make automated decisions in each case will be different. At the micro-architecture level we need to know how each micro-architecture option translates into speed, energy and die area. At the architectural level we need to know how each instruction set option translates into clock cycles of execution time, and at the compiler level we need to know how each optimisation reduces the overall number of instructions executed and maximises the effectiveness of the memory system. The challenge of this research proposal is that all three areas are inter-dependent and ultimately depend on the characteristics of the silicon on which the system is based. By blurring the boundary between hardware and software, and by automating the process of adjusting that boundary, we hope to create a system that can perform design trade-offs in seconds when currently it takes an experienced designer several days.
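The micro-architectural trade-off described in the abstract (several implementations of the same operation, each with a different speed, energy and die-area cost) can be illustrated with a small Pareto filter. This is only a minimal sketch and not the project's tooling: the option names and every number below are hypothetical placeholders chosen for illustration, and a real flow would take such figures from synthesis and layout results.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """One hypothetical implementation option for a multiplier."""
    name: str
    delay_ns: float    # lower is faster
    energy_pj: float   # lower is more energy-efficient
    area_mm2: float    # lower is smaller


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    a_metrics, b_metrics = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_metrics, b_metrics))
    strictly_better = any(x < y for x, y in zip(a_metrics, b_metrics))
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the options that no other option dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical figures, invented purely for this illustration.
OPTIONS = [
    DesignPoint("single_cycle_array", 1.0, 9.0, 0.050),
    DesignPoint("two_stage_pipelined", 1.5, 6.0, 0.040),
    DesignPoint("iterative_shift_add", 8.0, 4.0, 0.010),
    DesignPoint("unoptimised_array", 1.6, 9.5, 0.055),
]

front_names = [p.name for p in pareto_front(OPTIONS)]
print(front_names)
```

With these placeholder values the last option is discarded because another option is at least as good on all three metrics, while the remaining three each win on some metric. Choosing among those survivors is the kind of decision the abstract proposes to automate.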
Publications
Hassan, R
(2007)
A Hybrid Markov Model for Accurate Memory Reference Generation
Hassan, P
(2007)
Synthetic Trace-Driven Simulation of Cache Memory
Franke, B
(2008)
Fast cycle-approximate instruction set simulation
Murray, A
(2008)
Fast source-level data assignment to dual memory banks
Murray, A
(2009)
Code transformation and instruction set extension
in ACM Transactions on Embedded Computing Systems
Tournavitis, G
(2009)
Towards Automatic Profile-Driven Parallelization of Embedded Multimedia Applications
Zuluaga, M
(2009)
Design-Space Exploration of Resource-Sharing Solutions for Custom Instruction Set Extensions
in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Description | Embedded computers are used in a wide range of applications, from automotive control to portable media and communications devices. These are increasingly called upon to perform computationally intensive tasks which a few years ago would have been considered supercomputing. Many embedded systems are portable or battery-powered and must therefore rely on a limited energy source. Designers of embedded systems are hence called upon to optimise the hardware and software so that the system operates at high speed whilst consuming very little power. In this project we sought to automate the design and optimisation of embedded systems. This involved research in a number of complementary areas, including: new ways of extending the design of a microprocessor to add application-specific instructions; high-speed simulation and modelling techniques to enable performance estimation for automated design decision-making; new compilation techniques to map software onto our new configurable and extendable processors; and the application of machine-learning based techniques in both design-space exploration and high-speed modelling and simulation. This project developed new algorithms for synthesising the hardware of an extended processor in two ways: first, by selecting and mapping new application-specific instructions to a Configurable Flow Accelerator; and secondly, by exposing the vast design space created during the critical process of hardware resource sharing. The project also developed novel learning-based approaches to navigating through the design space of possible hardware solutions. This learning-based approach allowed the hardware design tools to build a statistical model of how best to merge the logical implementations of the many fragments of new hardware created when new instructions are added to a processor. By learning about the design space through the creation of a statistical model, the search for the best solution was shown to be shortened by a factor of 200.
This reduces the potential design cost from several weeks to just a few minutes of compute time. One of the stated goals of the project was to create a silicon implementation of a microprocessor embodying the new ideas developed during the project. To that end we developed a low-power embedded microprocessor called EnCore, and produced an initial silicon test chip in November 2008. This used the UMC 130nm high-speed CMOS process, which we accessed via Europractice. The first chip, codenamed Calton, was fully functional, occupied less than 1 sq.mm of silicon, ran at up to 375 MHz and consumed 97 uW/MHz of power. We followed that with a second device a few months later, to test a revised hardware design flow. This used the same 130nm silicon process, and was again fully functional. Towards the end of the project we designed and fabricated a 90nm chip, codenamed Castle, which contained a more powerful EnCore processor combined with a synthetic Configurable Flow Accelerator. This device was again fully functional, ran at up to 600 MHz, occupied around 1.7 sq.mm of silicon, and consumed 125 uW/MHz of power. The Configurable Flow Accelerator contained a configurable data-path with 4 multipliers, 4 shifters and 4 adders, all of which could be combined into a single powerful instruction. We researched a range of compilation techniques for mapping application code to this configurable processor, and were able to run code generated by our own compiler on the EnCore processor. This research project also innovated significantly in the use of statistical machine learning to create accurate performance models for embedded microprocessors. This was coupled with the development of a very high-speed full-system simulator, based on parallel just-in-time (JIT) dynamic binary translation. This allows designers to predict the performance of software running on hardware that may not yet exist, and even to try various alternative hardware configurations before committing to the design. |
Exploitation Route | High-speed microprocessor simulation technologies, and low-power microprocessor designs, resulting from this research, were licensed to industry and are now being used. |
Sectors | Digital/Communication/Information Technologies (including Software), Electronics |
URL | http://groups.inf.ed.ac.uk/pasta/ |
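The per-megahertz power figures quoted in the description above can be turned into an approximate total power at the top clock rate. The sketch below assumes, as a simplification not stated in the source, that power scales linearly with clock frequency and that the quoted uW/MHz value still holds at the maximum frequency. The function name is hypothetical.

```python
def power_mw(uw_per_mhz: float, freq_mhz: float) -> float:
    """Estimate power in milliwatts from a uW/MHz figure and a clock in MHz.

    Assumes power scales linearly with frequency (a simplification).
    """
    return uw_per_mhz * freq_mhz / 1000.0  # uW -> mW


# Figures taken from the description: Calton (130nm) and Castle (90nm).
calton_mw = power_mw(97, 375)
castle_mw = power_mw(125, 600)

print(f"Calton at 375 MHz: about {calton_mw:.1f} mW")
print(f"Castle at 600 MHz: about {castle_mw:.1f} mW")
```

Under that assumption the Calton chip would draw roughly 36 mW at 375 MHz and the Castle chip roughly 75 mW at 600 MHz. Measured silicon may differ, since static leakage and voltage scaling are ignored here.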
Description | EnCore is a low-power configurable microprocessor design for use in embedded systems. It was developed in the School of Informatics at Edinburgh University, within the EPSRC-funded PASTA project, and subsequently licensed to a major Silicon Valley semiconductor IP and electronic design automation (EDA) company. This design is now marketed and has numerous licensees who are incorporating derivatives of the EnCore technology into their chips. Beneficiaries: Companies worldwide who design silicon chips for embedded applications, and who need small, low-power but high-performance embedded processor cores. |
First Year Of Impact | 2012 |
Sector | Digital/Communication/Information Technologies (including Software), Electronics, Healthcare |
Impact Types | Societal, Economic |
Description | Associated Compiler Experts |
Amount | £417,391 (GBP) |
Funding ID | EP/I013539/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2010 |
End | 07/2014 |
Description | EPSRC |
Amount | £1,217,557 (GBP) |
Funding ID | EP/I013539/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2010 |
End | 07/2014 |
Description | Virage Logic |
Amount | £272,777 (GBP) |
Funding ID | EP/I013539/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2010 |
End | 07/2014 |