Reliable Numerical Computation with Parallel Unreliable Technologies

Lead Research Organisation: Imperial College London

Department Name: Electrical and Electronic Engineering

Abstract

Imagine a world where mathematical dynamical models of processes in medicine, transport, finance and energy, are routinely constructed and refined, on the fly, by personalised hardware devices. With sufficient computational power, such devices could make intelligent decisions, for example utilising advanced control techniques for real-time intervention in diabetic patients and for pacemakers or defibrilators, or to automatically re-route personalised transport options based on up-to-the-minute information. The vision is a powerful one, but highlights many limitations of today's digital technologies, limitations that will not be overcome with simple scaling of technology, but which need a fundamental rethink of the way in which massively parallel computation can be performed. The focus of this proposal is therefore not on any one of the above grand challenges, but rather on the fundamental scientific problems enabling this revolution. The proposed research could enable computation at orders of magnitude better energy/performance ratio than would be possible by extending today's techniques to tomorrow's technologies, opening up a step change in current practices at both the low power end of computation (sophisticated personal devices) and the high power end (advancements in computational physics, chemistry, biology and finance).Over the next decade, there will be a number of radical shifts to the way in which computation is performed. In particular, rather than computing with a single computational core equipped with a numerical computational unit, or with a traditional inter-processor network of such cores, two differences are becoming increasingly apparent. Firstly, if we allow process technology to scale at the rate it could, then each computational device will become increasingly unreliable. Secondly, under the same assumption, this lack of reliability will be compensated by massive parallelism. Independently, each of these issues requires considerable research effort to overcome the lack of reliability and the current inability to make efficient use of massive parallelism in a portable manner, and there is much ongoing international research in these distinct fast-moving areas.These seemingly distinct challenges, however, have an untapped common research core. Massive parallelism in numerical computation mandates a way to effectively deal with numerical imprecision in computation, even in fully reliable circuitry. One must have in mind a specification of tolerable numerical accuracy. Once such a specification exists, and can be formalised, it opens up the potential to use this specification as input to a powerful specialised optimising compiler, capable of optimising numerical hardware and software in order to achieve the specification with maximum performance within a given power envelope. Lack of reliability in future devices will propagate to numerical computation as noticeable numerical errors in the hardware. One way of overcoming these errors is to utilise parallelism in some kind of redundant fashion. A radical alternative is to `hide' the numerical errors caused by unreliable components within the tolerance specification already required. For a fixed silicon area, the emerging multi-core revolution in computational hardware brings to the fore a tension between numerical precision and computational performance; one can no longer afford to pay the price in performance for over-designed hardware.At the core of this research will therefore be the development of mathematical techniques to reason about the compilation of numerical software into parallel hardware, broadening applicability to general classes of nonlinear and control-intensive algorithms, requiring the application of nonlinear systems theory to reason about the convergence of classes of algorithm under numerical perturbation, and mechanising algebraic approaches to reasoning about accuracy in numerical algorithms.

Planned Impact

A full discussion of impact is included in the Pathways to Impact statement. 1. Knowledge - Fundamental scientific advances in analysis and optimization of numerical algorithms for massively parallel unreliable hardware. - Implications for the program analysis and numerical analysis communities in terms of new multi-disciplinary research programmes. - New tools and techniques released via the web. 2. Economy - Contributions to UK competitiveness in the area of parallel hardware architectures and tools. - Contributions to the knowledge-base of UK companies such as ARM and to other investors in the UK. - Potential for patents, licensing, and possibly spin-outs via Imperial Innovations. 3. People - Development of highly skilled researchers in shortage areas. 4. Society - Potential impact in personalised devices, healthcare, transport, modelling and simulation of pandemics, financial markets, and others.

Funded Value:

£1,002,461

Funded Period:

Jul 11 - Dec 16

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/I020357/1

Principal Investigator:

George Constantinides

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

VLSI Design (100%)

Organisations

People	ORCID iD
George Constantinides (Principal Investigator)	http://orcid.org/0000-0002-0201-310X

Publications

Author Name Title Publication Date Published

|< < 1 2 > >|

10 25 50

Le Lann C (2011) Reconfigurable Computing: Architectures, Tools and Applications

Constantinides G (2011) Guest Editors' Introduction: Surveying the Landscape of FPGA Accelerator Research in IEEE Design & Test of Computers

Bayliss S (2011) Reconfigurable Computing: Architectures, Tools and Applications

Constantinides G (2011) Numerical Data Representations for FPGA-Based Scientific Computing in IEEE Design & Test of Computers

Manoukian M (2011) Reconfigurable Computing: Architectures, Tools and Applications

Boland D (2011) Bounding Variable Values and Round-Off Effects Using Handelman Representations in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Levine J (2011) Health monitoring of live circuits in FPGAs based on time delay measurement (abstract only)

Boland D (2011) Optimizing memory bandwidth use and performance for matrix-vector multiplication in iterative methods in ACM Transactions on Reconfigurable Technology and Systems

Longo S (2011) A parallel formulation for predictive control with nonuniform hold constraints in Annual Reviews in Control

Hasan A (2011) Solving a positive definite system of linear equations via the matrix exponential

Liu Q (2012) Optimizing Hardware Design by Composing Utility-Directed Transformations in IEEE Transactions on Computers

Guan Z (2012) A two-stage variation-aware placement method for FPGAS exploiting variation maps classification

Rafique A (2012) Reconfigurable Computing: Architectures, Tools and Applications

Levine J (2012) Online Measurement of Timing in Circuits: For Health Monitoring and Dynamic Voltage & Frequency Scaling

Drane T (2012) Correctly rounded constant integer division via multiply-add

Jerez J (2012) Fixed Point Lanczos: Sustaining TFLOP-equivalent Performance in FPGAs for Scientific Computing

Rafique A (2012) Enhancing performance of Tall-Skinny QR factorization using FPGAs

Bayliss S (2012) Analytical synthesis of bandwidth-efficient SDRAM address generators in Microprocessors and Microsystems

Tan X (2012) Reconfigurable Computing: Architectures, Tools and Applications

Bayliss S (2012) Optimizing SDRAM bandwidth for custom FPGA loop accelerators

Shahzad A (2012) A Stable and Efficient Method for Solving a Convex Quadratic Program with Application to Optimal Control in SIAM Journal on Optimization

Boland D (2012) A scalable approach for automated precision analysis

Kan Shi (2013) Accuracy-Performance Tradeoffs on an FPGA through Overclocking

Rafique A (2013) Application Composition and Communication Optimization in Iterative Solvers Using FPGAs

Hasan A (2013) Control-Theoretic Forward Error Analysis of Iterative Numerical Algorithms in IEEE Transactions on Automatic Control

Kan Shi (2013) Overclocking datapath for latency-error tradeoff

Boland D (2013) A Scalable Precision Analysis Framework in IEEE Transactions on Multimedia

Shi K (2014) Efficient FPGA implementation of digit parallel online arithmetic operators

Guan Z (2014) Mitigation of process variation effect in FPGAs with partial rerouting method in IEICE Electronics Express

Guan Z (2014) Classification on variation maps: a new placement strategy to alleviate process variation on FPGA in IEICE Electronics Express

Gao X (2015) Numerical Program Optimization for High-Level Synthesis

Winterstein F (2015) MATCHUP

Winterstein F (2015) Separation Logic for High-Level Synthesis in ACM Transactions on Reconfigurable Technology and Systems

Wickerson J (2015) Remote-scope promotion: clarified, rectified, and verified

Jerez J (2015) A Low Complexity Scaling Method for the Lanczos Kernel in Fixed-Point Arithmetic in IEEE Transactions on Computers

Liu J (2015) Offline Synthesis of Online Dependence Testing: Parametric Loop Pipelining for HLS

Hung E (2015) Delay-Bounded Routing for Shadow Registers

Shi K (2015) Imprecise Datapath Design An Overclocking Approach in ACM Transactions on Reconfigurable Technology and Systems

Kerrigan E (2015) Computer architectures to close the loop in real-time optimization

Rafique A (2015) Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs in IEEE Transactions on Parallel and Distributed Systems

Liu J (2016) Loop Splitting for Efficient Pipelining in High-Level Synthesis

Ramanathan N (2016) A Case for Work-stealing on FPGAs with OpenCL Atomics

Batty M (2016) Overhauling SC atomics in C11 and OpenCL in ACM SIGPLAN Notices

Davis J (2016) Knowledge is Power

Gao X (2016) Automatically Optimizing the Latency, Area, and Accuracy of C Programs for High-Level Synthesis

Batty M (2016) Overhauling SC atomics in C11 and OpenCL

Hung E (2016) KAPow: A System Identification Approach to Online Per-Module Power Estimation in FPGA Designs

Khusainov B (2016) Multi-objective Co-design for model predictive control with an FPGA

Constantinides G (2017) Algorithms and Arithmetic: Choose Wisely

Wickerson J (2017) Automatically comparing memory consistency models in ACM SIGPLAN Notices

Key Findings
Impact Summary
Further Funding
Collaboration


Description	- That iterative algorithms are amenable to automated finite precision analysis - That memory systems both containing regular and irregular accesses can be customised automatically to the underlying application - That overclocking error can be bounded an treated in a similar fashion to roundoff error - That novel data representations can improve the ability to overclock datapath
Exploitation Route	By applying the lessons in computer architecture design By developing tools to fully automate the approaches developed
Sectors	Digital/Communication/Information Technologies (including Software),Electronics


Description	Research has been used by a number of companies. Imagination Technologies funded funded an RAEng Research Chair to take this forward. Intel is currently funding research at Imperial College continuing these theme of work.
Sector	Creative Economy,Digital/Communication/Information Technologies (including Software),Electronics,Leisure Activities, including Sports, Recreation and Tourism
Impact Types	Societal,Economic


Description	FP7 STREP - DESYRE
Amount	£170,209 (GBP)
Funding ID	DESYRE
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	10/2011
End	01/2015


Description	RAEng Research Chair
Amount	£215,536 (GBP)
Organisation	Royal Academy of Engineering
Sector	Charity/Non Profit
Country	United Kingdom
Start	01/2015
End	12/2019


Description	RAEng Research Chair (Industry)
Amount	£342,958 (GBP)
Organisation	Imagination Technologies
Sector	Private
Country	United Kingdom
Start	01/2015
End	12/2019


Description	ARM - ChallEng
Organisation	Arm Limited
Country	United Kingdom
Sector	Private
PI Contribution	None to date
Collaborator Contribution	None to date
Impact	None to date
Start Year	2011


Description	Altera - ChallEng
Organisation	Altera
Department	Altera Europe
Country	United Kingdom
Sector	Private
PI Contribution	Research output
Collaborator Contribution	Design tools, licences, engineer time
Impact	Publications
Start Year	2011


Description	Microsoft - ChallEng
Organisation	Microsoft Research
Department	Microsoft Research Cambridge
Country	United Kingdom
Sector	Private
PI Contribution	Research output
Collaborator Contribution	Terminator
Impact	Publications
Start Year	2011


Description	QMUL - ChallEng
Organisation	Queen Mary University of London
Country	United Kingdom
Sector	Academic/University
PI Contribution	Research output
Collaborator Contribution	Workshops
Impact	QMUL group moved
Start Year	2011


Description	University of Toronto - Chall Eng
Organisation	University of Toronto
Country	Canada
Sector	Academic/University
PI Contribution	Staff exchange
Collaborator Contribution	Staff exchange
Impact	Staff exchange
Start Year	2011

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications