DOME: Delaying and Overcoming Microprocessor Errors

Lead Research Organisation: University of Cambridge

Department Name: Computer Science and Technology

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Funded Value:

£713,649

Funded Period:

Sep 2012 - Sep 2016

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/J016284/1

Principal Investigator:

Timothy Jones

Alan Mycroft

Research Subject:

Information and Communication Technologies (100%)

Research Topic:

Computer Systems and Architecture (100%)

Organisations

People	ORCID iD
Timothy Jones (Principal Investigator)	http://orcid.org/0000-0002-4114-7661
Alan Mycroft (Principal Investigator)

Publications

Ainsworth S (2018) Parallel Error Detection Using Heterogeneous Cores

Kanev S (2013) Measuring Code Optimization Impact on Voltage Noise in Workshop on Silicon Errors in Logic - System Effects (SELSE)

Mitropoulou K (2016) Lynx

Mitropoulou K (2016) COMET: Communication-optimised multi-threaded error-detection technique in Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2016

Porpodas V (2015) Throttling Automatic Vectorization: When Less is More

Porpodas V (2015) PSLP: Padded SLP automatic vectorization

Soman J (2017) High performance fault tolerance through predictive instruction re-execution

Soman J (2015) REPAIR: Hard-error recovery via re-execution

Valero A (2017) On Microarchitectural Mechanisms for Cache Wearout Reduction in IEEE Transactions on Very Large Scale Integration (VLSI) Systems


Key Findings

Description	We have three main findings. First, many permanent errors within processors can be overcome through the addition of a small logic unit capable of re-executing instructions. Second, applications cause different amounts of transistor ageing depending on the operations they perform. Third, errors in a large core can be both detected and corrected using an array of small, power-efficient cores that run in parallel.
Exploitation Route	Our work can be used by industry to develop schemes that combat processor ageing and overcome permanent processor faults.
Sectors	Digital/Communication/Information Technologies (including Software), Electronics
URL	https://www.cl.cam.ac.uk/~tmj32/


Impact Summary

Description	We have held discussions with Arm about deploying this technology in their R-class processors, which target real-time systems and require strong reliability guarantees. In the meantime, we have further developed some of the techniques from this work, which strengthens the case for including this research in those processors.
First Year Of Impact	2016
Sector	Digital/Communication/Information Technologies (including Software), Electronics
Impact Types	Economic


Research Databases and Models

Title	Research data supporting "High Performance Fault Tolerance Through Predictive Instruction Re-Execution"
Description	Source code for the simulator modules that implement the schemes in the paper.
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes


Collaboration

Description	HiPEAC
Organisation	European Commission
Department	Seventh Framework Programme (FP7)
Country	European Union (EU)
Sector	Public
PI Contribution	Attending meetings to disseminate results and interact with other researchers in the same area.
Collaborator Contribution	A four-month visit by a postdoctoral researcher from another member of the network.
Impact	The network focuses on High-Performance and Embedded Architectures and Compilers.
Start Year	2011


Software and Technical Products

Title	The Lynx Queue
Description	Lynx is a very fast single-producer, single-consumer software queue.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	We have used this queue to develop faster soft-error detection techniques. It has been downloaded 21 times by others.
URL	http://www.cl.cam.ac.uk/~tmj32/data/
