DOME: Delaying and Overcoming Microprocessor Errors

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Publications

Kanev S (2013) Measuring Code Optimization Impact on Voltage Noise in Workshop on Silicon Errors in Logic - System Effects (SELSE)

Mitropoulou K (2016) Lynx

Valero A (2016) Enhancing the L1 Data Cache Design to Mitigate HCI in IEEE Computer Architecture Letters

Valero A (2017) On Microarchitectural Mechanisms for Cache Wearout Reduction in IEEE Transactions on Very Large Scale Integration (VLSI) Systems

 
Description We have three main findings. First, many permanent errors within processors can be overcome through the addition of a small logic unit capable of re-executing instructions. Second, applications cause different amounts of transistor ageing depending on the operations they perform. Third, errors in a large core can be both detected and corrected using an array of small, power-efficient cores that run in parallel.
Exploitation Route Our work can be used by industry to develop schemes that combat processor ageing and overcome permanent processor faults.
Sectors Digital/Communication/Information Technologies (including Software), Electronics

URL https://www.cl.cam.ac.uk/~tmj32/
 
Description We have held discussions with Arm about deploying this technology in their R-class processors, which target real-time systems and require strong reliability guarantees. In the meantime, we have further developed some of the techniques from this work, which strengthens the case for adopting this research.
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software), Electronics
Impact Types Economic

 
Title Research data supporting "High Performance Fault Tolerance Through Predictive Instruction Re-Execution" 
Description Source code for simulator modules to implement schemes in the paper. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
 
Description HiPEAC 
Organisation European Commission
Department Seventh Framework Programme (FP7)
Country European Union (EU) 
Sector Public 
PI Contribution Attending meetings to disseminate results and interact with other researchers in the same area.
Collaborator Contribution A four-month visit by a postdoctoral researcher from another member of the network.
Impact HiPEAC is the European network on high-performance and embedded architectures and compilers.
Start Year 2011
 
Title The Lynx Queue 
Description Lynx is a very fast single-producer, single-consumer software queue. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact We have used this queue to develop faster soft-error detection techniques. It has been downloaded 21 times by others. 
URL http://www.cl.cam.ac.uk/~tmj32/data/