DOME: Delaying and Overcoming Microprocessor Errors
Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology
Publications
Ainsworth S
(2018)
Parallel Error Detection Using Heterogeneous Cores
Kanev S
(2013)
Measuring Code Optimization Impact on Voltage Noise
in Workshop on Silicon Errors in Logic - System Effects (SELSE)
Mitropoulou K
(2016)
Lynx
Porpodas V
(2015)
PSLP: Padded SLP automatic vectorization
Porpodas V
(2015)
Throttling Automatic Vectorization: When Less is More
Soman J
(2015)
REPAIR: Hard-error recovery via re-execution
Valero A
(2016)
Enhancing the L1 Data Cache Design to Mitigate HCI
in IEEE Computer Architecture Letters
Valero A
(2017)
On Microarchitectural Mechanisms for Cache Wearout Reduction
in IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Description | We have three main findings. First, many permanent errors within processors can be overcome through the addition of a small logic unit capable of re-executing instructions. Second, applications cause different amounts of transistor ageing depending on the operations they perform. Third, errors in a large core can be both detected and corrected using an array of small, power-efficient cores that run in parallel. |
Exploitation Route | Our work can be used by industry to develop schemes that combat processor ageing and overcome permanent processor faults. |
Sectors | Digital/Communication/Information Technologies (including Software), Electronics |
URL | https://www.cl.cam.ac.uk/~tmj32/ |
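The third finding can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the project's hardware design: it assumes the main core's execution is split into segments, that a checkpoint of the state is logged at the start of each segment, and that each checker re-executes one segment independently and compares its result with the logged one. The operation set, the integer "state", the injected single-bit fault, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_segment(state, ops, fault_at=None):
    """Apply ops to an integer state; optionally corrupt the result of one op."""
    for i, (op, arg) in enumerate(ops):
        if op == "add":
            state = state + arg
        elif op == "mul":
            state = state * arg
        else:
            raise ValueError(f"unknown op {op!r}")
        if i == fault_at:
            state ^= 1  # hypothetical single-bit fault, injected for the demo
    return state


def main_core(initial, segments, faulty_segment=None):
    """Run every segment, logging (start checkpoint, ops, end state) for each."""
    log, state = [], initial
    for idx, ops in enumerate(segments):
        fault_at = 0 if idx == faulty_segment else None
        end = run_segment(state, ops, fault_at)
        log.append((state, ops, end))
        state = end
    return log


def check_segment(entry):
    """Checker: re-execute one segment from its checkpoint and compare results."""
    start, ops, claimed_end = entry
    return run_segment(start, ops) == claimed_end


def detect_and_correct(initial, segments, faulty_segment=None):
    """Return (final state, index of the first faulty segment or None)."""
    log = main_core(initial, segments, faulty_segment)
    # Segments are independent given their checkpoints, so they can be
    # checked in parallel (threads stand in for the small checker cores).
    with ThreadPoolExecutor(max_workers=4) as pool:
        verdicts = list(pool.map(check_segment, log))
    if all(verdicts):
        return log[-1][2], None
    first_bad = verdicts.index(False)
    # Roll back to the last verified checkpoint and re-execute from there.
    state = log[first_bad][0]
    for ops in segments[first_bad:]:
        state = run_segment(state, ops)
    return state, first_bad
```

In this sketch a fault is attributed to the segment whose re-execution disagrees with the log. Later segments still check out, because each checker starts from the logged (already corrupted) checkpoint, which is why correction restarts from the first failing segment.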
Description | We have held discussions with Arm about deploying this technology in their R-class processors, which target real-time systems and require strong reliability guarantees. In the meantime, we have further developed some of the techniques from this work, which strengthens the case for including this research. |
First Year Of Impact | 2016 |
Sector | Digital/Communication/Information Technologies (including Software), Electronics |
Impact Types | Economic |
Title | Research data supporting "High Performance Fault Tolerance Through Predictive Instruction Re-Execution" |
Description | Source code for simulator modules to implement schemes in the paper. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Description | HiPEAC |
Organisation | European Commission |
Department | Seventh Framework Programme (FP7) |
Country | European Union (EU) |
Sector | Public |
PI Contribution | Attending meetings to disseminate results and interact with other researchers in the same area. |
Collaborator Contribution | A four-month visit by a postdoctoral researcher from another member of the network. |
Impact | The network is on High-Performance and Embedded Architectures and Compilers |
Start Year | 2011 |
Title | The Lynx Queue |
Description | Lynx is a very fast single-producer, single-consumer software queue. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | We have used this queue to develop faster soft-error detection techniques. It has been downloaded 21 times by others. |
URL | http://www.cl.cam.ac.uk/~tmj32/data/ |
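For readers unfamiliar with the term, a single-producer, single-consumer queue can be sketched as a bounded ring buffer in which one side only advances the write index and the other only advances the read index. The class below is a generic illustration and is not the Lynx implementation; the class name, method names, and the one-empty-slot convention are assumptions made for this example. Python is used for readability only: a performance-oriented queue would be written in native code with explicit atomic operations and memory ordering.

```python
class SpscRingQueue:
    """Bounded single-producer, single-consumer ring buffer (illustrative).

    Only the producer writes `_tail`; only the consumer writes `_head`.
    One slot is always left empty so that "full" and "empty" differ.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read (consumer-owned)
        self._tail = 0  # next slot to write (producer-owned)

    def try_push(self, item):
        """Producer side: return False if the queue is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = nxt  # publish only after the slot has been written
        return True

    def try_pop(self):
        """Consumer side: return (True, item), or (False, None) if empty."""
        if self._head == self._tail:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None  # drop the reference
        self._head = (self._head + 1) % len(self._slots)
        return True, item
```

Because each index has exactly one writer, the two sides never contend for the same variable, which is the property that lets this kind of queue avoid locks.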