FaT HaMM: Fault Tolerant Hardware from Malleable Microarchitectures

Lead Research Organisation: Imperial College London
Department Name: Electrical and Electronic Engineering

Abstract

Theme: Manufacturing the Future
Research Area: Artificial Intelligence Technologies

Modern fault tolerance in electronics is built on redundancy. Foundational research in evolvable hardware has highlighted the potential for fault-tolerant systems built on flexibility instead: should a critical fault occur, these systems use Field Programmable Gate Arrays (FPGAs) and genetic algorithms to rewrite their hardware configurations. This proposal outlines a programme of work to develop fault-resistant processors by deploying automated hardware reconfiguration. Two approaches will be used to mitigate the historically poor scaling of evolvable hardware. Firstly, full reconfigurability will only be used on a small scale, by embedding small regions of fully flexible hardware within a conventional, inflexible processor. By replacing a number of architectural components within a processor with counterparts built on this "malleable" microarchitecture, I aim to build the fault recovery capabilities of evolvable hardware systems into fault-prone architectural components. Secondly, the flexible region will be segmented into separate zones, each an area of silicon, so that a fault can be pinpointed and only the affected zone reconfigured. Together, these steps should reduce the search space to a manageable size.
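As a rough illustration of why segmentation helps, assume (hypothetically) that a flexible region is controlled by n configuration bits, so an unconstrained search has 2^n candidate configurations; reconfiguring only one of z equal zones shrinks this to 2^(n/z). The figures used below (64 bits, 8 zones) are illustrative rather than project numbers, and the model ignores routing between zones.

```python
def search_space_size(total_bits, zones=1):
    """Candidate configurations when only one of `zones` equal zones is re-searched.

    Illustrative model only: every configuration bit is treated as a free
    binary choice, and interactions between zones are ignored.
    """
    if total_bits % zones:
        raise ValueError("zones must divide total_bits evenly")
    return 2 ** (total_bits // zones)


print(search_space_size(64))     # 18446744073709551616 (whole region)
print(search_space_size(64, 8))  # 256 (one 8-bit zone)
```

The reduction is exponential in the number of zones, which is why pinpointing the faulty zone matters as much as keeping the flexible region small.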

The malleable microarchitectural components will consist of a mesh of FPGA-like configurable areas of hardware, together with the capacity to pinpoint erroneous areas and reconfigure them if a fault is detected. This practical flexibility will be developed in two parts: by devising tuned machine learning algorithms to compete with, or improve on, genetic algorithms, and by designing a new processor architecture using malleable microarchitectural components.
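The pinpointing step can be sketched as a comparison of each zone's observed behaviour against its expected configuration. This sketch assumes each zone is a single 2-input lookup table (LUT) whose 4-bit contents encode its truth table, and that test logic can drive and read each zone in isolation; `observe` is a hypothetical hook standing in for that test access, not an interface from the proposal.

```python
from itertools import product


def lut_eval(config, a, b):
    """Output of a 2-input LUT: bit (2*a + b) of its 4-bit configuration."""
    return (config >> (a * 2 + b)) & 1


def pinpoint_faulty_zones(expected_configs, observe):
    """Return indices of zones whose observed outputs differ from their LUT contents.

    observe(zone, a, b) is a hypothetical test-access hook that applies inputs
    (a, b) to one zone and reads back its output.
    """
    faulty = []
    for zone, config in enumerate(expected_configs):
        if any(observe(zone, a, b) != lut_eval(config, a, b)
               for a, b in product((0, 1), repeat=2)):
            faulty.append(zone)
    return faulty


# Illustrative mesh of four zones configured as AND, OR, XOR and NAND.
configs = [0b1000, 0b1110, 0b0110, 0b0111]


def observe(zone, a, b):
    # Simulated silicon: zone 2 has its output stuck at 0.
    return 0 if zone == 2 else lut_eval(configs[zone], a, b)


print(pinpoint_faulty_zones(configs, observe))  # [2]
```

Exhaustive input vectors are cheap here because each zone is tiny; only the zones reported by this check would then be handed to the reconfiguration search.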
Specifically, this programme of research aims to:
1. Exploit modern machine learning methods to develop new automated approaches to hardware design; these approaches should be tailored to system scaling and be capable of tackling non-trivial hardware designs.
2. Improve known fault models for modern processors through industrial partnerships.
3. Embed FPGA technology within conventional processor designs to create a malleable hardware processor to act as a new automated hardware design substrate, with configurable degrees of flexibility.
4. Create an exhaustive testing framework and explore the fault tolerance and scaling properties of the designed malleable hardware system.

An initial direction for the machine learning research would explore the potential of deep learning as a mechanism for hardware logic design, namely the possibility of converting aggressively pruned, deep-learned neural networks into networks of logic gates.
As mentioned, one of the central problems in automated hardware design is system scaling. To tackle this, rather than trying to automate the design of an entire processor, the flexible portions of the chip will be kept self-contained and of a manageable size, and the machine learning algorithms will be designed with scalability at the forefront.
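To make the pruned-network idea concrete, a single neuron can be compiled into a lookup table. The sketch assumes a binarised network (inputs in {0, 1}, weights pruned to {-1, 0, +1}, step activation with an integer threshold); these choices and the function name are illustrative, not the project's design. Pruned (zero) weights drop out of the table entirely, so a neuron with k surviving inputs needs a table of only 2^k entries.

```python
from itertools import product


def neuron_to_lut(weights, threshold):
    """Compile a binarised, pruned neuron into a truth table over its surviving inputs.

    Assumed model: output = 1 iff sum(w_i * x_i) >= threshold, with x_i in {0, 1}.
    Returns (indices of surviving inputs, table), where bit r of `table` is the
    output for row r, and rows are enumerated with the first surviving input
    as the most significant bit.
    """
    kept = [i for i, w in enumerate(weights) if w != 0]  # pruned inputs vanish
    table = 0
    for row, bits in enumerate(product((0, 1), repeat=len(kept))):
        total = sum(weights[i] * b for i, b in zip(kept, bits))
        if total >= threshold:
            table |= 1 << row
    return kept, table


kept, table = neuron_to_lut([1, 1, 0, 1], threshold=2)
print(kept, hex(table))  # [0, 1, 3] 0xe8
```

With weights [1, 1, 0, 1] and threshold 2, input 2 is pruned and the remaining three inputs form a majority gate, whose 3-input LUT contents are 0xE8.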

By the conclusion of this research, it is hoped that a practical fault recovery mechanism can be constructed, rivalling current cutting-edge redundancy-based methods. These processors will have the capacity to rewrite problematic, fault-prone hardware to recover from performance-critical faults. In principle, a malleable processor could ship with a configuration preloaded, and out of the box it would perform identically to a conventional chip. However, upon detection of a faulty component, a search procedure would begin looking for alternative hardware designs that reduce the impact of the fault or negate it completely. The result would be a processor capable of a limited amount of self-healing.
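The search procedure described above can be sketched in miniature. This is a toy under stated assumptions, not the project's method: a zone is modelled as a feed-forward chain of eight hypothetical 2-input LUT cells (in the style of Cartesian Genetic Programming), the fault is a stuck-at-0 output on one cell, and the search is a (1+λ) evolutionary strategy, a simpler relative of the genetic algorithms mentioned earlier. The target function (3-input parity), the preloaded configuration and all names are illustrative.

```python
import random
from itertools import product

N_INPUTS = 3             # primary inputs to the zone (illustrative)
N_NODES = 8              # configurable 2-input LUT cells in the zone (illustrative)
PERFECT = 2 ** N_INPUTS  # one point per truth-table row


def target(bits):
    # Hypothetical behaviour the zone must provide: 3-input parity.
    return bits[0] ^ bits[1] ^ bits[2]


def evaluate(genome, faulty_node=None):
    """Count truth-table rows matched, with `faulty_node` stuck at 0."""
    nodes, out = genome
    score = 0
    for bits in product((0, 1), repeat=N_INPUTS):
        values = list(bits)  # primary inputs, then node outputs in order
        for i, (a, b, lut) in enumerate(nodes):
            v = (lut >> (values[a] * 2 + values[b])) & 1
            values.append(0 if i == faulty_node else v)
        score += int(values[out] == target(bits))
    return score


def mutate(genome, rng):
    """Change one gene: a node input, a node's LUT contents, or the output tap."""
    nodes = [list(n) for n in genome[0]]
    out = genome[1]
    k = rng.randrange(3 * N_NODES + 1)
    if k == 3 * N_NODES:
        out = rng.randrange(N_INPUTS + N_NODES)
    else:
        i, field = divmod(k, 3)
        # Inputs may only come from primary inputs or earlier nodes (feed-forward).
        nodes[i][field] = rng.randrange(16 if field == 2 else N_INPUTS + i)
    return nodes, out


def recover(genome, faulty_node, generations=3000, lam=4, seed=0):
    """(1+lambda) search for a configuration that works around the fault."""
    rng = random.Random(seed)
    parent, best = genome, evaluate(genome, faulty_node)
    for _ in range(generations):
        if best == PERFECT:
            break
        children = [mutate(parent, rng) for _ in range(lam)]
        score, child = max(((evaluate(c, faulty_node), c) for c in children),
                           key=lambda s: s[0])
        if score >= best:  # accepting ties allows neutral drift
            parent, best = child, score
    return parent, best


# Preloaded configuration: node 0 = in0 XOR in1, node 1 = node 0 XOR in2.
XOR = 0b0110
healthy = ([[0, 1, XOR], [3, 2, XOR]] + [[0, 0, 0] for _ in range(N_NODES - 2)], 4)
print(evaluate(healthy))                 # 8: correct on every row
print(evaluate(healthy, faulty_node=1))  # 4: output cell stuck at 0
repaired, score = recover(healthy, faulty_node=1)
print(score)
```

Because the fitness landscape in this toy is largely flat, accepting equal-fitness children (neutral drift) matters. Whether the search reaches a full score within the generation budget depends on the seed, so the returned score should be checked rather than assumed.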


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509486/1                                   01/10/2016  31/03/2022
2283663            Studentship   EP/N509486/1  01/10/2019  31/05/2023  Alexander Dalton
EP/R513052/1                                   01/10/2018  30/09/2023
2283663            Studentship   EP/R513052/1  01/10/2019  31/05/2023  Alexander Dalton