DOME: Delaying and Overcoming Microprocessor Errors

Lead Research Organisation: University of Manchester

Department Name: Computer Science

Abstract

Modern-day computer systems have benefited from being designed and manufactured using an ever-increasing budget of transistors with very reliable integrated circuits. However, moving forward, this "free lunch" is over, and forgotten nightmares faced by computer pioneers are coming back to haunt us. Not so long ago, unreliable valves were the basic building blocks for computers, and research focussed on how to compute successfully despite this underlying weakness (e.g. von Neumann, 1956, "Probabilistic logics and the synthesis of reliable organisms from unreliable components").

State-of-the-art integrated circuit technologies have now reached the range of 40-22 nanometres, posing significant reliability challenges. Hard (permanent) errors can manifest themselves at any point during a processor's lifetime. During manufacturing, errors can render a proportion of a chip incapable of computing, thus decreasing yield and profit.
As components become smaller, transistors take less time to wear out and become more prone to failure in the field. Traditional reliability solutions apply high-cost redundancy to the hardware structures within the processor, providing backup spares for when errors occur. On the application side, solutions also rely on redundancy, running multiple copies of each piece of software.

A common criticism of current reliability solutions is that they do not consider how the software and hardware can be co-designed synergistically to tackle this challenge. Redesigning and reimplementing general-purpose software applications would be prohibitively expensive. Our hypothesis is that virtualisation technologies (a layer that transparently hides the underlying platform from the application software) have an important role to play. In particular, managed runtime environments (MREs) have become pervasive for high-productivity software developers and represent a promising vehicle for providing reliability mechanisms. Within these systems, applications can be monitored and morphed without user intervention.

There are two complementary strands to our proposed research, focused around a co-designed MRE and multicore computer architecture. Firstly, we will consider wearout mitigation schemes to slow processor ageing and lengthen a chip's lifetime before a hard fault occurs. Secondly, given that an error will occur at some point during a system's life, we will develop error-tolerance approaches that maintain execution on faulty hardware.

If successful, we believe this project will be seen as a significant milestone in the development of wearout-conscious and error-tolerant multicore architectures over the next decade. This research programme will advance our understanding of the field, tackling the UK Microelectronics Grand Challenge of Moore for Less that has been signposted by EPSRC. It is also important to highlight that this proposal tackles a key aspect of the new EPSRC ICT capability priority on "Many-core architectures and concurrency in distributed and embedded systems".

Planned Impact

State-of-the-art integrated circuit technologies have now reached the range of 45-22 nanometres, posing significant reliability challenges. Moving forward, the reliability of multicore and many-core chips is a key design challenge for system designers. DOME, our proposed research programme, provides a transparent path forward for software applications. Managed runtime environments (MREs) have become pervasive for high-productivity software developers and represent a promising virtualisation vehicle for providing reliability mechanisms. Within these systems, applications can be monitored and morphed without application developer intervention.

Our objectives are focused around a co-designed MRE and computer architecture for multicore chips. We will consider wearout mitigation schemes to slow processor ageing and lengthen a chip's lifetime before a hard fault occurs. Given that an error will occur at some point during a system's life, we will develop error-tolerance approaches that maintain execution on faulty hardware.

If successful, we believe this project will be seen as a significant milestone in the development of wearout-conscious and error-tolerant multicore architectures over the next decade. This research programme will advance our understanding of the field, tackling the UK Microelectronics Grand Challenge of Moore for Less that has been signposted by EPSRC.

The main beneficiaries of this work can be broadly categorised into two groups. Firstly, academics stand to benefit through the development of our ideas into working prototypes. We are aiming to restructure software on-the-fly to adapt to changes in the underlying system's reliability. This is something that has never been done before and, if successful, will open up a new research area of dynamic optimisation for reliability.

The second group of beneficiaries is industry. We have secured strong letters of support from major companies, such as UK-based ARM, Microsoft and Qualcomm. They testify that our proposed research is necessary to address the upcoming challenges they face and that it outlines novel solutions to those problems. Interactions with UK industry will allow our ideas to be fed into their research and development efforts in this area.

Furthermore, alongside the activities described above, at an early stage in the project we intend to identify the core IP generated from the research. With this in hand, we will evaluate the potential for further economic gain. Both universities have an excellent track record in commercial exploitation, Silistix and Transitive being two such examples.

As with any long-term research, it is difficult to express its impact on the wider research community and society. However, EPSRC's vision for a digital economy is underpinned by the need for reliable electronics and systems. This proposal will significantly contribute towards this goal. Similarly, DOME will contribute towards the new EPSRC ICT capability theme on "many-core architectures and concurrency in distributed and embedded systems".

Funded Value: £589,624

Funded Period: Sep 12 - Sep 16

Funder: EPSRC

Project Status: Closed

Project Category: Research Grant

Project Reference: EP/J016330/1

Principal Investigator: Mikel Lujan

Research Subject: Info. & commun. Technol. (100%)

Research Topic: Computer Sys. & Architecture (100%)

People
Mikel Lujan (Principal Investigator)
Stephen Furber (Co-Investigator)
Behram Khan (Researcher)
Luis Plana Cabrera (Researcher)
Daniel Goodman (Researcher)

Publications

Allen A (2022) Developing a well-received pre-matriculation program: the evolution of MedFIT. in Discover education

Barrett C (2016) Towards co-designed optimizations in parallel frameworks

Cakmaki Y (2016) Cyclic Power-Gating as an Alternative to Voltage and Frequency Scaling in IEEE Computer Architecture Letters

D'Antras A (2017) Low overhead dynamic binary translation on ARM

D'Antras A (2016) Optimizing Indirect Branches in Dynamic Binary Translators in ACM Transactions on Architecture and Code Optimization

D'Antras A (2017) HyperMAMBO-X64

Gorgovan C (2016) MAMBO: A Low-Overhead Dynamic Binary Modification Tool for ARM in ACM Transactions on Architecture and Code Optimization

Kotselidis C (2017) Heterogeneous Managed Runtime Systems

Lu K (2017) Flexible Page-level Memory Access Monitoring Based on Virtualization Hardware

Rodchenko A (2015) Euro-Par 2015: Parallel Processing - 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24-28, 2015, Proceedings

Key Findings

Description	ARMOR is a hardware solution that prevents Row Hammer errors in DRAMs, designed and developed in the School of Computer Science at The University of Manchester. The main challenge in mitigating the Row Hammer effect is monitoring the number of activations for each row in the DRAM, which imposes a significant storage overhead on the memory system. ARMOR monitors the activation stream at the memory interface level and detects at run-time which specific rows (i.e. hot rows) are at risk of being "hammered". ARMOR is capable of detecting all the possible hot rows in a system with a minimal storage overhead (e.g. 800 bytes to protect 4 GB of DRAM). Why is ARMOR a promising solution?
- It is capable of detecting all the possible Row Hammer errors with a high level of confidence.
- It provides precise information about the hammered rows (addresses) and the number of activations with a high level of accuracy.
- It does not need to know about the logical-to-physical mapping of DRAMs in order to mitigate Row Hammer errors (ARMOR Cache Solution).
- It is scalable according to the size of memory.
- It is technology independent and can easily support future device technologies.
Exploitation Route	UMIP (http://umip.com/) has provided funding to pay for the ARMOR patent, commercial viability studies and the development of a prototype using FPGAs/design boards.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	http://apt.cs.manchester.ac.uk/projects/ARMOR/RowHammer/armor.html
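
The description above does not say how ARMOR tracks activations, so the following is only a generic illustration of why hot rows can be found without keeping one counter per DRAM row. It is a minimal sketch using the textbook Misra-Gries frequent-items summary; the class name `HotRowTracker`, its methods and the threshold are hypothetical and are not taken from ARMOR.

```python
from typing import Dict, List


class HotRowTracker:
    """Misra-Gries summary over a stream of DRAM row activations.

    Hypothetical illustration only: a textbook frequent-items algorithm,
    not the mechanism used by ARMOR.
    """

    def __init__(self, num_counters: int) -> None:
        if num_counters < 1:
            raise ValueError("num_counters must be at least 1")
        self.num_counters = num_counters
        self.counters: Dict[int, int] = {}

    def activate(self, row: int) -> None:
        """Record one activation of the given row."""
        if row in self.counters:
            self.counters[row] += 1
        elif len(self.counters) < self.num_counters:
            self.counters[row] = 1
        else:
            # Table is full: decrement every tracked row and drop the
            # ones that reach zero. The incoming row is not stored.
            for tracked in list(self.counters):
                self.counters[tracked] -= 1
                if self.counters[tracked] == 0:
                    del self.counters[tracked]

    def hot_rows(self, threshold: int) -> List[int]:
        """Rows whose (under-)estimated activation count meets the threshold."""
        return sorted(r for r, c in self.counters.items() if c >= threshold)
```

With k counters and a stream of n activations, each estimate is at most n/(k+1) below the true count, so any row activated more often than that is guaranteed to remain in the table. The storage cost depends on k rather than on the number of rows.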

Impact Summary

Description	We have open-sourced a dynamic binary modification tool for the ARM architecture (see http://github.com/beehive-lab/mambo). As the ARM architecture expands beyond its traditional embedded domain, there is a growing interest in dynamic binary modification (DBM) tools for general-purpose multicore processors that are part of the ARM family. Existing DBM tools for ARM introduce large overheads in the execution of applications. The specific questions that this article addresses are (i) how to develop such DBM tools for the ARM architecture and (ii) whether new optimisations are plausible and needed. We describe the general design of MAMBO, a new DBM tool for ARM, which we release together with this publication [1], and introduce novel optimisations to handle indirect branches. In addition, we explore scenarios in which it may be possible to relax the transparency offered by DBM tools to allow extra optimisations to be applied. These scenarios arise from analysing the most typical usages: for example, application binaries without handcrafted assembly. The performance evaluation shows that MAMBO introduces small overheads for SPEC CPU2006 and PARSEC 3.0 compared with the execution times of the unmodified programs. [1] C Gorgovan, A d'Antras, M Luján. MAMBO: A Low-Overhead Dynamic Binary Modification Tool for ARM. ACM Transactions on Architecture and Code Optimization 13 (1), 14
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic

Research Tools and Methods

Title	MAMBO - A low-overhead dynamic binary instrumentation and modification tool
Description	As the ARM architecture expands beyond its traditional embedded domain, there is a growing interest in dynamic binary modification (DBM) tools for general-purpose multicore processors that are part of the ARM family. Existing DBM tools for ARM suffer from introducing large overheads in the execution of applications. The specific questions that this article addresses are (i) how to develop such DBM tools for the ARM architecture and (ii) whether new optimisations are plausible and needed. We describe the general design of MAMBO, a new DBM tool for ARM, which we release together with this publication, and introduce novel optimisations to handle indirect branches. In addition, we explore scenarios in which it may be possible to relax the transparency offered by DBM tools to allow extra optimisations to be applied. These scenarios arise from analysing the most typical usages: for example, application binaries without handcrafted assembly. The performance evaluation shows that MAMBO introduces the smallest published overheads for the ARM architecture.
Type Of Material	Improvements to research infrastructure
Year Produced	2016
Provided To Others?	Yes
Impact	The MAMBO tool is the fastest and most complete Dynamic Binary Modification and Instrumentation tool for Arm architectures. Researchers are using MAMBO to build architectural simulators as well as security tools to analyse Arm binaries.
URL	https://github.com/beehive-lab/mambo
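
To illustrate why indirect branches matter for DBM overhead, here is a minimal sketch of one common, generic approach: a translation table from original target addresses to code-cache addresses, fronted by a single-entry inline cache per branch site. It is not a description of MAMBO's own optimisations, and every name in it (`CodeCache`, `IndirectBranchSite`, the `translate` callback) is hypothetical.

```python
from typing import Callable, Dict, Optional


class CodeCache:
    """Maps original target addresses to translated code-cache addresses."""

    def __init__(self, translate: Callable[[int], int]) -> None:
        # Hypothetical callback: translates the block at an original
        # address and returns its address in the code cache.
        self._translate = translate
        self._table: Dict[int, int] = {}
        self.translations = 0

    def lookup(self, original_target: int) -> int:
        cached = self._table.get(original_target)
        if cached is None:
            # First time this target is seen: translate and remember it.
            cached = self._translate(original_target)
            self._table[original_target] = cached
            self.translations += 1
        return cached


class IndirectBranchSite:
    """One indirect branch with a single-entry inline cache."""

    def __init__(self, cache: CodeCache) -> None:
        self._cache = cache
        self._last_target: Optional[int] = None
        self._last_translated: Optional[int] = None
        self.fast_hits = 0

    def resolve(self, original_target: int) -> int:
        # Fast path: same target as last time, no table lookup needed.
        if (self._last_translated is not None
                and original_target == self._last_target):
            self.fast_hits += 1
            return self._last_translated
        # Slow path: consult the shared table, then update the inline cache.
        translated = self._cache.lookup(original_target)
        self._last_target = original_target
        self._last_translated = translated
        return translated
```

A branch site that keeps jumping to the same target pays for the table lookup once; a site that alternates between targets falls back to the shared table each time but never retranslates a block.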


Title	Simulation tool MaxSim
Description	Managed applications, written in programming languages such as Java, C# and others, represent a significant share of workloads in the mobile, desktop, and server domains. Microarchitectural timing simulation of such workloads is useful for characterization and performance analysis, of both hardware and software, as well as for research and development of novel hardware extensions. This paper introduces MaxSim, a simulation platform based on the Maxine VM, the ZSim simulator, and the McPAT modeling framework. MaxSim is able to simulate managed workloads running on top of Maxine VM quickly and accurately, and its capabilities are showcased with novel simulation techniques for: 1) low-intrusive microarchitectural profiling via pointer tagging on the x86-64 platforms, 2) modeling of hardware extensions related, but not limited to, tagged pointers, and 3) modeling of complex software changes via address-space morphing. Low-intrusive microarchitectural profiling is achieved by utilizing tagged pointers to collect type- and allocation-site-related hardware events. Furthermore, MaxSim allows, through a novel technique called address-space morphing, the easy modeling of complex object layout transformations. Finally, through the co-designed capabilities of MaxSim, novel hardware extensions can be implemented and evaluated. We showcase MaxSim's capabilities by simulating the whole set of the DaCapo-9.12-bach benchmarks in less than a day while performing an up-to-date microarchitectural power and performance characterization. Furthermore, we demonstrate a hardware/software co-designed optimization that performs dynamic load elimination for array length retrieval, achieving up to 14% L1 data cache loads reduction and up to 4% dynamic energy reduction.
Type Of Material	Improvements to research infrastructure
Year Produced	2017
Provided To Others?	Yes
Impact	Best paper award in the conference dedicated to simulation platforms
URL	https://github.com/beehive-lab/MaxSim
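
As a rough illustration of pointer tagging, the sketch below packs a small tag into the upper bits of a 64-bit pointer value and counts events per tag. It assumes 48-bit user-space addresses and a 16-bit tag; the actual bit layout used by MaxSim is not given in the description above, and all names here are hypothetical.

```python
from collections import Counter

# Assumption: 48 address bits with the upper 16 bits free for a tag.
ADDRESS_BITS = 48
TAG_BITS = 16
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1


def tag_pointer(address: int, tag: int) -> int:
    """Pack a tag (e.g. a type or allocation-site id) above the address bits."""
    if address < 0 or address > ADDRESS_MASK:
        raise ValueError("address does not fit in the assumed address bits")
    if tag < 0 or tag > TAG_MASK:
        raise ValueError("tag does not fit in the assumed tag bits")
    return (tag << ADDRESS_BITS) | address


def get_tag(tagged: int) -> int:
    """Recover the tag from a tagged pointer."""
    return (tagged >> ADDRESS_BITS) & TAG_MASK


def strip_tag(tagged: int) -> int:
    """Recover the plain address, as needed before a real memory access."""
    return tagged & ADDRESS_MASK


def record_access(tagged: int, events: Counter) -> int:
    """Attribute one memory event to the pointer's tag and return the address."""
    events[get_tag(tagged)] += 1
    return strip_tag(tagged)
```

Because the tag travels with the pointer, a simulator can attribute each memory event to a type or allocation site without a separate lookup from address to object metadata.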

Collaboration

Description	Flexible Page-level Memory Access Monitoring Based on Virtualization Hardware
Organisation	National University of Defense Technology
Country	China
Sector	Academic/University
PI Contribution	Page protection is often used to achieve memory access monitoring in many applications, dealing with program analysis, checkpoint-based failure recovery, and garbage collection in managed runtime systems. Typically, low-overhead access monitoring is limited by the relatively large page-level granularity of memory management unit hardware support for virtual memory protection. In this paper, we improve upon traditional page-level mechanisms by additionally using hardware support for virtualization in order to achieve fine and flexible granularities that can be smaller than a page. We first introduce a memory allocator based on page protection that can achieve fine-grained monitoring. Second, we explain how virtualization hardware support can be used to achieve dynamic adjustment of the monitoring granularity. In all, we propose a process-level virtual machine to achieve dynamic and fine-grained monitoring. Any application can run on our process-level virtual machine without modification. Experimental results for an incremental checkpoint tool provide a use case to demonstrate our work. Compared with traditional page-based checkpointing, our work can effectively reduce the amount of checkpoint data and improve performance.
Collaborator Contribution	Page protection is often used to achieve memory access monitoring in many applications, dealing with program analysis, checkpoint-based failure recovery, and garbage collection in managed runtime systems. Typically, low-overhead access monitoring is limited by the relatively large page-level granularity of memory management unit hardware support for virtual memory protection. In this paper, we improve upon traditional page-level mechanisms by additionally using hardware support for virtualization in order to achieve fine and flexible granularities that can be smaller than a page. We first introduce a memory allocator based on page protection that can achieve fine-grained monitoring. Second, we explain how virtualization hardware support can be used to achieve dynamic adjustment of the monitoring granularity. In all, we propose a process-level virtual machine to achieve dynamic and fine-grained monitoring. Any application can run on our process-level virtual machine without modification. Experimental results for an incremental checkpoint tool provide a use case to demonstrate our work. Compared with traditional page-based checkpointing, our work can effectively reduce the amount of checkpoint data and improve performance.
Impact	http://delivery.acm.org/10.1145/3060000/3050751/p201-Lu.pdf
Start Year	2016
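
The effect of monitoring granularity on incremental checkpoint size can be shown with a small software model. This is only a sketch under stated assumptions: dirty tracking is simulated in plain Python rather than with page protection or virtualization hardware, and the class `TrackedMemory` and the sizes used are hypothetical.

```python
from typing import Dict, Set


class TrackedMemory:
    """Byte buffer that records which fixed-size units have been written."""

    def __init__(self, size: int, granularity: int) -> None:
        if granularity < 1 or size % granularity != 0:
            raise ValueError("size must be a positive multiple of granularity")
        self.data = bytearray(size)
        self.granularity = granularity
        self.dirty: Set[int] = set()

    def write(self, offset: int, payload: bytes) -> None:
        end = offset + len(payload)
        if offset < 0 or end > len(self.data):
            raise IndexError("write outside tracked memory")
        self.data[offset:end] = payload
        if payload:
            # Mark every unit touched by this write as dirty.
            first = offset // self.granularity
            last = (end - 1) // self.granularity
            self.dirty.update(range(first, last + 1))

    def checkpoint(self) -> Dict[int, bytes]:
        """Save only the dirty units, then start a new tracking interval."""
        g = self.granularity
        saved = {u: bytes(self.data[u * g:(u + 1) * g])
                 for u in sorted(self.dirty)}
        self.dirty.clear()
        return saved
```

Two 3-byte writes to a 16 KiB buffer cost two whole 4096-byte units when tracked per page, but only two 64-byte units when tracked at a 64-byte granularity, which is the kind of reduction in checkpoint data the text above refers to.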

Spin Outs

Company Name	AMANIEU SYSTEMS LTD
Description	Amanieu Systems Ltd builds on the research carried out at the University of Manchester on dynamic binary translation. The key product of Amanieu Systems is called Tango. Tango is a binary translation system for GNU/Linux and Android which allows unmodified 32-bit ARM programs to run on 64-bit-only ARM processors. Tango supports the full 32-bit ARM and Thumb instruction sets, including floating-point (VFP) and SIMD (NEON) instructions. Tango can run all top 1000 Android apps and has a >99% pass rate on LTP tests. Tango's performance running translated code is usually within 15% of native execution speed. Tango is production ready. Tango v1.0 was released in July 2019 and has been commercially deployed via early partner engagements.
Year Established	2017
Impact	The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity. The product Tango is helping the main Arm-based chip development companies to evolve their architectures without requiring the AArch32 hardware.
Website	https://www.amanieusystems.com/
