DOME: Delaying and Overcoming Microprocessor Errors

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

Modern-day computer systems have benefited from being designed and manufactured with an ever-increasing budget of transistors on very reliable integrated circuits. However, that "free lunch" is now over, and the long-forgotten nightmares faced by the computer pioneers are coming back to haunt us. Not so long ago, unreliable valves were the basic building blocks of computers, and research focussed on how to compute successfully despite this underlying weakness (e.g. von Neumann, 1956, "Probabilistic logics and the synthesis of reliable organisms from unreliable components").

State-of-the-art integrated circuit technologies have now reached the range of 40-22 nanometres, posing significant reliability challenges. Hard or permanent errors can manifest themselves at any point during a processor's lifetime. During manufacturing, errors can render a proportion of a chip incapable of computing, thus decreasing yield and profit.
As we move towards ever smaller components, transistors take less and less time to wear out, making them more prone to failure in the field. Traditional reliability solutions apply high-cost redundancy to the hardware structures within the processor, providing backup spares for when errors occur. On the application side, solutions also rely on redundancy, running multiple copies of each piece of software.
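The cost of software-level redundancy is easiest to see in triple modular redundancy (TMR). In TMR the same computation runs three times, and a majority vote masks a single faulty result. The sketch below shows this generic textbook technique. It is not a mechanism proposed by DOME. The names `majority_vote` and `run_replicated` are hypothetical, and the sketch assumes results are hashable and come from independent replicas.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


def majority_vote(results: list[T]) -> T:
    """Return the value produced by a strict majority of the replicas.

    Raises RuntimeError when no value has a majority: the fault cannot be
    masked and must be handled some other way (e.g. by re-execution).
    """
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        raise RuntimeError("no majority: replicas disagree")
    return value


def run_replicated(task: Callable[[], T], replicas: int = 3) -> T:
    """Run `task` several times and vote (real TMR would use separate cores)."""
    return majority_vote([task() for _ in range(replicas)])
```

Masking a single fault this way triples the compute spent on every task. That overhead is what makes purely redundancy-based approaches expensive.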

A common criticism of current reliability solutions is that they do not consider how the software and hardware can be co-designed synergistically to tackle this challenge. Redesigning and reimplementing general-purpose software applications would incur an unaffordable price tag. Our hypothesis is that virtualization technologies (a layer that transparently hides the underlying platform from the application software) have an important role to play. In particular, managed runtime environments (MREs) have become pervasive among high-productivity software developers and represent a promising vehicle for providing reliability mechanisms. Within these systems, applications can be monitored and morphed without user intervention.

There are two complementary strands to our proposed research, focused around a co-designed MRE and multicore computer architecture. Firstly, we will consider wearout mitigation schemes to slow processor ageing and lengthen a chip's lifetime before a hard fault occurs. Secondly, given that an error will occur at some point during a system's life, we will develop error-tolerance approaches that maintain execution on faulty hardware.

If successful, we believe this project will be seen as a significant milestone in the development of wearout-conscious and error-tolerant multicore architectures over the next decade. This research programme will advance our understanding of the field, tackling the UK Microelectronics Grand Challenge of Moore for Less that has been signposted by EPSRC. It is also important to highlight that this proposal tackles a key aspect of the new EPSRC ICT capability priority on "Many-core architectures and concurrency in distributed and embedded systems".

Planned Impact

State-of-the-art integrated circuit technologies have now reached the range of 45-22 nanometres, posing significant reliability challenges. Going forward, the reliability of multicore and many-core chips is a key design challenge for system designers. DOME, our proposed research programme, provides a transparent path forward for software applications. Managed runtime environments (MREs) have become pervasive among high-productivity software developers and represent a promising virtualisation vehicle for providing reliability mechanisms. Within these systems, applications can be monitored and morphed without application developer intervention.

Our objectives are focused around a co-designed MRE and computer architecture for multicore chips. We will consider wearout mitigation schemes to slow processor ageing and lengthen a chip's lifetime before a hard fault occurs. Given that an error will occur at some point during a system's life, we will develop error-tolerance approaches that maintain execution on faulty hardware.

If successful, we believe this project will be seen as a significant milestone in the development of wearout-conscious and error-tolerant multicore architectures over the next decade. This research programme will advance our understanding of the field, tackling the UK Microelectronics Grand Challenge of Moore for Less that has been signposted by EPSRC.

The main beneficiaries of this work can be broadly categorised into two groups. Firstly, academics stand to benefit through the development of our ideas into working prototypes. We are aiming to restructure software on-the-fly to adapt to changes in the underlying system's reliability. This is something that has never been done before and, if successful, will open up a new research area of dynamic optimisation for reliability.

The second group of beneficiaries is industry. We have secured strong letters of support from major companies, such as UK-based ARM, Microsoft and Qualcomm. They testify both that our proposed research is necessary to address the upcoming challenges they face and that it outlines novel solutions to those problems. Interactions with UK industry will allow our ideas to be fed into their research and development efforts in this area.

Furthermore, alongside the activities described above, at an early stage in the project we intend to identify the core IP generated from the research. With this in hand, we will evaluate the potential for further economic gain. Both universities have an excellent track record in commercial exploitation, Silistix and Transitive being two such examples.

As with any long-term research, it is difficult to express its impact on the wider research community and society. However, EPSRC's vision for a digital economy is underpinned by the need for reliable electronics and systems, and this proposal will contribute significantly towards that goal. Similarly, DOME will contribute towards the new EPSRC ICT capability theme on "many-core architectures and concurrency in distributed and embedded systems".
 
Description ARMOR is a hardware solution to prevent Row Hammer errors in DRAMs, designed and developed in the School of Computer Science at The University of Manchester. The main challenge in mitigating the Row Hammer effect is monitoring the number of activations of each row in the DRAM, which imposes a significant storage overhead on the memory system. ARMOR monitors the activation stream at the memory interface and detects at run-time which specific rows (i.e. hot rows) are at risk of being "hammered". ARMOR can detect all possible hot rows in a system with a minimal storage overhead (e.g. 800 bytes to protect 4 GB of DRAM).
Why is ARMOR a promising solution?
- It is capable of detecting all possible Row Hammer errors with a high level of confidence.
- It provides precise information about the hammered rows (addresses) and their number of activations with a high level of accuracy.
- It does not need to know the logical-to-physical mapping of the DRAM in order to mitigate Row Hammer errors (ARMOR Cache Solution).
- It scales with the size of memory.
- It is technology independent and can easily support future device technologies.
Exploitation Route UMIP (http://umip.com/) has provided funding for the ARMOR patent, commercial viability studies and the development of a prototype on FPGAs/design boards.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://apt.cs.manchester.ac.uk/projects/ARMOR/RowHammer/armor.html
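Per-row counting is what makes naive Row Hammer detection expensive. Assume, for illustration, 8 KB DRAM rows: 4 GB then holds 4 GB / 8 KB = 524,288 rows, and even a 16-bit counter per row needs 1 MB. The sketch below shows one generic way to track only the frequently activated rows with a fixed number of counters: the Space-Saving streaming algorithm. It is a stand-in chosen for illustration, not the ARMOR design, and the class and parameter names are hypothetical.

```python
class HotRowDetector:
    """Space-Saving frequency sketch over a stream of DRAM row activations.

    Illustrative only: a generic streaming algorithm, not ARMOR's design.
    With `capacity` counters, any row activated more than
    (activations in window) / capacity times is guaranteed to be tracked,
    and a tracked row's count never underestimates its true activations.
    """

    def __init__(self, capacity: int, threshold: int) -> None:
        self.capacity = capacity    # number of counters (the storage budget)
        self.threshold = threshold  # activations after which a row is "hot"
        self.counts: dict[int, int] = {}

    def activate(self, row: int) -> bool:
        """Record one activation of `row`; return True once it is hot."""
        if row in self.counts:
            self.counts[row] += 1
        elif len(self.counts) < self.capacity:
            self.counts[row] = 1
        else:
            # Evict the least-activated row; the newcomer inherits its count
            # plus one, so its estimate stays an upper bound on the truth.
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[row] = self.counts.pop(victim) + 1
        return self.counts[row] >= self.threshold

    def reset(self) -> None:
        """Start a new window, e.g. once the rows have been refreshed."""
        self.counts.clear()
```

Suppose `capacity` is k and `threshold` exceeds (activations per window) / k. Then no hot row can slip past the detector. The price is occasional false positives, because evicted counts are inherited by newly tracked rows.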
 
Description We have open-sourced a dynamic binary modification tool for the ARM architecture (see http://github.com/beehive-lab/mambo). As the ARM architecture expands beyond its traditional embedded domain, there is growing interest in dynamic binary modification (DBM) tools for the general-purpose multicore processors in the ARM family. Existing DBM tools for ARM introduce large overheads into the execution of applications. The specific questions that this article addresses are (i) how to develop such DBM tools for the ARM architecture and (ii) whether new optimisations are plausible and needed. We describe the general design of MAMBO, a new DBM tool for ARM, which we release together with this publication [1], and introduce novel optimisations to handle indirect branches. In addition, we explore scenarios in which it may be possible to relax the transparency offered by DBM tools to allow extra optimisations to be applied. These scenarios arise from analysing the most typical usages: for example, application binaries without handcrafted assembly. The performance evaluation shows that MAMBO introduces small overheads for SPEC CPU2006 and PARSEC 3.0 compared with the execution times of the unmodified programs. [1] C. Gorgovan, A. d'Antras, M. Luján, "MAMBO: A Low-Overhead Dynamic Binary Modification Tool for ARM", ACM Transactions on Architecture and Code Optimization 13(1), 14.
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic
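Indirect branches are costly for a DBM tool. Application code runs from a code cache of translated blocks, and an indirect branch's target is only known at run time. Each execution therefore needs a lookup from the application address to its translated copy. The toy model below puts a per-branch-site inline cache in front of a shared hash table to show where those lookup costs arise. It is a generic scheme assumed for illustration: MAMBO's own indirect-branch optimisations are described in [1] and may differ. `CodeCache`, `translate` and the string "translations" are hypothetical.

```python
from typing import Callable


class CodeCache:
    """Toy model of indirect-branch resolution in a DBM tool (illustrative only)."""

    def __init__(self, translate: Callable[[int], str]) -> None:
        self.translate = translate                       # hypothetical translator
        self.table: dict[int, str] = {}                  # app address -> translation
        self.inline: dict[int, tuple[int, str]] = {}     # site -> last (target, translation)
        self.misses = 0
        self.inline_hits = 0

    def indirect_branch(self, site: int, target: int) -> str:
        # Fast path 1: the branch at `site` jumps to the same target as last time.
        last = self.inline.get(site)
        if last is not None and last[0] == target:
            self.inline_hits += 1
            return last[1]
        # Fast path 2: the target was already translated (possibly for another site).
        entry = self.table.get(target)
        if entry is None:
            # Slow path: leave the code cache and translate the new block.
            self.misses += 1
            entry = self.translate(target)
            self.table[target] = entry
        self.inline[site] = (target, entry)
        return entry
```

The inline cache only pays off when a branch site keeps jumping to the same target. Polymorphic sites fall through to the shared table, and first-time targets fall through to the translator.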

 
Title MAMBO - A low-overhead dynamic binary instrumentation and modification tool 
Description As the ARM architecture expands beyond its traditional embedded domain, there is growing interest in dynamic binary modification (DBM) tools for the general-purpose multicore processors in the ARM family. Existing DBM tools for ARM introduce large overheads into the execution of applications. The specific questions that this article addresses are (i) how to develop such DBM tools for the ARM architecture and (ii) whether new optimisations are plausible and needed. We describe the general design of MAMBO, a new DBM tool for ARM, which we release together with this publication, and introduce novel optimisations to handle indirect branches. In addition, we explore scenarios in which it may be possible to relax the transparency offered by DBM tools to allow extra optimisations to be applied. These scenarios arise from analysing the most typical usages: for example, application binaries without handcrafted assembly. The performance evaluation shows that MAMBO introduces the smallest published overheads for the ARM architecture.
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact The MAMBO tool is the fastest and most complete dynamic binary modification and instrumentation tool for Arm architectures. Researchers are using MAMBO to build architectural simulators as well as security tools to analyse Arm binaries.
URL https://github.com/beehive-lab/mambo
 
Title Simulation tool MaxSim 
Description Managed applications, written in programming languages such as Java, C# and others, represent a significant share of workloads in the mobile, desktop and server domains. Microarchitectural timing simulation of such workloads is useful for characterization and performance analysis of both hardware and software, as well as for research and development of novel hardware extensions. This paper introduces MaxSim, a simulation platform based on the Maxine VM, the ZSim simulator and the McPAT modeling framework. MaxSim can simulate managed workloads running on top of the Maxine VM quickly and accurately, and its capabilities are showcased with novel simulation techniques for: 1) low-intrusive microarchitectural profiling via pointer tagging on x86-64 platforms, 2) modeling of hardware extensions related, but not limited, to tagged pointers, and 3) modeling of complex software changes via address-space morphing. Low-intrusive microarchitectural profiling is achieved by using tagged pointers to collect type- and allocation-site-related hardware events. Furthermore, through a novel technique called address-space morphing, MaxSim allows complex object layout transformations to be modeled easily. Finally, through the co-designed capabilities of MaxSim, novel hardware extensions can be implemented and evaluated. We showcase MaxSim's capabilities by simulating the whole set of the DaCapo-9.12-bach benchmarks in less than a day while performing an up-to-date microarchitectural power and performance characterization. Furthermore, we demonstrate a hardware/software co-designed optimization that performs dynamic load elimination for array length retrieval, achieving up to a 14% reduction in L1 data cache loads and up to a 4% reduction in dynamic energy.
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact Best paper award in the conference dedicated to simulation platforms 
URL https://github.com/beehive-lab/MaxSim
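Pointer tagging on x86-64 relies on user-space pointers occupying only the low 48 bits (assuming 4-level paging). The upper 16 bits are therefore free to carry a tag, such as a type or allocation-site identifier. The sketch below shows the bit manipulation under that assumption. The 16-bit layout and the function names are hypothetical and are not MaxSim's actual encoding.

On real x86-64 hardware, dereferencing a non-canonical (tagged) pointer faults. Tags must be stripped before use unless the hardware ignores them, as a simulator can model for a proposed extension.

```python
ADDRESS_BITS = 48                        # assumed x86-64 user-space address width
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
TAG_BITS = 16                            # hypothetical tag width (the unused top bits)
TAG_MASK = (1 << TAG_BITS) - 1


def tag_pointer(addr: int, tag: int) -> int:
    """Pack a 16-bit tag into the otherwise unused upper bits of a 64-bit pointer."""
    if addr < 0 or addr & ~ADDRESS_MASK:
        raise ValueError("address does not fit in 48 bits")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 16 bits")
    return (tag << ADDRESS_BITS) | addr


def pointer_tag(ptr: int) -> int:
    """Read the tag back, e.g. to attribute a cache miss to a type or allocation site."""
    return (ptr >> ADDRESS_BITS) & TAG_MASK


def untag_pointer(ptr: int) -> int:
    """Strip the tag to recover a canonical user-space address before dereferencing."""
    return ptr & ADDRESS_MASK
```

For example, tagging the address `0x7F0012345678` with `0xBEEF` gives `0xBEEF7F0012345678`. Untagging that value recovers the original address.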
 
Description Flexible Page-level Memory Access Monitoring Based on Virtualization Hardware 
Organisation National University of Defense Technology
Country China 
Sector Academic/University 
PI Contribution Page protection is often used to achieve memory access monitoring in many applications, such as program analysis, checkpoint-based failure recovery and garbage collection in managed runtime systems. Typically, low-overhead access monitoring is limited by the relatively large page-level granularity of memory management unit hardware support for virtual memory protection. In this paper, we improve upon traditional page-level mechanisms by additionally using hardware support for virtualization to achieve fine and flexible granularities that can be smaller than a page. We first introduce a memory allocator based on page protection that can achieve fine-grained monitoring. Second, we explain how virtualization hardware support can be used to adjust the monitoring granularity dynamically. Altogether, we propose a process-level virtual machine that achieves dynamic and fine-grained monitoring. Any application can run on our process-level virtual machine without modification. Experimental results for an incremental checkpoint tool provide a use case to demonstrate our work. Compared with traditional page-based checkpointing, our approach can effectively reduce the amount of checkpoint data and improve performance.
Impact http://delivery.acm.org/10.1145/3060000/3050751/p201-Lu.pdf
Start Year 2016
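A small model shows the trade-off between monitoring granularity and checkpoint size. The sketch below simulates protection-based dirty tracking in pure Python. On a real POSIX system, the "protect" and "fault" steps would correspond to page protection (e.g. `mprotect` plus a `SIGSEGV` handler) at page granularity. The work above uses virtualization hardware to go below a page, which this model does not attempt to reproduce. The class and its parameters are hypothetical.

```python
class DirtyTracker:
    """Model of protection-based dirty tracking for incremental checkpoints.

    After each checkpoint every block is treated as write-protected. The
    first write to a block 'faults', marks the block dirty and unprotects
    it, so later writes to that block cost nothing. Only dirty blocks are
    copied at the next checkpoint.
    """

    def __init__(self, size: int, granularity: int) -> None:
        self.size = size                # bytes of monitored memory
        self.granularity = granularity  # bytes per protection block
        self.dirty: set[int] = set()
        self.faults = 0

    def write(self, addr: int, length: int = 1) -> None:
        if addr < 0 or length < 1 or addr + length > self.size:
            raise ValueError("write outside monitored region")
        first = addr // self.granularity
        last = (addr + length - 1) // self.granularity
        for block in range(first, last + 1):
            if block not in self.dirty:   # still protected: this write traps
                self.faults += 1
                self.dirty.add(block)     # unprotect: further writes do not trap

    def checkpoint(self) -> int:
        """Return the bytes copied for this incremental checkpoint, then re-protect."""
        copied = len(self.dirty) * self.granularity
        self.dirty.clear()
        return copied
```

Consider three 8-byte writes at offsets 0, 8192 and 16384. With 4096-byte blocks, the next checkpoint copies 12,288 bytes. With 256-byte blocks, it copies 768 bytes. When writes are spread out, finer blocks trade smaller checkpoints for more protection faults.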
 
Company Name AMANIEU SYSTEMS LTD 
Description Amanieu Systems Ltd builds on research carried out at the University of Manchester on dynamic binary translation. Its key product, Tango, is a binary translation system for GNU/Linux and Android that allows unmodified 32-bit ARM programs to run on 64-bit-only ARM processors. Tango supports the full 32-bit ARM and Thumb instruction sets, including floating-point (VFP) and SIMD (NEON) instructions. Tango can run all of the top 1000 Android apps and has a >99% pass rate on the LTP (Linux Test Project) tests. Tango's performance running translated code is usually within 15% of native execution speed. Tango is production ready: Tango v1.0 was released in July 2019 and has been commercially deployed via early partner engagements.
Year Established 2017 
Impact The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity. Tango is helping the main Arm-based chip development companies evolve their architectures without requiring AArch32 hardware.
Website https://www.amanieusystems.com/
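Conditional execution is one AArch32 feature that a 32-bit-on-64-bit translator has to handle. Almost every A32 instruction carries a 4-bit condition in bits [31:28]. AArch64 restricts conditions to a few instructions, such as conditional branches (`B.cond`) and conditional selects (`CSEL`). A translator can therefore evaluate the condition against the guest's N, Z, C and V flags, or emit an AArch64 conditional branch around the translated instruction.

The sketch below shows only the condition evaluation, following the architecture's condition-code table. The function names are hypothetical, and nothing here describes how Tango itself is implemented.

```python
def cond_field(insn: int) -> int:
    """Condition field, bits [31:28], of a 32-bit A32 (ARM-state) instruction."""
    return (insn >> 28) & 0xF


def condition_holds(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate an AArch32 condition code against the N, Z, C, V flags.

    Bits [3:1] of `cond` select the test and bit 0 inverts it. 0b1110 (AL)
    always holds; in A32, 0b1111 marks a separate unconditional encoding
    space that a translator must decode before reaching this function.
    """
    base = (cond >> 1) & 0b111
    if base == 0b000:
        result = z                    # EQ / NE
    elif base == 0b001:
        result = c                    # CS / CC
    elif base == 0b010:
        result = n                    # MI / PL
    elif base == 0b011:
        result = v                    # VS / VC
    elif base == 0b100:
        result = c and not z          # HI / LS
    elif base == 0b101:
        result = n == v               # GE / LT
    elif base == 0b110:
        result = n == v and not z     # GT / LE
    else:
        result = True                 # AL
    if cond & 1 and cond != 0b1111:
        result = not result
    return result
```

For example, `0x03A00001` encodes `MOVEQ r0, #1`, so its condition field is `0b0000` (EQ). The instruction takes effect only when the Z flag is set.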