PAMELA: a Panoramic Approach to the Many-CorE LAndscape - from end-user to end-device: a holistic game-changing approach

Lead Research Organisation: University of Manchester
Department Name: Computer Science


The last decade has seen a significant shift in the way computers are designed. Up to the turn of the millennium, advances in performance were achieved by making a single processor, which could execute only one program at a time, go faster, usually by increasing the frequency of its clock signal. But shortly after the turn of the millennium it became clear that this approach was running into a brick wall: a faster clock made the processor hotter, and the amount of heat a silicon chip can dissipate before it fails is limited. That limit was approaching rapidly!

Quite suddenly, several high-profile projects were cancelled and the industry found a new approach to higher performance: instead of making one processor go ever faster, it increased the number of processor cores. Multi-core processors had arrived: first dual-core, then quad-core, and so on. As advances in microchip manufacturing continue to increase the number of transistors that can be integrated on a single chip, the number of cores continues to rise, and multi-core is now giving way to many-core systems: processors with tens of cores, running tens of programs at the same time.

This all seems fine at the hardware level (more transistors means more cores), but the change from one program to many programs running at the same time has caused many difficulties for the programmers who develop applications for these new systems. Writing a program that runs on a single core is much better understood than writing a program that is actually tens of programs running at the same time, interacting with each other in complex and hard-to-predict ways. To make life even harder for the programmer, it is often best not to make all the cores of a many-core system identical: heterogeneous many-core systems promise much higher efficiency by having specialised cores handle specialised parts of the overall program, but they are even harder for the programmer to manage.

The Programme of projects we plan to undertake will bring the most advanced techniques in computer science to bear on this complex problem, focussing particularly on how hardware and software configurations can be optimised together for the important application domain of 3D scene understanding. This will enable a future smartphone fitted with a camera to scan a scene and not only store the picture it sees, but also understand that the scene includes a house, a tree and a moving car. In addressing this application we expect to learn a great deal about optimising many-core systems that will have wider applicability, offering the prospect of making future electronic products more efficient, more capable and more useful.

Planned Impact

The driving force motivating this project is to make an impact. There are 10 activities through which the potential economic, societal and academic impacts of this project will be realised.

1. Demonstrator systems. To ensure our research has the potential to deliver its promised impact to industry, we will build demonstrator systems capable of showcasing the ideas underpinning our work. Demonstrators will include a vision pipeline on a smartphone, an energy-efficient compiler and virtual machine, and a silicon prototype of processor IP.

2. Innovation workshops. In the final 2 years of the project we will organise 2 workshops through the HiPEAC EU Network of Excellence to disseminate our research to international academics and industry. Such an event typically attracts between 200 and 500 people.

3. Engaging with industrial partners. There are several industrial partners in this project. Their roles are to provide access to specific technologies, where appropriate, and to advise on exploitation and impact potential as the project develops. Where appropriate, we will begin steps toward commercialisation via proof-of-concept pilot projects; these are already in place with some of our partners.

4. Technology licensing. We will leverage our expertise in this area to ensure that new technologies emerging from our research are transferred to industry through licensing, unless direct spin-out activities are preferred.

5. Spin-out companies. Spin-outs provide a major pathway to exploitation, commercialising the vision pipeline, processor IP, design tools and JIT technology. If the research in this project is successful, the main pathway to industrial exploitation would be through the formation of a spin-out company based on experience gained from the demonstrator system. We have substantial experience in knowledge transfer (KT) and have access to over a dozen high-level industrialists and seed investors to guide spin-out formation. Each of the universities has a professional team in place to enable spin-outs.

6. Public engagement. We will continue to use online media to communicate with the academic community and the general public. Our website will include copies of all papers and reports, video presentations of our work, and a collection of press releases.

7. Postgraduate skills and training. We believe this project has an important role to play in developing and sustaining the UK skill base in heterogeneous many-core systems. It is clear that there is a dire need in UK industry for suitably qualified engineers. This project will educate the next generation of doctoral students. In addition, industrially sponsored PhD students will have access to mentors within the partner companies who can provide advice and feedback on their research plans.

8. Research publications. To maximize the academic impact we shall publish our research in the most respected journals and the top conferences in the area. Where permitted, research outputs will be made available on the project web site.

9. Distributing demonstrator systems. Elements of the demonstrator platform will be offered to academic collaborators to stimulate the exchange of ideas and generate wider academic impact. In particular, we will promote these within the HiPEAC low-power ecosystem activity managed by Michael O'Boyle and by Emre Ozer of ARM.

10. Academic exchange visits. To maximize the academic benefit of our work we shall promote bilateral meetings between our group and other groups in the UK working on relevant and related topics.
Description The high-level objective of the research is to advance the state of the art at each level of the vertical stack, from languages through compilers and run-time systems down to hardware support, specifically targeting future computer vision applications for low-power devices as well as data-centre applications.

With a year or so remaining, the project objectives have largely been met: across the three institutions involved (Edinburgh, Imperial and Manchester), we have contributed significant new results at each of the above levels, all supported by research papers published in top journals and/or conferences.

Our primary contributions involve:
- The production of SLAMBench, which provides the community with a standardised process for evaluating SLAM systems;
- The development of design space exploration techniques and software to support the rapid development of efficient SLAM systems for specific target architectures and the exploration of appropriate hardware accelerators;
- Research into efficient and accurate FPGA-accelerated simulation technologies to support the exploration and development of new ARM-based chip designs and kernel-specific accelerators, initially for SLAM systems, with the anticipation of integrating object recognition technologies in the future;
- The development of the binary optimisation tools MAMBO, which is used in a simulation platform, and MAMBO-64;
- The porting and extension of the MAXINE Java managed run-time environment research platform to ARM-v7 platforms;
- New, efficient compilation and scheduling algorithms (both compile-time and run-time) for traditional languages (e.g. C++) and managed environments (e.g. Java).
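Design space exploration of this kind usually sweeps algorithmic and platform parameters and keeps only the configurations that offer the best trade-offs between objectives such as runtime and accuracy. A minimal Python sketch of that final filtering step (a Pareto front over two objectives) follows. The configuration names, the two objectives and all numbers are hypothetical illustrations, not SLAMBench parameters or project results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str          # hypothetical configuration label
    runtime_ms: float  # per-frame runtime (lower is better)
    error_cm: float    # tracking error (lower is better)


def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in at least one."""
    no_worse = a.runtime_ms <= b.runtime_ms and a.error_cm <= b.error_cm
    better = a.runtime_ms < b.runtime_ms or a.error_cm < b.error_cm
    return no_worse and better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep configurations not dominated by any other, sorted by runtime."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c.runtime_ms)


if __name__ == "__main__":
    # Illustrative numbers only; not measurements from the project.
    candidates = [
        Config("A", 30.0, 2.0),
        Config("B", 20.0, 3.5),
        Config("C", 35.0, 2.5),  # slower and less accurate than A
        Config("D", 10.0, 6.0),
        Config("E", 25.0, 4.0),  # slower and less accurate than B
    ]
    for c in pareto_front(candidates):
        print(c.name, c.runtime_ms, c.error_cm)
```

The quadratic all-pairs check is adequate for the few hundred configurations a sweep might evaluate; a designer then picks a point on the front according to the target platform's runtime budget.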
Exploitation Route SLAMBench has already found external users.
Sectors Digital/Communication/Information Technologies (including Software)

Description The development and open-source release of SLAMBench has had an impact on the computer vision community. The original paper currently has 32 citations, and the software has been downloaded by a large number of academic institutions and companies. This has contributed to design-space exploration activities both in the project and in the wider computer vision community. A version of the KFusion SLAM system has been successfully executed on a mobile phone, meeting one of the key initial challenges set by the project and yielding compilation and run-time techniques that are more widely applicable. With around one year to go in the project, the research in simulation technologies is reaching maturity. A version of the MAMBO-64 binary optimisation software has been successfully licensed to a company. It enables 32-bit ARM applications to run directly on 64-bit ARM platforms, obviating the need for hardware support for 32-bit applications. The simulation platforms are being used to explore the design of hardware accelerators to support SLAM applications in low-power devices, with the anticipated development of IP. The MAXINE managed run-time environment (MRE) has been ported to ARM-v7 (with ARM-v8 to come), providing the community with the only high-quality research MRE with an industrial-strength JIT compiler.
First Year Of Impact 2015
Sector Digital/Communication/Information Technologies (including Software)
Description ACTiCLOUD - ACTivating resource efficiency and large databases in the CLOUD
Amount € 4,733,532 (EUR)
Funding ID 732366 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 01/2017 
End 12/2019
Description E2DATA - European Extreme Performing Big Data Stacks
Amount € 4,676,250 (EUR)
Funding ID 780245 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 01/2018 
End 12/2020
Description Robotics and Artificial Intelligence for Nuclear (RAIN)
Amount £12,203,190 (GBP)
Funding ID EP/R026084/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2017 
End 04/2021
Title GenSim 
Description GenSim is a fully integrated simulation generation system, and accompanying high-speed simulator. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact The GenSim toolchain can produce modules for ArcSim and Captive, tools capable of executing guest instructions at more than 1,000 MIPS in both user-mode and full-system contexts.
Title MAMBO - A low-overhead dynamic binary instrumentation and modification tool 
Description As the ARM architecture expands beyond its traditional embedded domain, there is a growing interest in dynamic binary modification (DBM) tools for general-purpose multicore processors that are part of the ARM family. Existing DBM tools for ARM suffer from introducing large overheads in the execution of applications. The specific questions that this article addresses are (i) how to develop such DBM tools for the ARM architecture and (ii) whether new optimisations are plausible and needed. We describe the general design of MAMBO, a new DBM tool for ARM, which we release together with this publication, and introduce novel optimisations to handle indirect branches. In addition, we explore scenarios in which it may be possible to relax the transparency offered by DBM tools to allow extra optimisations to be applied. These scenarios arise from analysing the most typical usages: for example, application binaries without handcrafted assembly. The performance evaluation shows that MAMBO introduces the smallest published overheads for the ARM architecture. 
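Indirect branches are costly for DBM tools because their targets are known only at run time, so every execution must map a guest address to its translated copy in the code cache. A common baseline is a hash-table lookup that falls back to the translator on a miss. The Python sketch below models only that baseline under simplifying assumptions (a hypothetical translator that hands out consecutive cache slots). It does not reproduce MAMBO's own indirect-branch optimisations.

```python
from typing import Callable, Dict


def make_translator(base: int = 0x10000, block_size: int = 0x40) -> Callable[[int], int]:
    """Hypothetical stand-in for a translator: hands out consecutive code-cache slots."""
    next_slot = base

    def translate(guest_addr: int) -> int:
        nonlocal next_slot
        cache_addr = next_slot  # a real DBM would decode and rewrite the code at guest_addr here
        next_slot += block_size
        return cache_addr

    return translate


class CodeCache:
    """Toy model of indirect-branch resolution in a DBM code cache."""

    def __init__(self, translate: Callable[[int], int]) -> None:
        self._translate = translate
        self._table: Dict[int, int] = {}  # guest address -> translated (code cache) address
        self.misses = 0

    def lookup_indirect(self, guest_target: int) -> int:
        """Resolve an indirect branch whose target is only known at run time."""
        cached = self._table.get(guest_target)
        if cached is not None:
            return cached  # hit: execution stays inside the code cache
        self.misses += 1  # miss: fall back to the translator, then remember the result
        cached = self._translate(guest_target)
        self._table[guest_target] = cached
        return cached


if __name__ == "__main__":
    cache = CodeCache(make_translator())
    for target in (0x8000, 0x8100, 0x8000, 0x8000, 0x8100):
        print(hex(target), "->", hex(cache.lookup_indirect(target)))
    print("misses:", cache.misses)
```

Only the first visit to each target pays the translation cost. In a real tool the lookup itself is emitted as machine code inside the cache, and that emitted sequence is where overhead-reducing indirect-branch optimisations apply.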
Type Of Material Improvements to research infrastructure 
Year Produced 2016 
Provided To Others? Yes  
Impact The MAMBO tool is the fastest and most complete dynamic binary modification and instrumentation tool for Arm architectures. Researchers are using MAMBO to build architectural simulators as well as security tools to analyse Arm binaries.
Title Simulation tool MaxSim 
Description Managed applications, written in programming languages such as Java and C#, represent a significant share of workloads in the mobile, desktop and server domains. Microarchitectural timing simulation of such workloads is useful for characterization and performance analysis of both hardware and software, as well as for research and development of novel hardware extensions. This paper introduces MaxSim, a simulation platform based on the Maxine VM, the ZSim simulator and the McPAT modeling framework. MaxSim can simulate managed workloads running on top of Maxine VM quickly and accurately, and its capabilities are showcased with novel simulation techniques for: 1) low-intrusive microarchitectural profiling via pointer tagging on x86-64 platforms, 2) modeling of hardware extensions related, but not limited, to tagged pointers, and 3) modeling of complex software changes via address-space morphing. Low-intrusive microarchitectural profiling is achieved by using tagged pointers to collect type- and allocation-site-related hardware events. Furthermore, through a novel technique called address-space morphing, MaxSim allows complex object layout transformations to be modeled easily. Finally, through the co-designed capabilities of MaxSim, novel hardware extensions can be implemented and evaluated. We showcase MaxSim's capabilities by simulating the whole set of DaCapo-9.12-bach benchmarks in less than a day while performing an up-to-date microarchitectural power and performance characterization. Furthermore, we demonstrate a hardware/software co-designed optimization that performs dynamic load elimination for array length retrieval, achieving up to 14% reduction in L1 data cache loads and up to 4% reduction in dynamic energy.
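Pointer tagging relies on the fact that, on x86-64 systems with 48-bit virtual addresses, user-space pointers leave their upper 16 bits unused. A small identifier such as a type or allocation-site ID can therefore be stored there and read back whenever a memory event is observed. The Python sketch below illustrates the bit manipulation and per-tag event counting under that assumed 48-bit layout. It is not MaxSim code, and the function names are hypothetical.

```python
from collections import Counter

ADDRESS_BITS = 48                        # assumption: 48-bit virtual addresses (4-level paging)
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # low 48 bits hold the real address
USER_LIMIT = 1 << (ADDRESS_BITS - 1)     # canonical user-space addresses lie below 2**47
TAG_MASK = 0xFFFF                        # 16 spare upper bits hold the tag


def tag_pointer(addr: int, tag: int) -> int:
    """Place a 16-bit tag (e.g. a type or allocation-site ID) above a user-space address."""
    if not 0 <= addr < USER_LIMIT:
        raise ValueError("expected a canonical user-space address")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 16 bits")
    return (tag << ADDRESS_BITS) | addr


def untag_pointer(ptr: int) -> int:
    """Strip the tag so that the address could be dereferenced on real hardware."""
    return ptr & ADDRESS_MASK


def pointer_tag(ptr: int) -> int:
    """Read the tag back from the upper 16 bits."""
    return (ptr >> ADDRESS_BITS) & TAG_MASK


def attribute_events(accessed: list[int]) -> Counter:
    """Count observed memory accesses per tag, as a simulator might do."""
    return Counter(pointer_tag(p) for p in accessed)


if __name__ == "__main__":
    p = tag_pointer(0x7FFFDEADBEE8, 0x2A)
    print(hex(p), hex(untag_pointer(p)), pointer_tag(p))
```

Because x86-64 normally faults on non-canonical addresses, a tagged pointer must be untagged before a real dereference. A simulator, by contrast, can read the tag directly from each observed address and attribute the event to it.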
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact Best paper award in the conference dedicated to simulation platforms 
Description Arm Ltd
Organisation Arm Limited
Country United Kingdom 
Sector Private 
Start Year 2006
Description Agilent 
Organisation Agilent Technologies
Country United States 
Sector Private 
PI Contribution E
Collaborator Contribution E
Impact E
Start Year 2013
Description Dyson 
Organisation Dyson
Country United Kingdom 
Sector Private 
PI Contribution K
Collaborator Contribution K
Impact K
Start Year 2013
Description Foundry 
Organisation The Foundry Visionmongers Ltd
Country United Kingdom 
Sector Private 
PI Contribution F
Collaborator Contribution F
Impact F
Start Year 2013
Description H-P 
Organisation Hewlett Packard Ltd
Country United Kingdom 
Sector Private 
PI Contribution H
Collaborator Contribution H
Impact H
Start Year 2013
Description Imagination 
Organisation Imagination Technologies
Country United Kingdom 
Sector Private 
PI Contribution L
Collaborator Contribution L
Impact L
Start Year 2013
Description Microsoft 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution G
Collaborator Contribution G
Impact G
Start Year 2013
Description Oracle 
Organisation Oracle Corporation
Department Oracle Corporation UK Ltd
Country United Kingdom 
Sector Private 
PI Contribution J
Collaborator Contribution J
Impact J
Start Year 2013
Description Wolfson 
Organisation Wolfson Microelectronics
Country United Kingdom 
Sector Private 
PI Contribution D
Collaborator Contribution D
Impact D
Start Year 2006
Description Samsung
Organisation Samsung
Department Samsung Advanced Institute of Technology
Country Korea, Republic of 
Sector Private 
PI Contribution M
Collaborator Contribution M
Impact M
Start Year 2013
Title Tornado Virtual Machine 
Description The Tornado VM is a practical heterogeneous programming framework for automatically accelerating Java programs on heterogeneous (OpenCL-compatible) hardware. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The system is in use for data analytics research projects at various UK universities. 
Description Amanieu Systems Ltd builds on research carried out at the University of Manchester on dynamic binary translation. Its key product, Tango, is a binary translation system for GNU/Linux and Android that allows unmodified 32-bit ARM programs to run on 64-bit-only ARM processors. Tango supports the full 32-bit ARM and Thumb instruction sets, including floating-point (VFP) and SIMD (NEON) instructions. Tango can run all of the top 1,000 Android apps and has a pass rate above 99% on the Linux Test Project (LTP) tests. Tango's performance when running translated code is usually within 15% of native execution speed. Tango is production ready: Tango v1.0 was released in July 2019 and has been commercially deployed via early partner engagements.
Year Established 2017 
Impact The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity. Tango is helping the major Arm-based chip developers to evolve their architectures without requiring AArch32 hardware.