Application Customisation: Enhancing Design Quality and Developer Productivity

Lead Research Organisation: Imperial College London
Department Name: Dept of Computing

Abstract

There have not been many shake-ups in mainstream processor architectures since von Neumann articulated their basic principles in 1945 and Hoff developed the microprocessor architecture in 1969. This is changing: field programmable technology has been adopted by major companies such as Microsoft and Intel for datacentre computing, and new architectures are expected that integrate processor cores and field programmable resources on the same chip. These developments are largely motivated by the performance and energy-efficiency gains offered by field programmable technology, which are so promising that industry is adopting it despite the significant challenge of developing applications for custom computing systems built on it.

Our vision is to address this challenge by advancing the foundation and applications of customisation, which involves developing hardware and software to fit design requirements. The proposed Platform project aims to pioneer new capabilities for enhancing design quality and designer productivity of custom computing systems, with the potential to revolutionise many applications, including those that need big data processing or improved reliability and security. It builds on the success of disruptive research funded by our previous Platform Grant (EP/I012036/1).

An example of such success is our research in runtime reconfiguration of custom computing systems: we developed new analysis methods that enable reconfiguration to remove idle functions, and we showed how reconfiguration can benefit many applications such as genomic data processing and finite-difference computation. Our work is disruptive because, in contrast to the current focus on partial reconfiguration, it demonstrates that full reconfiguration can provide significant energy-efficient acceleration over conventional multicore and manycore processors, reducing, for example, the runtime of bisulfite sequence alignment for non-invasive prenatal and cancer diagnosis from hours to minutes. Moreover, we invented the first field programmable architecture capable of single-cycle on-chip configuration generation, whereas current commercial devices rely on off-chip configuration generation that can take hours.
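
The energy argument for reconfiguring a device to remove idle functions can be illustrated with a toy break-even model. This Python sketch is a hypothetical illustration, not the project's analysis method: it assumes that functions unused during a phase draw a constant static power `idle_power_w`, and that one full reconfiguration to drop them costs a fixed energy `e_reconfig_j`. Reconfiguring pays off when the idle energy avoided over the phase exceeds that cost. The model ignores other effects, such as freeing area for more parallel active units, and all names and numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One execution phase of a hypothetical reconfigurable design."""
    duration_s: float    # how long the phase lasts
    idle_power_w: float  # static power drawn by functions unused in this phase


def should_reconfigure(phase: Phase, e_reconfig_j: float) -> bool:
    """Return True if removing the idle functions saves net energy.

    Energy avoided = idle power x phase duration; reconfiguring is worthwhile
    only if that exceeds the (assumed fixed) reconfiguration energy.
    """
    return phase.idle_power_w * phase.duration_s > e_reconfig_j


def break_even_duration_s(idle_power_w: float, e_reconfig_j: float) -> float:
    """Shortest phase length at which reconfiguration breaks even."""
    return e_reconfig_j / idle_power_w


if __name__ == "__main__":
    # Illustrative numbers only: 2 W of idle logic, 0.5 J per reconfiguration.
    print(break_even_duration_s(2.0, 0.5))  # 0.25 seconds
    print(should_reconfigure(Phase(duration_s=10.0, idle_power_w=2.0), 0.5))  # True
    print(should_reconfigure(Phase(duration_s=0.1, idle_power_w=2.0), 0.5))   # False
```

In this toy model, long-running phases sit far beyond the break-even point, so the one-off reconfiguration cost becomes negligible relative to the energy saved.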

Such exciting progress was only possible because the Platform Grant enabled high-risk research by researchers who would otherwise have suffered from funding gaps: 12 Research Associates in our team were supported by the Platform before securing permanent positions. Renewed Platform support will allow continuing development of our dynamic and ambitious research team to explore next-generation computer systems and their applications.

The flexibility of the renewed Platform Grant will be used to address three new strategic areas, in which we are uniquely placed to make a major impact; we will conduct exploratory research to identify promising projects for responsive mode or other forms of funding:

1. Multi-level tradeoff-aware design automation, which includes investigating customisation strategies and the associated tradeoffs, automation of effective customisation strategies, and developing reusable demonstration facilities and testbeds.

2. Reconfigurable big data and cloud architectures, which include customisable big data processing, runtime design generation and optimisation, and domain-specific cloud optimisation.

3. Reliable system development life cycle, which includes codesign of reliable and resilient systems, high-coverage testing and verification strategies, and reliability and resilience life cycle management.

The added-value aspects of this Platform Grant proposal include: (a) ensuring a critical mass of researchers in key areas, (b) exploring significant strategic areas, (c) contributing to research infrastructure, (d) attracting fresh talent, (e) pioneering and strengthening international collaborations, and (f) accelerating technology transfer.

Planned Impact

Energy-efficient acceleration with custom computing is a critical technology which can benefit:

1. society, which faces global challenges such as climate change, healthcare and security

2. organisations with products or services based on high performance computer systems such as CHREC and Maxeler, and cloud service providers such as Microsoft

3. FPGA vendors such as Altera and Xilinx

4. related silicon device vendors, such as ARM and Imagination Technologies

5. companies with products or services which rely on systems that would benefit from the above devices, such as Moortec and ThoughtWorks

6. individuals or organisations who use such products or services, especially those who would benefit from enhanced reliability

7. the Research Associates working on this project and students who work on related projects

8. students and others studying related courses e.g. hardware design, high-performance computing, embedded systems

This project has significant potential for 3 kinds of transformational impact:

(i) novel computer systems with energy-efficient acceleration, and their tools,
(ii) improved productivity of their designers and users, and
(iii) new or improved applications and services enabled by them. Hence:

(a) Society will benefit from better understanding and modelling of climate change, and from enhanced healthcare and security

(b) Companies with products or services relying on high-performance computing systems would be able to offer more powerful systems with enhanced security in a shorter time and at a lower cost

(c) FPGA vendors will benefit from more efficient architectures and from higher designer productivity

(d) Other silicon vendors will benefit from better prototyping capabilities, and from the ability to adapt particular techniques (e.g. those related to energy reduction or security enhancement) where applicable

(e) Companies with products or services relying on embedded systems would be able to implement better and cheaper real-time systems more quickly, with lower energy consumption and enhanced security

(f) Users of such products or services would be able to enjoy improvements in a more timely manner and at a lower price

(g) The environment will benefit from reduced energy usage; society will benefit from improved reliability

In addition, FPGA technology has the potential to benefit many more applications. Examples include improving:

(a) the internet by making it more efficient and secure through FPGA-based message routers and intrusion detection engines

(b) cloud computing systems by significantly reducing their power and energy consumption, and need for cooling

(c) healthcare provision by accelerating, for instance, medical robotics

(d) scientific understanding through experimental facilities such as the Large Hadron Collider at CERN

(e) simulation facilities for a wide range of applications, from chip design to climate effects to gaming, by lowering the effort of prototyping such systems

We will work with our Project Partners and Visiting Researchers to take into account their suggestions for key challenges that next-generation systems would need to meet, so that the project can produce useful results as early and as widely as possible. We will also explore dissemination and use of the project results for a wide range of research and development efforts, together with initial exploitation measures either through the project's industrial partners or through exploitation routes recommended by Imperial Innovations.

The means of dissemination include publishing papers in relevant journals and conferences, providing a project web portal with access to publications and open-source tools and benchmarks, developing tutorial and teaching material, and liaising with related projects.
 
Description

1. A new deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolutional and convolutional neural network (CNN) algorithms. A non-linear optimisation model, based on a performance model, is introduced to explore the design space efficiently, maximising the processing speed of the system and improving power efficiency. On a Xilinx Zynq ZC706 board, the proposed deconvolution accelerator achieves 90.1 GOPS at a 200 MHz clock frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantisation, significantly outperforming previous FPGA designs.

2. CROSSBOW is a new single-server multi-GPU system for training deep learning models that enables users to choose the batch size while scaling to multiple GPUs. CROSSBOW uses many parallel model replicas and avoids the loss of statistical efficiency through a new synchronous training method. It achieves high hardware efficiency with small batch sizes by training multiple model replicas per GPU where needed, automatically tuning the number of replicas to maximise throughput. Experiments show that CROSSBOW reduces the training time of deep learning models on an 8-GPU server by 1.3-4x compared to TensorFlow.
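
Item 1 of the Description mentions a performance model used to explore the accelerator's design space. The record does not state that model, so this Python sketch is a hypothetical illustration rather than the authors' method. It assumes a common loop-tiling formulation in which `tm` output channels and `tn` input channels are processed in parallel by multiply-accumulate (MAC) lanes, that each lane uses a fixed number of DSP slices (3 here, an assumption that in practice depends on the number format), and that throughput is capped by either compute or off-chip bandwidth, roofline-style. In place of a non-linear optimiser it uses plain exhaustive search, which is feasible for small spaces. The bandwidth and arithmetic-intensity values are invented; the 900-DSP budget matches the XC7Z045 device on the ZC706 board. The sketch also checks the reported figures: since performance density is throughput divided by DSPs used, 90.1 GOPS at 0.10 GOPS/DSP implies roughly 900 DSP slices.

```python
from itertools import product


def implied_dsps(gops: float, gops_per_dsp: float) -> float:
    """Performance density = throughput / DSPs used, so DSPs = throughput / density."""
    return gops / gops_per_dsp


def modelled_gops(tm, tn, freq_mhz, bw_gbs, intensity):
    """Roofline-style throughput estimate for a tm x tn array of MAC lanes.

    Each lane is assumed to perform one multiply-accumulate (2 ops) per cycle;
    throughput is capped by off-chip bandwidth x arithmetic intensity.
    """
    compute_gops = 2 * tm * tn * freq_mhz / 1000.0
    memory_gops = bw_gbs * intensity  # GB/s x ops/byte = GOPS
    return min(compute_gops, memory_gops)


def explore(dsp_budget, dsp_per_mac, freq_mhz, bw_gbs, intensity, max_tile=64):
    """Exhaustively search (tm, tn) tilings that fit within the DSP budget."""
    best = None
    for tm, tn in product(range(1, max_tile + 1), repeat=2):
        dsps = tm * tn * dsp_per_mac
        if dsps > dsp_budget:
            continue
        gops = modelled_gops(tm, tn, freq_mhz, bw_gbs, intensity)
        # Rank by throughput first, then by fewer DSPs (higher GOPS/DSP).
        key = (gops, -dsps)
        if best is None or key > best[0]:
            best = (key, tm, tn, dsps, gops)
    if best is None:
        raise ValueError("no tiling fits the DSP budget")
    _, tm, tn, dsps, gops = best
    return {"tm": tm, "tn": tn, "dsps": dsps, "gops": gops,
            "gops_per_dsp": gops / dsps}


if __name__ == "__main__":
    # Reported figures imply roughly 900 DSPs (0.10 GOPS/DSP is a rounded value).
    print(round(implied_dsps(90.1, 0.10)))
    # Hypothetical parameters: 200 MHz is from the text, everything else is assumed.
    print(explore(dsp_budget=900, dsp_per_mac=3, freq_mhz=200,
                  bw_gbs=1.0, intensity=48))
```

With these made-up numbers the design is memory-bound at 48 GOPS, so the search settles on the smallest tiling that reaches that ceiling (120 MAC lanes, 360 DSPs) rather than filling the device; adding DSPs beyond the roofline would only lower GOPS/DSP.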
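
Item 2 of the Description says that CROSSBOW trains many model replicas in parallel and keeps them consistent with a synchronous training method. The record does not spell that method out, so the Python sketch below shows a simplified synchronous model-averaging scheme under stated assumptions: a toy one-parameter linear model with data following y = 3x, replicas simulated sequentially in one process rather than on GPUs, and hypothetical hyperparameters (`lr`, `alpha`, `batch_size`). Read it as an illustration of the general idea of replicas with small batches pulled towards a shared central model, not as CROSSBOW's algorithm or code.

```python
import random


def grad(w, batch):
    """Mean gradient of 0.5 * (w*x - y)**2 over a mini-batch, w.r.t. scalar w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def train(num_replicas=4, batch_size=2, lr=0.5, alpha=0.1, steps=200, seed=0):
    """Synchronous training of several model replicas around a central model.

    Each step, every replica takes an SGD step on its own small batch and is
    pulled towards the central model by a correction term; once all replicas
    have finished the step (a synchronous barrier), the central model moves
    by the sum of all corrections.
    """
    rng = random.Random(seed)
    true_w = 3.0  # hypothetical target: data follow y = 3x
    central = 0.0
    replicas = [0.0] * num_replicas
    for _ in range(steps):
        corrections = []
        for i, w in enumerate(replicas):
            xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
            batch = [(x, true_w * x) for x in xs]
            correction = alpha * (w - central)
            replicas[i] = w - lr * grad(w, batch) - correction
            corrections.append(correction)
        # Barrier: all replicas are done, so update the central model.
        central += sum(corrections)
    return central, replicas


if __name__ == "__main__":
    central, replicas = train()
    print(f"central model: {central:.4f}")
    print("replicas:", [round(w, 4) for w in replicas])
```

Each replica sees only small batches, which is the regime the text associates with good statistical efficiency, while the central model aggregates their progress. The automatic tuning of replicas per GPU for throughput is not modelled here.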

Exploitation Route

1. Organisations with products or services based on high performance computer systems, and cloud service providers such as Microsoft, especially those based on deep learning technologies.

2. FPGA vendors such as Intel and Xilinx, and related silicon device vendors, such as ARM and Imagination Technologies; also companies with products or services which rely on systems that would benefit from the above devices.

3. Researchers, including the Research Associates working on this project and students who work on related projects, working on high-performance systems related to deep learning.

Sectors

Digital/Communication/Information Technologies (including Software)