Deep Learning Networks for Multi-Sensory Perception and Control with applications to Cyber-Physical Systems

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

This project falls within the EPSRC Artificial Intelligence Technologies research area.

Cyber-physical Systems (CPS) are networks characterised by a seamless integration of software and physical components. Their applications range across a broad group of areas such as transportation, healthcare and manufacturing, and the data these systems process comes in an equally diverse range of forms: audio, text, infrared and ultrasound signals, and measurements of physical quantities such as temperature, humidity, speed and inertia. Alongside all of these, visual perception is also an important dimension of many CPS that aim to monitor and react to changes in the physical world.

Computer vision has undergone its own revolution in the past five years due to the use of deep convolutional neural networks for a wide range of image-based applications such as image classification, segmentation and super-resolution. With the increase in compute capabilities of GPUs (Graphics Processing Units), together with advances in machine learning and deep learning in particular, richer models are being designed to aggregate different types of data, making them better suited to monitoring the complex domains of today's society. This scenario gives rise to novel, inherently mobile CPS that offer unparalleled perception, reasoning and interaction capabilities.

For decades, the field of machine learning has been fixated on goals such as accuracy and robustness. In recent years, deep learning has drawn a lot of attention and has proven effective in a number of fields, even outperforming humans in certain tasks. However, systems running these algorithms often require considerable memory, compute and energy resources, which prevents them from running locally on smaller devices and consigns them almost exclusively to cloud computing setups when used in mobile applications. The world is increasingly recognising the importance that efficiency will play in pushing AI systems forward. Therefore, the main research objectives include (but are not limited to) the following:

Development of efficient network architectures for computer vision tasks, aiming to achieve results comparable to standard networks while drastically reducing the memory and computation resources needed during inference. Existing architectures already deliver state-of-the-art performance on popular, yet relatively simple, datasets like CIFAR-10, MNIST and SVHN. Their suitability for tasks beyond image classification, such as segmentation or 3D perception, has not been studied in depth.

Development of novel learning strategies using multi-sensory data for resource-constrained autonomous or semi-autonomous machines. While it has been shown that robots can be designed to excel at specific tasks, general-purpose robots are still very limited. The popular DARPA Robotics Challenge is a good example of how the execution of several relatively simple tasks (e.g. opening a door, walking over irregular terrain) is incredibly complex to program. Recent work in sensor fusion shows how deep learning provides a mechanism to fuse different signals, making them suitable for robot control.

This project focuses on improving the accuracy, efficiency and portability of deep learning systems, with an emphasis on, but not limited to, those that seek to gain understanding through images. We believe that the systems developed in this research project will be transferable to several domains including robotics, autonomous driving, IoT and smart cities.

Studentship Projects

Project Reference: EP/N509711/1, 01/10/2016 to 30/09/2021
Studentship: 1917012 (related to EP/N509711/1), 01/10/2017 to 31/07/2021, Student: Javier Fernandez-Marques
 
Description Any device that requires significant power faces many barriers to commercialisation. The advantages that smart products offer are often outweighed by power-related inconveniences in installation and maintenance, whether because devices require permanent wiring or frequent docking to charge a battery. In low-end CPUs and microcontrollers (MCUs), data movement is the primary source of energy consumption. There are two main ways of alleviating these costs while leaving the deep learning model (i.e. the algorithm) unchanged: model quantisation and model compression.

Quantisation: Quantisation reduces the number of bits needed to represent each parameter, and therefore the amount of data movement. The most extreme form is binary quantisation, which represents parameters with either "+1" or "-1", reducing model size by up to 32x. We presented a study of the challenges associated with training Binary Networks: there is little consensus on which ad-hoc techniques are important when training them. In Alizadeh et al. (2019) we empirically identified the necessary techniques and provided a recipe to train these networks faster.
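
As a rough sketch of what this looks like in practice, the PyTorch snippet below binarises latent weights with a sign function in the forward pass and uses a straight-through estimator (STE) to pass gradients back to them. It is a minimal illustration in the spirit of the networks studied in Alizadeh et al. (2019), not the paper's code; the class and layer names are hypothetical.

import torch
import torch.nn as nn

class BinariseSTE(torch.autograd.Function):
    # Forward: sign(w) in {-1, +1}. Backward: straight-through estimator,
    # passing gradients only where the latent weight lies in [-1, 1].
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1).float()

class BinaryLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Latent full-precision weights, updated by the optimiser as usual.
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))

    def forward(self, x):
        w_bin = BinariseSTE.apply(self.weight)  # {-1, +1} weights used in the matmul
        return x @ w_bin.t()

# Usage: trained like any other PyTorch module.
layer = BinaryLinear(16, 4)
out = layer(torch.randn(8, 16))
out.sum().backward()  # gradients reach the latent weights through the STE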

Model compression: In Tseng et al. (2018) and Fernandez-Marques et al. (2018) we present a framework to learn models whose weights are a combination of deterministic binary codes. During inference, our model generates the weights on-the-fly (as opposed to loading them from disk or RAM), alleviating the problem of data movement and its impact on energy and latency.
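
A minimal NumPy sketch of this idea follows: only a small coefficient matrix is stored, and a deterministic {+1, -1} basis is regenerated from a fixed seed whenever the weights are needed. The seeded basis construction and the class shown are hypothetical stand-ins for the deterministic binary codes used in those papers.

import numpy as np

def binary_basis(num_codes, length, seed=0):
    # Deterministically regenerate a {+1, -1} basis from a fixed seed,
    # so the basis itself never has to be stored or transferred.
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(num_codes, length))

class CompressedLinear:
    def __init__(self, in_features, out_features, num_codes=8, seed=0):
        self.in_features = in_features
        self.num_codes = num_codes
        self.seed = seed
        # Only these coefficients live in memory: out_features x num_codes
        # floats instead of out_features x in_features.
        self.coeffs = 0.1 * np.random.randn(out_features, num_codes)

    def forward(self, x):
        # Weights are synthesised on-the-fly rather than loaded from RAM.
        basis = binary_basis(self.num_codes, self.in_features, self.seed)
        weights = self.coeffs @ basis          # (out_features, in_features)
        return x @ weights.T

layer = CompressedLinear(in_features=64, out_features=16, num_codes=8)
y = layer.forward(np.random.randn(4, 64))     # shape (4, 16)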

This research was primarily validated on image and audio classification applications. However, we believe the techniques developed are generally applicable to most deep learning applications. This research also contributed to my obtaining a research internship at Arm, where I optimised the Winograd convolution algorithm to make use of quantisation, Fernandez-Marques et al. (2020).
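
For context, the sketch below shows the standard one-dimensional Winograd F(2,3) transform that Winograd convolution builds on: it computes two outputs of a 3-tap filter with four multiplications instead of six. This is the textbook full-precision transform, not the quantisation-aware variant developed in the MLSys 2020 paper.

import numpy as np

# Standard Winograd F(2,3) transform matrices (input, filter, output).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    # d: 4 input samples, g: 3 filter taps -> 2 outputs, using 4 multiplies.
    return A_T @ ((G @ g) * (B_T @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -0.5])
# Matches the direct sliding-window computation.
print(winograd_f23(d, g), np.correlate(d, g, mode='valid'))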


Conference papers:

Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy and Matthew Mattina. Searching for Winograd-aware Quantized Networks. Conference on Machine Learning and Systems (MLSys), 2020

Milad Alizadeh, Javier Fernandez-Marques, Nicholas D. Lane and Yarin Gal. A Systematic Study of Binary Neural Networks' Optimisation. International Conference on Learning Representations (ICLR), 2019

Javier Fernandez-Marques, Vincent W.-S. Tseng, Sourav Bhattacharya and Nicholas D. Lane. BinaryCmd: Keyword Spotting with deterministic binary basis. Conference on Systems and Machine Learning (SysML), 2018

Vincent W.-S. Tseng, Sourav Bhattacharya, Javier Fernandez-Marques, Milad Alizadeh, Catherine Tong and Nicholas D. Lane. Deterministic Binary Filters for Convolutional Neural Networks. International Joint Conference on Artificial Intelligence (IJCAI), 2018
Exploitation Route As more and more deep learning applications are developed, it becomes crucial that they are designed in a hardware-aware fashion, so that DL applications make better use of the hardware they run on. The outcomes of our research present different ways of accomplishing this by making use of extreme quantisation and reducing data movement. In our research we have considered existing off-the-shelf platforms (i.e. Arm Cortex-A and Cortex-M devices) to validate our findings. However, we believe the co-design of software and hardware will play an important role in making DL widely deployable in the real world.
Sectors Digital/Communication/Information Technologies (including Software), Electronics