Efficient deep learning in constrained computing platforms

Lead Research Organisation: University of Oxford
Department Name: Computer Science


This project falls within the EPSRC Artificial Intelligence Technologies research area and is titled "Efficient deep learning in constrained computing platforms".

Since the success of deep neural networks in competitions such as ImageNet, a large part of machine learning research has been aimed at improving accuracy of deep neural nets. This has resulted in sophisticated models that perform very well in tasks such as computer vision, speech recognition and natural language processing.
Simultaneously, advances in the computing power of hardware platforms are allowing more complex algorithms to run on constrained platforms such as mobile and wearable devices. While these deep learning computations are better suited to GPUs due to their parallel nature, many leading hardware platforms have already released, or will soon release, hardware accelerators specifically designed for machine learning tasks.

These two powerful trends provide the potential for broad adoption of on-device deep learning. However, little attention has been paid to the practical requirements of deploying these accurate models to constrained real-time hardware platforms. In addition to model requirements in terms of inference time, these platforms have stringent requirements in terms of power consumption, memory footprint, latency, parallelisation, and cycle budget.

This project will seek to address some of these fundamental challenges that prevent deep neural nets from being widely adopted on mobile computing platforms. Within this context, our aim and objectives include, but are not limited to:
-Finding novel solutions to decompose existing deep learning models at runtime in order to make them execute more efficiently on hardware targets. Ideally these solutions should work automatically with any underlying model and should not require users to have knowledge of the model for manual tuning.
-Developing new deep learning algorithms designed from ground-up to run efficiently on constrained classes of computing platforms.
-Studying the effects of compression and quantisation on deep learning and ways to exploit the sparsity in neural nets.
-Finding new models and optimisations that can utilise machine learning hardware acceleration blocks.

This project relates to EPSRC's strategic focus on making machine learning more robust, resilient and transferable. It can potentially contribute to other domains such as human-computer interaction and social sciences by facilitating the use of deep learning principles and algorithms towards a richer and more reliable understanding of user behaviour and context, using mobile systems with sensory perception, reasoning and interaction capabilities.



Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/R512333/1 | | | 01/10/2017 | 30/09/2021 |
1894770 | Studentship | EP/R512333/1 | 01/10/2017 | 31/01/2022 | Milad Alizadeh
Description: There has been great interest in expanding the usage of Deep Neural Networks (DNNs) from running remotely on cloud computers to running locally on small devices. Examples of such devices are mobile phones, smart watches, IoT devices, and robots. This is motivated by the privacy implications of sharing data and models with remote machines, and by the appetite to apply DNNs in new environments and scenarios where cloud deployment is not viable. However, the requirements of such devices are very demanding: there are stringent compute, storage, memory and bandwidth limitations; many applications need to work in real time; many devices require long battery life for all-day or always-on use; and there is a thermal ceiling to consider. On the other hand, the quest for more accurate neural networks has resulted in deeper, more compute-intensive models.

A typical solution for deploying these models to such small devices is to build them with low-precision (quantized) numbers. This can even be pushed to the extreme by creating models that use only binary values of 0 and 1 as parameters. However, the algorithms used to train such models are not rigorous and include many hacks and tricks. In one of my published papers funded through this award, I carried out an in-depth analysis of such algorithms. Our analysis distinguished necessary from unnecessary ad-hoc techniques for training such networks, paving the way for the future development of solid theoretical foundations for them. Our newly found insights further led to new procedures that make training of existing binary neural networks notably faster.
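
As a rough illustration of what low-precision and binary parameters mean in practice, the following is a minimal NumPy sketch, not the training algorithms analysed in the paper. The function names `quantize_uniform` and `binarize` are hypothetical, the symmetric uniform grid is an assumed quantisation scheme, and the binary case assumes the common convention of storing one bit per weight (0 or 1) and interpreting it as -1 or +1.

```python
import numpy as np


def quantize_uniform(w, num_bits):
    """Round weights onto a symmetric uniform grid (assumed scheme).

    The grid has 2**(num_bits - 1) - 1 positive levels, the same number
    of negative levels, and zero.
    """
    if num_bits < 2:
        raise ValueError("use binarize() for 1-bit weights")
    w = np.asarray(w, dtype=np.float64)
    max_abs = np.max(np.abs(w))
    if max_abs == 0.0:
        return np.zeros_like(w)
    levels = 2 ** (num_bits - 1) - 1
    scale = max_abs / levels              # distance between grid points
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale                      # de-quantised (float) values


def binarize(w):
    """Keep one bit per weight: bit 1 for w >= 0, bit 0 otherwise.

    Returns the stored bits and their interpretation as -1/+1 values
    (an assumed convention, not taken from the text above).
    """
    w = np.asarray(w, dtype=np.float64)
    bits = (w >= 0).astype(np.uint8)
    values = 2.0 * bits - 1.0
    return bits, values


if __name__ == "__main__":
    weights = np.array([-0.8, -0.1, 0.0, 0.3, 1.0])
    print("8-bit:", quantize_uniform(weights, 8))
    print("2-bit:", quantize_uniform(weights, 2))
    print("1-bit:", binarize(weights))
```

With fewer bits, the grid becomes coarser and more weights collapse onto the same value, which is where the memory saving and the accuracy loss both come from.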

Another typical problem in training these low-precision neural networks is that they target a very specific setup. Deploying them to devices with different requirements often requires complete re-training of the model. In my second published paper, I proposed a novel approach for training models that are inherently robust in any setup.
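
The title of the first publication listed below refers to an L1 penalty on gradients. The sketch below is only a hedged illustration of why that norm is a natural quantity to look at, not the paper's training procedure. It assumes a single linear output y = w.x (so the gradient of y with respect to w is x) and round-to-nearest quantisation with grid spacing `step` and no clipping. Under those assumptions each weight moves by at most step/2, so the output changes by at most (step/2) times the L1 norm of the gradient, for any step size. The names `quantize_step` and `output_change_and_bound` are hypothetical.

```python
import numpy as np


def quantize_step(w, step):
    """Round each weight to the nearest multiple of `step` (no clipping)."""
    return np.round(np.asarray(w, dtype=np.float64) / step) * step


def output_change_and_bound(w, x, step):
    """Compare the actual output change with the L1-gradient bound.

    For y = w . x the gradient with respect to w is x, so
    |y(q(w)) - y(w)| <= (step / 2) * ||x||_1.
    """
    change = abs(np.dot(quantize_step(w, step), x) - np.dot(w, x))
    bound = 0.5 * step * np.sum(np.abs(x))
    return change, bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=64)   # toy weights, illustrative only
    x = rng.normal(size=64)   # toy input, equal to dy/dw here
    for step in (0.5, 0.1, 0.02):
        change, bound = output_change_and_bound(w, x, step)
        print(f"step={step}: change={change:.4f}, bound={bound:.4f}")
```

Because the bound holds for every step size at once, keeping the L1 norm of the gradient small limits the damage from quantisation at any grid spacing in this toy model, which is the intuition behind robustness that does not depend on one specific setup.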

List of Publications:

Alizadeh, Milad, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, and Max Welling. "Gradient L1 Regularization for Quantization Robustness." In International Conference on Learning Representations, 2020.

Alizadeh, Milad, Javier Fernández-Marqués, Nicholas D Lane, and Yarin Gal. "An Empirical Study of Binary Neural Networks' Optimisation." In International Conference on Learning Representations, 2019.

Fernández-Marqués, Javier, Milad Alizadeh, Vincent W-S Tseng, Sourav Bhattacharya, and Nicholas D Lane. "On-the-Fly Deterministic Binary Filters for Memory Efficient Keyword Spotting Applications on Embedded Devices." In Proceedings of the 2nd International Workshop on Embedded and Mobile Deep Learning, 13-18. ACM, 2018.

Tseng, Vincent W-S, Sourav Bhattacharya, Javier Fernández-Marqués, Milad Alizadeh, Catherine Tong, and Nicholas D Lane. "Deterministic Binary Filters for Convolutional Neural Networks." International Joint Conferences on Artificial Intelligence Organization, 2018.
Exploitation Route: The outcomes of this funding are particularly relevant to industries that aim to deploy machine learning models in constrained environments, e.g. robots, cars and mobile phones. The papers published thanks to this funding shine light on pitfalls and possible solutions that apply when compressing models.
Sectors: Digital/Communication/Information Technologies (including Software), Electronics, Other