MOA: High Efficiency Deep Learning for Embedded and Mobile Platforms (Full EPSRC Fellowship Submission)

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

In just a few short years, breakthroughs from the field of deep learning have transformed how computers perform a wide variety of tasks such as recognizing a face, tracking emotions or monitoring physical activities. Unfortunately, the models and algorithms used by deep learning typically exert severe energy, memory and compute demands on local device resources, and this conventionally limits their adoption within mobile and embedded devices. Data perception and understanding tasks powered by deep learning are so fundamental to platforms like phones, wearables and home/industrial sensors that we must reach a point where current -- and future -- innovations in this area can be simply and efficiently integrated within even such resource-constrained systems. This research vector will lead directly to outcomes such as brand new types of sensor-based products in the home/workplace, as well as increased intelligence not only within consumer devices, but also in fields like medicine (smart stethoscopes) and autonomous systems (robotics/drones).

The MOA fellowship aims to fund basic research, development and eventual commercialization (through collaborations with a series of industry partners) of algorithms that aim to enable general support for deep learning techniques on resource-constrained mobile and embedded devices. Primarily, this requires a radical reduction in the resources (viz. energy, memory and computation) consumed by these computational models -- especially at inference (i.e., execution) time. The proposal has two main thrusts. First, build upon the existing work of the PI in this area towards achieving this goal, which includes: sparse intra-model layer representations (resulting in small models), dynamic forms of compression (models that can be squeezed smaller or bigger as needed), and scheduling of partitioned model architectures (splitting models and running each part on whichever of the processors found inside a mobile/embedded device suits that model fraction best). This thrust will re-examine these methods towards solving key remaining issues that would prevent such techniques from being used within products and as part of common practices. Second, investigate a new set of ambitious directions that seek to increase the utilization of emerging purpose-built small-form-factor hardware accelerators designed for deep learning algorithms (these accelerators are suitable for use within phones, wearables and drones). However, like any piece of hardware, these accelerators are still limited by how they are programmed -- and software toolchains that map deep learning models to the accelerator hardware remain in their infancy. Our preliminary results show that existing approaches to optimizing deep models, conceived first for conventional processors (e.g., DSPs, GPUs, CPUs), make poor use of the new capabilities of these hardware accelerators.
We will examine the development of important new approaches that modify the representation and inference algorithms used within deep learning so that they can fully utilize the new hardware capabilities. Directions include: mixed precision models and algorithms, low-data movement representations (that can trade memory operations for compute), and enhanced parallelization.

Planned Impact

An expanded discussion of how MOA relates to national importance is contained within the Pathways to Impact document. Here we summarize, in the following paragraphs, the core aspects of importance related to specific EPSRC priority areas, the broader economy, societal impact and contributions to knowledge.
MOA fits within the 'Robotics and artificial intelligence systems' priority area of the UKRI call. It aims to produce technology that enables machine learning models that otherwise need to run remotely in the cloud to instead run directly within mobile and embedded systems like robots, drones and small-form-factor devices. As a result, it is an enabling technology useful in the development of new applications or systems. There are direct uses, for example, in enabling healthy/independent living, as it would allow a device such as 'Alexa' to be a smarter and better companion to the elderly; or within safety, where it can allow a cheap battery-powered workplace or home camera to better understand the semantics of what it captures and react by calling the police if it observes dangerous activities. Significantly, the proposed research will likely amplify and extend existing or already on-going machine learning research - this is because it allows a machine learning model that perhaps today must reside in powerful cloud computers to run directly within drones, robots or devices; this in turn opens up an extended range of new possible application scenarios and use cases for such models.
At a societal level, MOA contributes to more ethical, safer and privacy-preserving forms of machine learning. Because it enables the use of machine learning directly on limited platforms like phones and devices, it reduces the need to transmit and process sensitive data on third-party cloud servers not controlled or owned by consumers. Through MOA outcomes, consumers will be able to demand devices (like next-generation medical instruments) that retain their data, yet still offer the benefits of the latest in machine learning.
From the perspective of knowledge, MOA seeks to develop truly innovative concepts in models that can best utilize the latest in commodity and accelerator processor architectures (see workpackage 2). MOA also aims to mature, and invest further in studying, the latest techniques in a brand-new academic area (efficient deep learning) to a commercial level of quality and rigour (see workpackage 1), so that they can transfer into commercial benefits for companies like Nokia and Samsung that have multiple UK-based teams.

Funded Value:

£608,250

Funded Period:

Jun 18 - May 20

Funder:

ISCF

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/S001530/1

Principal Investigator:

Nicholas Lane

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (50%)

Mobile Computing (50%)

Organisations

People	ORCID iD
Nicholas Lane (Principal Investigator / Fellow)	http://orcid.org/0000-0002-2728-8273

Publications

Alizadeh M (2019) An Empirical study of Binary Neural Networks' Optimisation

Alizadeh M (2022) Prospect Pruning: Finding Trainable Weights at Initialization Using Meta-Gradients in ICLR 2022 - 10th International Conference on Learning Representations

Brown E (2021) Attention-based machine vision models and techniques for solar wind speed forecasting using solar EUV images

Brown E (2022) Attention-Based Machine Vision Models and Techniques for Solar Wind Speed Forecasting Using Solar EUV Images

Brown E (2022) Attention-Based Machine Vision Models and Techniques for Solar Wind Speed Forecasting Using Solar EUV Images in Space Weather

Chan S (2024) CAPTURE-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition in Scientific Data

Dudziak (2019) ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning in arXiv e-prints

Kothari V (2020) The Final Frontier: Deep Learning in Space

Li K (2022) Secure Aggregation for Federated Learning in Flower

Li K (2021) Secure aggregation for federated learning in Flower

Related Projects

Project Reference	Relationship	Related To	Start	End	Award Value
EP/S001530/1			28/06/2018	03/05/2020	£608,250
EP/S001530/2	Transfer	EP/S001530/1	04/05/2020	03/06/2022	£369,604

Key Findings
Impact Summary
Research Tools and Methods
Engagement Activities


Description	We discovered a method to shrink the complex ML models used to recognize speech, so that the model can run directly on a phone. This allows speech to be recognized even when cellular coverage is poor, and recorded speech never needs to leave the phone to support the application. We have also extended this method to include image-based ML.
Exploitation Route	These methods are currently being used within industry for building new products.
Sectors	Digital/Communication/Information Technologies (including Software)


Description	Our methods have been adopted in industry for use in products used by real consumers.
First Year Of Impact	2021
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Title	CortexML Prototype Libraries and Framework
Description	An outcome of research funded by MOA is a software framework we call CortexML that facilitates the training and testing of image/vision-based deep neural networks designed for micro-controllers (such as ARM M-series and A-series processors). It is built primarily through the implementation of a variety of quantization routines, along with memory and flash management. We are using this tool to perform a large systematic study of deep learning under micro-controller technology. Tools of this precise type are currently not available to the research community.
Type Of Material	Technology assay or reagent
Year Produced	2018
Provided To Others?	No
Impact	No significant impact yet. However, the tool is very recent and is currently used only within my group. In the near term we will open source this code.


Title	TFLite Tools
Description	We released a tool that analyzes the memory usage of machine learning models. This is useful for optimizing them for constrained devices.
Type Of Material	Technology assay or reagent
Year Produced	2020
Provided To Others?	Yes
Impact	People have used this tool to lower the memory footprint of their models. It is used by some researchers in the community.
URL	https://github.com/eliberis/tflite-tools


Title	torchquant
Description	Quantization is a popular technique for accelerating and compressing neural networks by utilizing low-bit arithmetic to represent weights and activations. It remains a hot area for research, with continued work on removing the gap in accuracy between full and low precision models. We observe that researchers in this area tend to rely on custom implementations, rather than approaches built into the popular machine learning libraries, as they are not sufficiently flexible to enable research. We are open sourcing TorchQuant, our MIT licensed library that builds upon PyTorch by providing researchers with modular components and implementations that will accelerate their research, and provide the community with consistent baselines. Using our library, we provide an example of how to quickly evaluate a research hypothesis: the "range-precision" trade-off for quantization-aware training. Our library can be found at this URL: https://github.com/camlsys/torchquant.
Type Of Material	Improvements to research infrastructure
Year Produced	2022
Provided To Others?	Yes
Impact	A variety of researchers have used this tool in their research.
URL	https://github.com/camlsys/torchquant
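
To make the quantization idea in the torchquant description concrete, the following is a minimal sketch in plain Python. It does not use TorchQuant or reproduce its API; the function names `quantize` and `mean_abs_error` and the sample weights are hypothetical and chosen only for illustration. It also assumes that the "range-precision" trade-off refers to the tension between the clipping range and the step size of the low-bit grid: a narrow range clips outliers, while a wide range makes the grid coarser for small values.

```python
def quantize(values, num_bits, clip):
    """Uniform symmetric quantization (illustrative, not TorchQuant's API).

    Each value is clipped to [-clip, clip], rounded to the nearest level of a
    signed integer grid, and mapped back to a float.
    """
    levels = 2 ** (num_bits - 1) - 1      # e.g. 7 positive levels for 4 bits
    scale = clip / levels                 # step size between adjacent levels
    result = []
    for v in values:
        clipped = max(-clip, min(clip, v))
        level = round(clipped / scale)    # integer in [-levels, levels]
        result.append(level * scale)
    return result


def mean_abs_error(original, approximated):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - b) for a, b in zip(original, approximated)) / len(original)


# Hypothetical weights: mostly small values plus two outliers.
weights = [-2.0, -0.5, -0.1, 0.0, 0.05, 0.3, 0.9, 4.0]

for clip in (1.0, 4.0):
    approx = quantize(weights, num_bits=4, clip=clip)
    print(f"clip={clip}: mean abs error = {mean_abs_error(weights, approx):.4f}")
```

Running the loop with different `clip` and `num_bits` values shows how the error shifts between the outliers and the small weights, which is the kind of question a quantization research library is meant to make quick to explore.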


Description	2nd International Workshop on Embedded and Mobile Deep Learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	In 2018 we ran an international workshop that discussed our research aims (and existing solutions) for the MOA project with a variety of international researchers and industry practitioners. At this workshop we also invited them to present their results, to share information between parties and build a critical mass of research activity within MOA-related topics. Finally, we also invited key international speakers to present a series of three keynotes. This event will be repeated in 2019. It has been instrumental in building activity in this exciting new area.
Year(s) Of Engagement Activity	2018,2019
URL	https://www.sigmobile.org/mobisys/2018/workshops/deepmobile18/index.html


Description	ACM HotMobile 2020
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Schools
Results and Impact	Presented on-going work related to the MOA project.
Year(s) Of Engagement Activity	2020
URL	http://www.hotmobile.org/2020/


Description	ACM HotMobile 2021
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This is a workshop on mobile computing. I attended, and during informal discussions spoke about work conducted under the MOA fellowship.
Year(s) Of Engagement Activity	2021
URL	http://www.hotmobile.org/2021/


Description	ACM MobiCom 2020
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	I gave a keynote talk at MobiCom 2020. This is an international conference, and my talk covered aspects of MOA-funded research (along with other activities).
Year(s) Of Engagement Activity	2020
URL	https://sigmobile.org/mobicom/2020/


Description	Discussions with Google Brain at Google
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	My group talks to key people at Google Brain responsible for enabling TensorFlow for training deep models to function on embedded and mobile devices. We co-ordinate with them to understand upcoming methods being integrated into the software and, most importantly, open problems they are actively working on. We inform them of our research results, and this has altered their perception of which techniques to adopt and integrate into TensorFlow. The key contact is Pete Warden, and members of his team.
Year(s) Of Engagement Activity	2018,2019


Description	Discussions with ML Research at ARM
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Discussion of latest trends and results with ML team of ARM (specifically Paul Whatmough based in Boston -- and his team). Primarily this is quarterly. It includes in-person visits and online conference calls.
Year(s) Of Engagement Activity	2018,2019


Description	Huawei Collaboration Workshop
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Gave a talk about MOA-funded research at a workshop run by Huawei for company employees, with a wide variety of other academics also in attendance.
Year(s) Of Engagement Activity	2022


Description	MLSys 2020
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Schools
Results and Impact	We presented our latest results for MOA to the MLSys audience.
Year(s) Of Engagement Activity	2020
URL	https://mlsys.org/


Description	Ubicomp 2021
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Attended top-tier academic conference. Presented and discussed work performed under the MOA grant.
Year(s) Of Engagement Activity	2021