MOA: High Efficiency Deep Learning for Embedded and Mobile Platforms (Full EPSRC Fellowship Submission)
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
In just a few short years, breakthroughs from the field of deep learning have transformed how computers perform a wide variety of tasks, such as recognizing a face, tracking emotions or monitoring physical activities. Unfortunately, the models and algorithms used by deep learning typically exert severe energy, memory and compute demands on local device resources, and this conventionally limits their adoption within mobile and embedded devices. Data perception and understanding tasks powered by deep learning are so fundamental to platforms like phones, wearables and home/industrial sensors that we must reach a point where current -- and future -- innovations in this area can be simply and efficiently integrated within even such resource-constrained systems. This research vector will lead directly to outcomes such as brand new types of sensor-based products in the home and workplace, as well as increased intelligence not only within consumer devices, but also in fields like medicine (smart stethoscopes) and autonomous systems (robotics/drones).
The MOA fellowship aims to fund basic research, development and eventual commercialization (through collaborations with a series of industry partners) of algorithms that enable general support for deep learning techniques on resource-constrained mobile and embedded devices. Primarily, this requires a radical reduction in the resources (viz. energy, memory and computation) consumed by these computational models -- especially at inference (i.e., execution) time. The proposal has two main thrusts. First, build upon the PI's existing work towards this goal, which includes: sparse intra-model layer representations (resulting in small models), dynamic forms of compression (models that can be squeezed smaller or bigger as needed), and scheduling of partitioned model architectures (splitting a model and running each part on the processor inside a mobile/embedded device that suits it best). This thrust will re-examine these methods to solve the key remaining issues that prevent such techniques from being used within products and as part of common practice. Second, investigate a new set of ambitious directions that seek to increase the utilization of emerging purpose-built small-form-factor hardware accelerators designed for deep learning algorithms (these accelerators are suitable for use within phones, wearables and drones). However, like any piece of hardware, an accelerator is still limited by how it is programmed -- and the software toolchains that map deep learning models to accelerator hardware remain in their infancy. Our preliminary results show that existing approaches to optimizing deep models, conceived first for conventional processors (e.g., DSPs, GPUs, CPUs), make poor use of the new capabilities of these hardware accelerators.
We will examine the development of important new approaches that modify the representation and inference algorithms used within deep learning so that they can fully utilize the new hardware capabilities. Directions include: mixed precision models and algorithms, low-data movement representations (that can trade memory operations for compute), and enhanced parallelization.
Planned Impact
Expanded discussion of how MOA relates to national importance is contained within the Pathways to Impact document. Here we summarize, in the following paragraphs, the core aspects of importance related to specific EPSRC priority areas, the broader economy, societal impact and contributions to knowledge.
MOA fits within the 'Robotics and artificial intelligence systems' priority area of the UKRI call. It aims to produce technology that enables machine learning models which otherwise need to run remotely in the cloud to instead run directly within mobile and embedded systems like robots, drones and small-form-factor devices. As a result, it is an enabling technology useful in the development of new applications and systems. There is direct application, for example, to healthy/independent living, as it would allow a device such as 'Alexa' to be a smarter and better companion to the elderly; or within safety, it can allow a cheap battery-powered workplace or home camera to better understand the semantics of what it captures and react, for instance by calling the police if it observes dangerous activities. Significantly, the proposed research will likely amplify and extend existing and on-going machine learning research: by allowing a machine learning model that today must reside in powerful cloud computers to run directly within drones, robots or devices, it opens up an extended range of new application scenarios and use cases for such models.
At a societal level MOA contributes to more ethical, safer and privacy preserving forms of machine learning. Because it enables the use of machine learning directly on limited platforms like phones and devices it reduces the need to transmit and process sensitive data on 3rd party cloud servers not controlled or owned by consumers. Through MOA outcomes, consumers will be able to demand devices (like next-generation medical instruments) that retain their data, yet still offer the benefits of the latest in machine learning.
From the perspective of knowledge, MOA seeks to develop truly innovative concepts in models that can best utilize the latest in commodity and accelerator processor architectures (see workpackage 2). MOA also aims to mature and further invest in studying the latest techniques in a brand-new academic area (efficient deep learning) within a commercial level of quality and rigour (see workpackage 1) that can transfer to offering commercial benefits to companies like Nokia and Samsung that have multiple UK based teams.
Publications
Alizadeh M
(2019)
An Empirical study of Binary Neural Networks' Optimisation
Alizadeh M.
(2022)
PROSPECT PRUNING: FINDING TRAINABLE WEIGHTS AT INITIALIZATION USING META-GRADIENTS
in ICLR 2022 - 10th International Conference on Learning Representations
Brown E
(2022)
Attention-Based Machine Vision Models and Techniques for Solar Wind Speed Forecasting Using Solar EUV Images
in Space Weather
Chan S
(2024)
CAPTURE-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition.
in Scientific data
Dudziak
(2019)
ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning
in arXiv e-prints
Kothari V
(2020)
The Final Frontier: Deep Learning in Space
Related Projects
| Project Reference | Relationship | Related To | Start | End | Award Value |
|---|---|---|---|---|---|
| EP/S001530/1 | | | 28/06/2018 | 03/05/2020 | £608,250 |
| EP/S001530/2 | Transfer | EP/S001530/1 | 04/05/2020 | 03/06/2022 | £369,604 |
| Description | We discovered a method to squeeze down in size the complicated ML models used to recognize speech, so that a model can run directly on a phone. This allows speech to be recognized even when cellular coverage is poor, and recorded speech never needs to leave the phone to support the application. We have also extended this method to image-based ML. |
| Exploitation Route | Our methods are currently being used within industry for building new products. |
| Sectors | Digital/Communication/Information Technologies (including Software) |
| Description | Our methods have been adopted in industry for use in products used by real consumers. |
| First Year Of Impact | 2021 |
| Sector | Digital/Communication/Information Technologies (including Software) |
| Impact Types | Economic |
| Title | CortexML Prototype Libraries and Framework |
| Description | One outcome of research funded by MOA is a software framework we call CortexML that facilitates the training and testing of image/vision-based deep neural networks designed for micro-controllers (such as ARM M-series and A-series processors). It is built primarily through the implementation of a variety of quantization routines, along with memory and flash management. We are using this tool to perform a large systematic study of deep learning on micro-controller technology. Tools of this precise type are currently not available to the research community. |
| Type Of Material | Technology assay or reagent |
| Year Produced | 2018 |
| Provided To Others? | No |
| Impact | No significant impact yet. However, the tool is very recent and currently used only within my group. In the near term we will open source this code. |
| Title | TFLite Tools |
| Description | We released a tool that analyzes the memory usage of machine learning models. This is useful for optimizing them for constrained devices. |
| Type Of Material | Technology assay or reagent |
| Year Produced | 2020 |
| Provided To Others? | Yes |
| Impact | People have used this tool to lower the memory footprint of their models. It is used by some researchers in the community. |
| URL | https://github.com/eliberis/tflite-tools |
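To illustrate the kind of analysis such a tool performs (this is a simplified sketch, not the actual tflite-tools implementation), peak activation memory for a simple sequential model can be estimated from tensor shapes alone: at each operator, the input and output activation tensors must be resident simultaneously, so the peak is the largest such sum. The shapes and helper names below are hypothetical.

```python
# Sketch: estimating peak activation memory of a sequential model
# from its activation tensor shapes (int8 activations assumed).

def tensor_bytes(shape, dtype_bytes=1):
    """Size of a tensor in bytes, given its shape and element width."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

def peak_activation_memory(layer_shapes, dtype_bytes=1):
    """layer_shapes: activation shapes in order, input first, output last.
    Operator i reads sizes[i] and writes sizes[i + 1], so both tensors
    must fit in RAM at once; the peak over all operators is the bound."""
    sizes = [tensor_bytes(s, dtype_bytes) for s in layer_shapes]
    return max(sizes[i] + sizes[i + 1] for i in range(len(sizes) - 1))

# Hypothetical small CNN: 32x32x3 input, two feature maps, then logits.
shapes = [(32, 32, 3), (16, 16, 16), (8, 8, 32), (10,)]
print(peak_activation_memory(shapes), "bytes")  # dominated by the first conv
```

On a micro-controller with a fixed SRAM budget, this bound tells you immediately which operator forces the budget, and hence where to shrink feature maps or reorder computation.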
| Title | torchquant |
| Description | Quantization is a popular technique for accelerating and compressing neural networks by using low-bit arithmetic to represent weights and activations. It remains a hot area of research, with continued work on closing the accuracy gap between full- and low-precision models. We observe that researchers in this area tend to rely on custom implementations rather than the approaches built into popular machine learning libraries, as the latter are not sufficiently flexible to enable research. We are open sourcing TorchQuant, our MIT-licensed library that builds upon PyTorch by providing researchers with modular components and implementations that will accelerate their research and provide the community with consistent baselines. Using our library, we provide an example of how to quickly evaluate a research hypothesis: the "range-precision" trade-off for quantization-aware training. Our library can be found at https://github.com/camlsys/torchquant. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | A variety of researchers have used this tool in their research. |
| URL | https://github.com/camlsys/torchquant |
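The "range-precision" trade-off mentioned above can be demonstrated with a minimal sketch of uniform quantization. This is an illustration only, not the TorchQuant API: the function names and example weights are hypothetical. With a fixed bit-width, a wide clipping range covers outlier values but uses coarse quantization steps, while a narrow range gives fine steps but clips large values.

```python
# Sketch of uniform quantization, as simulated in quantization-aware
# training: values are clipped to [-qrange, qrange] and snapped to one
# of 2**bits evenly spaced levels.

def quantize(x, bits, qrange):
    """Map x to the nearest of 2**bits levels spanning [-qrange, qrange]."""
    levels = 2 ** bits - 1
    step = 2 * qrange / levels
    clipped = max(-qrange, min(qrange, x))
    code = round((clipped + qrange) / step)  # integer code in [0, levels]
    return code * step - qrange

def quant_error(xs, bits, qrange):
    """Mean absolute error introduced by quantizing each value in xs."""
    return sum(abs(x - quantize(x, bits, qrange)) for x in xs) / len(xs)

# Hypothetical weight values with one large outlier (2.5).
weights = [-1.8, -0.4, 0.05, 0.3, 0.9, 2.5]

# Range-precision trade-off at 4 bits: a narrow range clips the
# outlier, a wide range coarsens the step for the small values.
for qrange in (1.0, 2.5):
    print(f"range +/-{qrange}: error = {quant_error(weights, 4, qrange):.3f}")
```

Sweeping `bits` and `qrange` over a real weight tensor in this way is the kind of quick hypothesis test the library description refers to.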
| Description | 2nd International Workshop on Embedded and Mobile Deep Learning |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | In 2018 we ran an international workshop that discussed our research aims (and existing solutions) for the MOA project with a variety of international researchers and industry practitioners. At this workshop we also invited attendees to present their results, to share information between parties and build a critical mass of research activity within MOA-related topics. Finally, we invited key international speakers to present a series of three keynotes. This event will be repeated in 2019. It has been instrumental in building activity in this exciting new area. |
| Year(s) Of Engagement Activity | 2018,2019 |
| URL | https://www.sigmobile.org/mobisys/2018/workshops/deepmobile18/index.html |
| Description | ACM HotMobile 2020 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Schools |
| Results and Impact | Presented on-going work related to the MOA project. |
| Year(s) Of Engagement Activity | 2020 |
| URL | http://www.hotmobile.org/2020/ |
| Description | ACM HotMobile 2021 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | This is a workshop on mobile computing. I attended, and during informal discussions spoke about work conducted under the MOA fellowship. |
| Year(s) Of Engagement Activity | 2021 |
| URL | http://www.hotmobile.org/2021/ |
| Description | ACM MobiCom 2020 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | I gave a keynote talk at MobiCom 2020. This is an international conference and my talk covered aspects of MOA funded research (along with other activities) |
| Year(s) Of Engagement Activity | 2020 |
| URL | https://sigmobile.org/mobicom/2020/ |
| Description | Discussions with Google Brain at Google |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | My group talks to key people at Google Brain responsible for enabling TensorFlow-trained deep models to function on embedded and mobile devices. We co-ordinate with them to understand upcoming methods being integrated into the software and, most importantly, the open problems they are actively working on. We inform them of our research results, and this has altered their perception of which techniques to adopt and integrate into TensorFlow. The key contact is Pete Warden, together with members of his team. |
| Year(s) Of Engagement Activity | 2018,2019 |
| Description | Discussions with ML Research at ARM |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | Discussion of the latest trends and results with the ML research team at ARM (specifically Paul Whatmough, based in Boston, and his team). These discussions occur roughly quarterly and include in-person visits and online conference calls. |
| Year(s) Of Engagement Activity | 2018,2019 |
| Description | Huawei Collaboration Workshop |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | Gave a talk about MOA-funded research at a workshop run by Huawei for company employees, with a wide variety of other academics also in attendance. |
| Year(s) Of Engagement Activity | 2022 |
| Description | MLSys 2020 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Schools |
| Results and Impact | We presented our latest MOA results to the MLSys audience. |
| Year(s) Of Engagement Activity | 2020 |
| URL | https://mlsys.org/ |
| Description | Ubicomp 2021 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Attended top-tier academic conference. Presented and discussed work performed under the MOA grant. |
| Year(s) Of Engagement Activity | 2021 |
