Multi-Tasking and Continual Learning for Audio Sensing Tasks on Resource-Constrained Platforms

Lead Research Organisation: University of Southampton
Department Name: School of Electronics and Computer Science

Abstract

Deep learning (a form of machine learning) has been highly successful in solving many complex tasks in the domains of computer vision, natural language processing, and speech processing, owing to its ability to learn effective representations from raw data. This success is driving demand to deploy deep learning models on more resource-constrained computing platforms such as microcontroller units (MCUs), because these models are highly accurate and running them locally on the device enhances the privacy of user data. The proliferation of cheap sensors and Internet of Things (IoT) devices will further fuel this demand, and recent trends in the ever-growing field of TinyML (running machine/deep learning on tiny devices) are an indication thereof.
Recently, significant gains have been made towards deploying deep learning models efficiently on resource-constrained devices. However, the focus is still limited to solving single tasks efficiently. Moreover, the models are static: they cannot learn over time. We think it is time to go beyond static "learn once and deploy" deep learning models. The ability to multi-task and to learn continuously is required to adapt to unseen changes, learn new information, and handle multiple disparate applications. However, accommodating these abilities on resource-constrained devices is extremely challenging because of their limited memory and compute power.

To this end, this project aims to develop a range of techniques that let deep learning models learn on the fly and solve multiple tasks efficiently (low latency, low power) on resource-constrained devices. Overall, the project goals are to: (a) design an optimal memory management scheme to keep multiple deep learning models in the available memory of the device, (b) devise novel scheduling strategies that efficiently distribute an application's workload across the device's available processing cores and execute tasks in parallel, and (c) develop a method that combines continual learning with the few-shot learning paradigm, allowing models to learn continuously on device from only a few annotated examples (as illustrated below). The developed techniques will be tested on an embedded audio platform across a variety of audio sensing tasks such as keyword spotting, audio scene analysis (localization, scene classification, sound classification), and speech enhancement. The focus on audio is due to its rising prominence in many core applications, such as home hubs like Alexa, ecological monitoring, disease diagnostics, preventive maintenance, hearables, and accessibility devices such as hearing aids.
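
To make goal (c) concrete, the Python sketch below shows one possible shape such a method could take: a prototype-based classifier in the style of prototypical networks, where a frozen feature extractor produces embeddings and new classes are learnt on device from a handful of labelled examples by updating running-mean prototypes. This is a minimal illustration under assumed design choices, not the project's actual method; the class PrototypeClassifier, its methods, and the 64-dimensional embeddings are all hypothetical.

import numpy as np

class PrototypeClassifier:
    """Hypothetical few-shot continual learner: one prototype (mean
    embedding) per class, classification by nearest prototype. New
    classes are added from a few labelled examples without retraining
    the (frozen) feature-extraction backbone."""

    def __init__(self, embedding_dim):
        self.embedding_dim = embedding_dim
        self.prototypes = {}  # class label -> running-mean embedding
        self.counts = {}      # class label -> number of examples seen

    def learn(self, label, embedding):
        # Running-mean update keeps memory fixed at one vector per
        # class, which suits a memory-constrained device.
        embedding = np.asarray(embedding, dtype=np.float32)
        assert embedding.shape == (self.embedding_dim,)
        if label not in self.prototypes:
            self.prototypes[label] = embedding.copy()
            self.counts[label] = 1
        else:
            self.counts[label] += 1
            self.prototypes[label] += (
                (embedding - self.prototypes[label]) / self.counts[label]
            )

    def predict(self, embedding):
        # Nearest-prototype classification by Euclidean distance.
        labels = list(self.prototypes)
        dists = [np.linalg.norm(embedding - self.prototypes[l]) for l in labels]
        return labels[int(np.argmin(dists))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clf = PrototypeClassifier(embedding_dim=64)
    # Stand-in for embeddings from a frozen on-device audio backbone:
    # five labelled examples each for two keywords, "yes" and "no".
    for label, centre in [("yes", 1.0), ("no", -1.0)]:
        for _ in range(5):
            clf.learn(label, centre + 0.1 * rng.standard_normal(64))
    print(clf.predict(1.0 + 0.1 * rng.standard_normal(64)))  # -> "yes"

Storing a single running-mean vector per class keeps memory cost constant and avoids backpropagation entirely, which is why prototype-style methods are one plausible fit for the MCU setting described above.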

The resulting innovations from this work will have numerous benefits. First, efficient deep learning solutions that run on-device would save power, leading to a lower digital carbon footprint and a more sustainable society, and contributing to the UK's mission of net zero by 2050. Local execution also enhances user privacy, as data never leaves the device; this is a key benefit in places where network connectivity is absent or expensive, such as low- and middle-income countries (LMICs). Second, such solutions mean users can enjoy the advantages of deep learning (often high accuracy and high performance) in many useful, ubiquitous computing applications in their day-to-day lives. Finally, by enabling continual learning on the device, we will move one step closer to machines that reflect true human intelligence.
