Unifying audio signal processing and machine learning: a fundamental framework for machine hearing

Lead Research Organisation: University of Cambridge

Department Name: Engineering

Abstract

Modern technology is leading to a flood of audio data. For example, over seventy-two hours of unstructured and unlabelled sound-tracks are uploaded to internet sites every minute. Automatic systems are urgently needed for recognising audio content so that these sound-tracks can be tagged for categorisation and search. Moreover, an increasing proportion of recordings are made on hand-held devices in challenging environments that contain multiple sound sources and noise. Such uncurated and noisy data necessitate automatic systems for cleaning the audio content and separating sources from mixtures. On a related note, devices for the hearing impaired currently perform poorly in noise. In fact, this is a major reason why six million people in the UK who would benefit from a hearing aid do not use one (a market worth £18 billion p.a.). Patients fitted with cochlear implants suffer from similar limitations, and as the population ages more people are affected.

It is clear that audio recognition and enhancement methods are required to stop us drowning in audio data, for processing in hearing devices, and to support new technological innovations. Current approaches to these problems use a combination of audio signal processing (which places the audio data into a convenient format and reduces the data-rate) and machine learning (which removes noise, separates sources, or classifies the content). It is widely believed that these two fields must become increasingly integrated in the future. However, this union is currently a troubled one, suffering from four problems.

Inefficiency: The methods are too inefficient for vast amounts of data (as is the case for audio-tracks on the web) or for real-time applications (such as hearing aids).
Impoverished models: The machine learning modules tend to be statistically limited.
Unadapted: The signal processing modules are unadapted, despite evidence from other fields, like computer vision, which suggests that automatic tuning leads to significant performance gains.
Distorted mixtures: The signal processing modules introduce non-linear distortions which are not captured by the machine learning modules.

In this project we address these four limitations by introducing a new theoretical framework which unifies signal processing and machine learning. The key step is to view the signal processing module as solving an inference problem. Since the machine-learning modules are often framed in this way, the two modules can be combined into a single coherent approach in which technologies from the two fields are fully integrated. In the project we will then use the new approach to develop efficient, rich, adaptive, and distortion-free approaches to audio denoising, source separation and recognition. We will evaluate the noise reduction and source separation algorithms on the hearing impaired, and the audio recognition algorithms on audio sound-track data.
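As a rough illustration of what it can mean to treat a signal processing step as inference, the sketch below denoises a signal by computing the posterior mean under a simple Gaussian model. This is a textbook example (the classical Wiener filter written as Gaussian process regression), not the method developed in the project. The kernel choice, the function names `se_kernel` and `denoise_posterior_mean`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def se_kernel(t, lengthscale=10.0, variance=1.0):
    """Squared-exponential covariance between all pairs of time indices.

    The kernel and its parameters are illustrative assumptions.
    """
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def denoise_posterior_mean(y, K, noise_var):
    """Posterior mean of x given y = x + noise.

    Assumed model: x ~ N(0, K), noise ~ N(0, noise_var * I).
    The result K (K + noise_var * I)^{-1} y is a linear filter applied to y.
    """
    n = len(y)
    # Solve the linear system rather than forming an explicit inverse.
    a = np.linalg.solve(K + noise_var * np.eye(n), y)
    return K @ a

# Synthetic example: a slow sinusoid corrupted by white Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
K = se_kernel(t)
noise_var = 0.25
clean = np.sin(2 * np.pi * t / 50.0)
y = clean + rng.normal(scale=np.sqrt(noise_var), size=t.shape)

x_hat = denoise_posterior_mean(y, K, noise_var)
```

Because the filter here falls out of a probabilistic model, its parameters (such as `lengthscale` and `noise_var`) could in principle be learned from data in the same way as any other model parameter, which is the kind of adaptation the text argues for.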

We believe this new framework will form a foundation of the emerging field of machine hearing. In the future, machine hearing will be deployed in a vast range of applications, from music processing tasks to augmented reality systems (in conjunction with technologies from computer vision). We believe that this project will kick-start this proliferation.

Planned Impact

The signal processing and machine learning methods developed in this project are keystone technologies upon which upstream research depends. In particular, the project will have a significant impact for the health-care and digital industries. The following specific groups will benefit from the research:

The hearing impaired: hearing aid users

Six million people in the UK who would benefit from a hearing aid do not use one (a market worth £18 billion p.a.). This group of people is expanding rapidly as the population ages (the number of people aged 65 or older is expected to double by 2050). One of the main reasons why hearing aids are not as widely used as they should be is that they perform poorly in noisy environments. The efficient and adaptive noise removal systems developed for hearing aids in this project will address this key issue. This proposal will therefore contribute to the EPSRC Healthcare Technologies theme. Prof. Moore will be involved in translating the research into hearing aids, including providing access to hearing impaired patients for testing.

The hearing impaired: cochlear implantees

Cochlear implants allow the profoundly deaf, who get little or no benefit from a normal hearing aid, to gain awareness of environmental sounds and, in most cases, understand speech without lip-reading. 8000 people in the UK currently use cochlear implants and there are 1000 new implantees each year. Again, perhaps the major limitation of the current devices is their poor performance in noisy environments. The efficient and adaptive noise removal systems developed in this project aim to address this key issue, thereby contributing to the EPSRC Healthcare Technologies theme. Dr. Carlyon will be involved in translating the research into cochlear implants, including providing access to implanted patients for testing.

Audio search and information retrieval: digital-industries and society

Many companies preside over large, uncurated collections of audio data and would benefit from methods for searching and categorising the data. For example, over seventy-two hours of unstructured and unlabelled sound-tracks are uploaded to internet sites every minute and this number continues to grow. Automatic systems are urgently needed for recognising audio content so that these sound-tracks can be tagged (possibly at precise times throughout the clips) for categorisation and search. Often the audio-tracks are recorded with video and so the audio tags can also be used to search the video. The audio recognition technology developed in this project therefore has wide commercial application to information retrieval. Similarly, the BBC's public space project is attempting to organise and unlock the archives of the BBC, making them accessible to the public. The audio-recognition technologies will make significant contributions to this project, thereby improving public services and enhancing life.

Audio denoising and source separation: digital-industries and society

Poor quality audio data is becoming commonplace. For example, 3 hours of recordings made on hand-held devices are uploaded to YouTube every minute. These recordings are often made in challenging environments that contain multiple sound sources and noise. Such noisy data necessitate automatic systems for cleaning the audio content and separating sources from mixtures. This project will provide tools for this purpose. Moreover, upstream technologies such as Automatic Speech Recognition (ASR) and Audio Diarisation (AD) systems perform poorly in noisy environments. As such, the noise removal methods developed in the project can be coupled with these approaches to improve performance. Since modern approaches to ASR and AD are probabilistic, this raises the possibility of integrated approaches that estimate the noise jointly with interpreting the content. The advisory group, and in particular Prof. Gales, will advise on possible technology transfer.

Funded Value:

£97,101

Funded Period:

Nov 13 - Nov 15

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/L000776/1

Principal Investigator:

Richard Turner

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (30%)

Digital Signal Processing (30%)

Music & Acoustic Technology (20%)

Vision & Senses - ICT appl. (20%)

Organisations

People	ORCID iD
Richard Turner (Principal Investigator)

Publications

Adel T. (2020) CONTINUAL LEARNING WITH ADAPTIVE WEIGHTS (CLAW) in 8th International Conference on Learning Representations, ICLR 2020

Alexander A (2016) On sparse variational methods and the Kullback-Leibler divergence between stochastic processes

Alexander G. De G. Matthews (2018) Gaussian Process Behaviour in Wide Deep Neural Networks

Alexandre K. W. Navarro (2017) The Multivariate Generalised von Mises Distribution: Inference and Applications

Anqi Wu (2019) Deterministic Variational Inference for Robust Bayesian Neural Networks

Arno Solin (2018) Infinite-Horizon Gaussian Processes

Ashman M (2020) Sparse Gaussian Process Variational Autoencoders

Bauer M (2017) Discriminative k-shot learning using probabilistic models

Bronskill J. (2020) Tasknorm: Rethinking batch normalization for meta-learning in 37th International Conference on Machine Learning, ICML 2020

Bui T (2014) Tree-structured Gaussian Process Approximations

Bui T (2017) Streaming Sparse Gaussian Process Approximations

Bui T.D. (2016) Deep Gaussian processes for regression using approximate expectation propagation in 33rd International Conference on Machine Learning, ICML 2016

Bui Thang D. (2017) A Unifying Framework for Gaussian Process Pseudo-Point Approximations using Power Expectation Propagation in JOURNAL OF MACHINE LEARNING RESEARCH

Bui, T.D. (2017) Streaming Sparse Gaussian Process Approximations

Choromanski K (2018) Structured Evolution with Compact Architectures for Scalable Policy Optimization

De G. Matthews A.G. (2016) On sparse variational methods and the Kullback-Leibler divergence between stochastic processes in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016

Foong A.Y.K. (2020) Meta-learning stationary stochastic process prediction with convolutional neural processes in Advances in Neural Information Processing Systems

Foong A.Y.K. (2020) On the expressiveness of approximate inference in Bayesian neural networks in Advances in Neural Information Processing Systems

Gal Y (2015) Improving the Gaussian Process Sparse Spectrum Approximation by Representing Uncertainty in Frequency Inputs

Gomersall PA (2016) Perception of stochastic envelopes by normal-hearing and cochlear-implant listeners. in Hearing research

Gordon J (2018) Meta-Learning Probabilistic Inference For Prediction

Gordon J. (2020) CONVOLUTIONAL CONDITIONAL NEURAL PROCESSES in 8th International Conference on Learning Representations, ICLR 2020

Gordon J. (2019) Meta-learning probabilistic inference for prediction in 7th International Conference on Learning Representations, ICLR 2019

Gu S (2015) Neural adaptive sequential Monte Carlo

Key Findings
Impact Summary
Further Funding
Collaboration
Engagement Activities


Description	We have made the following contributions: 1. We have shown a theoretical connection between probabilistic machine learning approaches and classical signal processing methods, which brings these fields closer together and makes it simpler and more efficient to combine methods from both. This has led to better methods for removing noise from audio and filling in missing data. This contribution has been published in IEEE Transactions on Signal Processing. 2. We have scaled up a fundamental machine learning tool -- called a Gaussian process -- so that it can handle large scale datasets containing millions of datapoints. Previously these methods were limited to handling ten thousand datapoints. These contributions have been published in the Neural Information Processing Systems Conference, the Journal of Machine Learning Research, and the International Conference on Machine Learning. 3. We have developed new generally applicable machine learning tools.
Exploitation Route	We are pursuing the following opportunities: -- applying the technology to improving audio uploaded to the internet and automatically recognising content (together with Google) -- applying methods to improving hearing devices (including cochlear implants and hearing aids)
Sectors	Digital/Communication/Information Technologies (including Software), Healthcare
URL	http://cbl.eng.cam.ac.uk/Public/Turner/ResearchAreas


Description	The grant led to technology being developed that formed the basis for a successful EPSRC Research Grant application in two areas: 1. efficient methods for removing noise from audio and recognising content for tagging video sound tracks (in collaboration with Prof. Brian Moore) 2. improving hearing devices for the hearing impaired (together with Dr. Robert Carlyon, Prof. Brian Moore and Dr. David Baguley). In addition, it led to a strong collaboration with Google which has resulted in follow-up funding (worth roughly £120k). The fundamental machine learning methods have led to strong engagement with Microsoft Research and Toyota.
Sector	Digital/Communication/Information Technologies (including Software), Healthcare
Impact Types	Societal


Description	Amazon research award
Amount	$80,000 (USD)
Organisation	Amazon.com
Sector	Private
Country	United States
Start	03/2019


Description	Baroness de Turckheim Fund Award
Amount	£22,233 (GBP)
Organisation	University of Cambridge
Department	Trinity College Cambridge
Sector	Academic/University
Country	United Kingdom
Start	01/2017
End	01/2019


Description	Google European Doctoral Programme
Amount	£40,000 (GBP)
Organisation	Google
Sector	Private
Country	United States
Start	08/2015
End	09/2017


Description	Google Focussed Research Award
Amount	£101,455 (GBP)
Organisation	Google
Sector	Private
Country	United States
Start	04/2017
End	09/2021


Description	Machine Learning for Tomorrow: Efficient, Flexible, Robust and Automated
Amount	£3,100,000 (GBP)
Funding ID	EP/T005637/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2020
End	01/2025


Description	Microsoft Research Azure Computer Credits Award
Amount	$50,000 (USD)
Organisation	Microsoft Research
Sector	Private
Country	Global
Start	02/2018
End	02/2019


Description	Collaboration with Prof. Brian Moore
Organisation	University of Cambridge
Department	Department of Psychology
Country	United Kingdom
Sector	Academic/University
PI Contribution	Based on the research in grant EP/L000776/1, Prof. Moore approached us to apply for an EPSRC research grant which we were subsequently awarded (grant number EP/G050821/1).
Collaborator Contribution	Together we devised the new research programme (based around intelligent hearing devices and intelligent hearing tests). Prof. Brian Moore brought expertise in hearing and hearing devices and industrial collaborators (Phonak). I provided expertise in audio time series modelling and machine learning.
Impact	Awarded EPSRC research grant (grant number EP/G050821/1) Multi-disciplinary: machine learning, signal processing, time-series modelling, hearing, hearing aids, deafness research
Start Year	2015


Description	Microsoft Research
Organisation	Microsoft Research
Country	Global
Sector	Private
PI Contribution	Joint supervision of PhD students and Postdoctoral Research Associates; Joint projects with applications within Microsoft and beyond.
Collaborator Contribution	Joint supervision of PhD students and Postdoctoral Research Associates; Joint projects with applications within Microsoft and beyond.
Impact	The partnership has existed for 5 months. In that time we have had three joint papers written and accepted together. In the longer term we expect the project to have economic and societal impact in the fields of health, gaming, and intelligent software systems. We have applied for a Prosperity Partnership grant together in order to expand and strengthen the existing partnership.
Start Year	2018


Description	Appeared on BBC Radio 5 Live
Form Of Engagement Activity	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	Appeared on the BBC Radio 5 Live programme The Naked Scientists, talking about hearing and my research. The show was conducted in front of a live audience of school children.
Year(s) Of Engagement Activity	2014
URL	http://www.thenakedscientists.com/HTML/interviews/interview/1001043/
