Unifying audio signal processing and machine learning: a fundamental framework for machine hearing

Lead Research Organisation: University of Cambridge
Department Name: Engineering

Abstract

Modern technology is leading to a flood of audio data. For example, over seventy-two hours of unstructured and unlabelled sound-tracks are uploaded to internet sites every minute. Automatic systems are urgently needed for recognising audio content so that these sound-tracks can be tagged for categorisation and search. Moreover, an increasing proportion of recordings are made on hand-held devices in challenging environments that contain multiple sound sources and noise. Such uncurated and noisy data necessitate automatic systems for cleaning the audio content and separating sources from mixtures. On a related note, devices for the hearing impaired currently perform poorly in noise. In fact, this is a major reason why the six million people in the UK who would benefit from a hearing aid do not use one (a market worth £18 billion p.a.). Patients fitted with cochlear implants suffer from similar limitations, and, as the population ages, more people are affected.

It is clear that audio recognition and enhancement methods are required to stop us drowning in audio data, for processing in hearing devices, and to support new technological innovations. Current approaches to these problems use a combination of audio signal processing (which places the audio data into a convenient format and reduces the data-rate) and machine learning (which removes noise, separates sources, or classifies the content). It is widely believed that these two fields must become increasingly integrated in the future. However, this union is currently a troubled one, suffering from four problems.

Inefficiency: The methods are too inefficient when we have vast amounts of data (as is the case for audio-tracks on the web) or for real-time applications (such as those required in hearing aids).
Impoverished models: The machine learning modules tend to be statistically limited.
Unadapted: The signal processing modules are not adapted to the data, despite evidence from other fields, such as computer vision, that automatic tuning leads to significant performance gains.
Distorted mixtures: The signal processing modules introduce non-linear distortions that are not captured by the machine learning modules.

In this project we address these four limitations by introducing a new theoretical framework which unifies signal processing and machine learning. The key step is to view the signal processing module as solving an inference problem. Since the machine learning modules are often framed in this way, the two modules can be combined into a single coherent approach, allowing technologies from the two fields to be fully integrated. We will then use the new framework to develop efficient, rich, adaptive, and distortion-free approaches to audio denoising, source separation, and recognition. We will evaluate the noise reduction and source separation algorithms with hearing-impaired listeners, and the audio recognition algorithms on audio sound-track data.
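
To make the idea of signal processing as inference concrete, the following minimal sketch (our own illustration on toy synthetic data, not an algorithm developed in this project) denoises a signal by Bayesian inference in a Gaussian process model. The kernel, lengthscale, and noise level are hypothetical choices; the point is that the posterior mean is a linear smoothing filter, so a classical signal processing operation emerges as the solution of an inference problem.

```python
# Minimal sketch: denoising as Bayesian inference under a Gaussian process prior.
# The posterior mean E[f | y] = K (K + sigma^2 I)^{-1} y is a linear smoothing
# filter whose shape is derived from the probabilistic model, not hand-tuned.
import numpy as np

def rbf_kernel(t1, t2, lengthscale=0.05, variance=1.0):
    """Squared-exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Synthetic noisy observations of a slowly varying signal.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * 3 * t)
noise_var = 0.3 ** 2
y = clean + rng.normal(scale=np.sqrt(noise_var), size=t.shape)

# Posterior mean under the GP prior.
K = rbf_kernel(t, t)
W = K @ np.linalg.solve(K + noise_var * np.eye(len(t)), np.eye(len(t)))
denoised = W @ y  # each row of W acts as a data-adapted smoothing filter

print("noisy RMSE:   ", np.sqrt(np.mean((y - clean) ** 2)))
print("denoised RMSE:", np.sqrt(np.mean((denoised - clean) ** 2)))
```

In the project, the same principle is applied to far richer probabilistic models of audio, so that denoising, source separation, and recognition can all be phrased within one inferential framework.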

We believe this new framework will form a foundation for the emerging field of machine hearing. In the future, machine hearing will be deployed in a vast range of applications, from music processing tasks to augmented reality systems (in conjunction with technologies from computer vision). We believe that this project will kick-start this proliferation.

Planned Impact

The signal processing and machine learning methods developed in this project are keystone technologies upon which upstream research depends. In particular, the project will have a significant impact on the healthcare and digital industries. The following specific groups will benefit from the research:

The hearing impaired: hearing aid users

Six million people in the UK who would benefit from a hearing aid do not use one (a market worth £18 billion p.a.). This group is expanding rapidly as the population ages (the number of people aged 65 or older is expected to double by 2050). One of the main reasons why hearing aids are not as widely used as they should be is that they perform poorly in noisy environments. The efficient and adaptive noise removal systems developed for hearing aids in this project will address this key issue. This proposal will therefore contribute to the EPSRC Healthcare Technologies theme. Prof. Moore will be involved in translating the research into hearing aids, including providing access to hearing-impaired patients for testing.

The hearing impaired: cochlear implantees

Cochlear implants allow the profoundly deaf, who get little or no benefit from a normal hearing aid, to gain awareness of environmental sounds and, in most cases, understand speech without lip-reading. 8000 people in the UK currently use cochlear implants and there are 1000 new implantees each year. Again, perhaps the major limitation of the current devices is their poor performance in noisy environments. The efficient and adaptive noise removal systems developed in this project aim to address this key issue thereby contributing to the EPSRC Healthcare Technologies theme. Dr. Carlyon will be involved in translating the research into cochlear implants, including providing access to implanted patients for testing.

Audio search and information retrieval: digital-industries and society

Many companies preside over large, uncurated collections of audio data and would benefit from methods for searching and categorising them. For example, over seventy-two hours of unstructured and unlabelled sound-tracks are uploaded to internet sites every minute, and this number continues to grow. Automatic systems are urgently needed for recognising audio content so that these sound-tracks can be tagged (possibly at precise times throughout the clips) for categorisation and search. Often the audio-tracks are recorded with video, so the audio tags can also be used to search the video. The audio recognition technology developed in this project therefore has wide commercial application to information retrieval. Similarly, the BBC's public space project is attempting to organise and unlock the archives of the BBC, making them accessible to the public. The audio recognition technologies will make significant contributions to this project, thereby improving public services and enhancing quality of life.

Audio denoising and source separation: digital-industries and society

Poor-quality audio data is becoming commonplace. For example, three hours of recordings made on hand-held devices are uploaded to YouTube every minute. These recordings are often made in challenging environments that contain multiple sound sources and noise. Such noisy data necessitate automatic systems for cleaning the audio content and separating sources from mixtures. This project will provide tools for this purpose. Moreover, upstream technologies such as Automatic Speech Recognition (ASR) and Audio Diarisation (AD) systems perform poorly in noisy environments. As such, the noise removal methods developed in the project can be coupled with these approaches to improve performance. Since modern approaches to ASR and AD are probabilistic, this raises the possibility of integrated approaches that jointly estimate the noise while interpreting the content. The advisory group, and in particular Prof. Gales, will advise on possible technology transfer.

Publications

 
Description We have made the following contributions:
1. We have shown a theoretical connection between probabilistic machine learning approaches and classical signal processing methods, which brings these fields closer together and makes it simpler and more efficient to combine methods from both. This has led to better methods for removing noise from audio and filling in missing data. This contribution has been published in IEEE Transactions on Signal Processing.

2. We have scaled up a fundamental machine learning tool, the Gaussian process, so that it can handle large-scale datasets containing millions of datapoints; previously these methods were limited to handling ten thousand datapoints (a minimal sketch of the underlying idea is given after this list). These contributions have been published at the Neural Information Processing Systems conference, in the Journal of Machine Learning Research, and at the International Conference on Machine Learning.

3. We have developed new generally applicable machine learning tools.
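
As a concrete illustration of the scaling described in point 2, the sketch below uses the standard inducing-point idea behind sparse Gaussian process approximations. It is our own minimal illustration (plain NumPy, a toy one-dimensional regression problem with hypothetical kernel and noise settings), not the exact algorithms published by the project, but it shows why the cost drops from O(N^3) to O(N M^2) for M inducing points, which is what allows N to grow to millions.

```python
# Minimal sketch of sparse GP regression with M inducing points.
# Only M x N and M x M kernel matrices are ever formed, so the cost is
# O(N M^2) rather than the O(N^3) of exact GP regression.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.1, variance=1.0):
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(1)
N, M = 100_000, 50                       # many data points, few inducing points
x = rng.uniform(0.0, 1.0, size=N)
noise_var = 0.1 ** 2
y = np.sin(2 * np.pi * x) + rng.normal(scale=np.sqrt(noise_var), size=N)

z = np.linspace(0.0, 1.0, M)             # inducing input locations
x_test = np.linspace(0.0, 1.0, 200)

Kuu = rbf_kernel(z, z) + 1e-8 * np.eye(M)
Kuf = rbf_kernel(z, x)                   # M x N: the only large matrix needed
Ksu = rbf_kernel(x_test, z)

# Predictive mean of the sparse approximation:
#   E[f*] = Ksu (Kuu + Kuf Kuf^T / s^2)^{-1} Kuf y / s^2
A = Kuu + (Kuf @ Kuf.T) / noise_var
mean = Ksu @ np.linalg.solve(A, Kuf @ y) / noise_var

print("max abs error vs. true function:",
      np.max(np.abs(mean - np.sin(2 * np.pi * x_test))))
```
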
Exploitation Route We are pursuing the following opportunities:
-- applying the technology to improving audio uploaded to the internet and automatically recognising content (together with Google)
-- applying methods to improving hearing devices (including cochlear implants and hearing aids)
Sectors Digital/Communication/Information Technologies (including Software), Healthcare

URL http://cbl.eng.cam.ac.uk/Public/Turner/ResearchAreas
 
Description The grant led to the development of technology that formed the basis for a successful EPSRC Research Grant application:
1. efficient methods for removing noise from audio and recognising content for tagging video sound-tracks (in collaboration with Prof. Brian Moore)
2. improving hearing devices for the hearing impaired (together with Dr. Robert Carlyon, Prof. Brian Moore and Dr. David Baguley)
In addition, the grant led to a strong collaboration with Google, which has resulted in follow-up funding (worth roughly £120k). The fundamental machine learning methods have led to strong engagement with Microsoft Research and Toyota.
Sector Digital/Communication/Information Technologies (including Software), Healthcare
Impact Types Societal

 
Description Amazon research award
Amount $80,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 03/2019 
 
Description Baroness de Turckhiem Fund Award
Amount £22,233 (GBP)
Organisation University of Cambridge 
Department Trinity College Cambridge
Sector Academic/University
Country United Kingdom
Start 01/2017 
End 01/2019
 
Description Google European Doctoral Programme
Amount £40,000 (GBP)
Organisation Google 
Sector Private
Country United States
Start 09/2015 
End 09/2017
 
Description Google Focussed Research Award
Amount £101,455 (GBP)
Organisation Google 
Sector Private
Country United States
Start 05/2017 
End 09/2021
 
Description Machine Learning for Tomorrow: Efficient, Flexible, Robust and Automated
Amount £3,100,000 (GBP)
Funding ID EP/T005637/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2020 
End 01/2025
 
Description Microsoft Research Azure Compute Credits Award
Amount $50,000 (USD)
Organisation Microsoft Research 
Sector Private
Country Global
Start 02/2018 
End 02/2019
 
Description Collaboration with Prof. Brian Moore 
Organisation University of Cambridge
Department Department of Psychology
Country United Kingdom 
Sector Academic/University 
PI Contribution Based on the research in grant EP/L000776/1, Prof. Moore approached us to apply for an EPSRC research grant, which we were subsequently awarded (grant number EP/G050821/1).
Collaborator Contribution Together we devised the new research programme (based around intelligent hearing devices and intelligent hearing tests). Prof. Brian Moore brought expertise in hearing and hearing devices and industrial collaborators (Phonak). I provided expertise in audio time series modelling and machine learning.
Impact Awarded EPSRC research grant (grant number EP/G050821/1). Multi-disciplinary: machine learning, signal processing, time-series modelling, hearing, hearing aids, deafness research.
Start Year 2015
 
Description Microsoft Research 
Organisation Microsoft Research
Country Global 
Sector Private 
PI Contribution Joint supervision of PhD students and Postdoctoral Research Associates; Joint projects with applications within Microsoft and beyond.
Collaborator Contribution Joint supervision of PhD students and Postdoctoral Research Associates; Joint projects with applications within Microsoft and beyond.
Impact The partnership has existed for five months. In that time, three joint papers have been written and accepted. In the longer term we expect the project to have economic and societal impact in the fields of health, gaming, and intelligent software systems. We have applied for a Prosperity Partnership grant together in order to expand and strengthen the existing partnership.
Start Year 2018
 
Description Appeared on BBC Radio 5 Live 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Appeared on the BBC Radio 5 Live programme The Naked Scientists, talking about hearing and my research. The show took place in front of a live audience of schoolchildren.
Year(s) Of Engagement Activity 2014
URL http://www.thenakedscientists.com/HTML/interviews/interview/1001043/