Probabilistic Auditory Scene Analysis

Lead Research Organisation: UNIVERSITY OF CAMBRIDGE

Department Name: Engineering

Abstract

Auditory environments are typically very complicated. For example, a cocktail party comprises many sources: the chinking of glasses, the chattering of the many guests, the sound of background music. Nevertheless, our auditory system can make sense of such a scene; it can work out how many acoustic sources there are and determine the individual contribution of each to the scene. Remarkably, it can do this using the information from a single microphone. A major goal of auditory neuroscience is to understand how the auditory system achieves this feat.

Broadly speaking, auditory scene analysis is thought to have three stages. The first stage, which is well understood physiologically, converts the incoming sound into a time-frequency representation. This reveals the local energy in a frequency band at a particular time. In the second stage, psychophysical evidence suggests that primitive grouping principles are used to group local regions of spectral-temporal energy arising from a common source. Using simple stimuli such as tones and noise, researchers have elucidated a long list of primitive grouping principles. For example, the principle of good continuation attributes smoothly varying features to a single source and treats abrupt changes as a signature of separate sources. In the final stage of auditory scene analysis, called schema-based grouping, higher-level knowledge, like the structure of music or speech, is used to bind the groups of spectral-temporal energy into streams, so that there is one stream for each source.

There are many outstanding questions with this framework. One important open question concerns the role that auditory cortex plays in auditory scene analysis, which is not well established. Another concerns the generality and completeness of the established list of primitive grouping rules: although the principles successfully characterise the perception of simple sounds, it is unclear how successful and relevant the description is for natural sounds.
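The first stage, converting sound into local energy per frequency band per moment in time, can be illustrated with a short-time Fourier transform (STFT) power spectrogram. This is a minimal sketch: the Hann window, 256-sample frame and 128-sample hop are illustrative assumptions, not parameters from the project. The cochlea behaves more like a bank of overlapping band-pass filters than an STFT, so treat this only as a stand-in for that representation.

```python
import numpy as np


def power_spectrogram(x, frame_len=256, hop=128):
    """Return |STFT|^2 with shape (frame_len // 2 + 1, n_frames).

    Each column holds the local energy in every frequency band for one frame.
    frame_len and hop are illustrative defaults, not values from the project.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames (one frame per row).
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # The real FFT of each frame gives its frequency content; squaring the
    # magnitude gives energy per frequency bin. Transpose to (freq, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2


# Usage: a 1 kHz tone sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
S = power_spectrogram(tone)
# Bin k corresponds to k * fs / frame_len Hz, so 1 kHz falls in bin 32.
peak_bins = S.argmax(axis=0)
```

Every frame of the pure tone puts its energy in the same band, so the representation shows a single horizontal ridge at bin 32.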
This project aims to resolve these open questions through modelling work, psychophysics experiments and neural recording experiments. The new idea is to view the primitive grouping principles as arising from inference in a latent variable model of auditory scenes. A latent variable model is a description of how an auditory scene, like that encountered at a cocktail party, is composed of latent auditory sources, like the chinking glasses and chattering guests. It also includes a description of the statistics of these sources, like the fact that the chinking of glasses tends to produce isolated, high-frequency events, whilst the chattering is rather more constant and lower in frequency. The idea is that the brain is trying to infer these latent sources using prior knowledge of their statistics. New tools of probabilistic inference can make these intuitions concrete.

This new perspective, called probabilistic scene analysis, has two main advantages: one practical and one theoretical. The practical advantage is that a statistical characterisation of sounds can be used to produce stimuli with complicated, but controlled, structure for use in experiments. The theoretical benefit is that the list of primitive grouping rules, and the manner in which they trade off, are now derived from the statistics of sounds; a heuristic implementation is no longer required. This enables us to predict the results of the experiments. In particular, the psychophysics experiments aim to resolve both how auditory grouping operates in synthetic auditory textures (e.g. rain, wind, water) and whether this is consistent with the probabilistic account. Furthermore, the neural recording experiments will investigate the role of auditory cortex in auditory scene analysis, and the hypothesis that it represents high-level statistics of sounds, such as slowly varying modulatory components.
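To make the latent variable idea concrete, here is a toy sketch (not the project's model) in which a scene is the sum of two hypothetical sources plus observation noise, with every time-frequency coefficient treated as an independent zero-mean Gaussian. The "chatter" source has prior variance concentrated at low frequencies and constant in time; the "clink" source has prior variance only in rare frames and concentrated at high frequencies. Under these Gaussian assumptions inference is exact: the posterior mean of each source is its prior variance divided by the total variance, times the observation (a Wiener filter), and the posterior variance quantifies the remaining uncertainty. The variance profiles, event rate and noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_time = 64, 400
f = np.linspace(0.0, 1.0, n_freq)[:, None]  # normalised frequency, column vector

# Hypothetical prior variances for every time-frequency bin.
var_chatter = np.exp(-f / 0.2) * np.ones((1, n_time))   # low frequency, steady
is_event = rng.random(n_time) < 0.05                      # rare clink onsets
var_clink = 1e-6 + np.exp(-(1.0 - f) / 0.2) * is_event[None, :]  # high frequency, sparse
var_noise = 0.01

# Generative model: sample each latent source, then add them with noise.
chatter = np.sqrt(var_chatter) * rng.standard_normal((n_freq, n_time))
clink = np.sqrt(var_clink) * rng.standard_normal((n_freq, n_time))
scene = chatter + clink + np.sqrt(var_noise) * rng.standard_normal((n_freq, n_time))

# Inference: for jointly Gaussian variables the posterior of each source given
# the scene is Gaussian, with a Wiener-gain mean and a reduced variance.
var_total = var_chatter + var_clink + var_noise
chatter_mean = var_chatter / var_total * scene
clink_mean = var_clink / var_total * scene
chatter_var = var_chatter * (var_total - var_chatter) / var_total

# Compare with naively attributing the whole scene to the chatter source.
mse_posterior = np.mean((chatter_mean - chatter) ** 2)
mse_naive = np.mean((scene - chatter) ** 2)
```

Bins where the clink prior is large are assigned mostly to the clink estimate, and the rest of the scene is explained as chatter. In this toy the assignment of energy to sources follows from the prior statistics rather than from a hand-written grouping rule, which is the intuition the abstract describes.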

Funded Value:

£232,105

Funded Period:

Jan 10 - Jan 13

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/G050821/1

Principal Investigator:

Richard Turner

Research Subject:

Info. & commun. Technol. (50%)

Medical & health interface (50%)

Research Topic:

Biomedical neuroscience (50%)

Vision & Senses - ICT appl. (50%)

Organisations

People	ORCID iD
Richard Turner (Principal Investigator)

Publications

Bui T (2016) Deep Gaussian Processes for Regression using Approximate Expectation Propagation

Matthews A (2015) On Sparse variational methods and the Kullback-Leibler divergence between stochastic processes

Bui T (2015) Training Deep Gaussian Processes using Stochastic Expectation Propagation and Probabilistic Backpropagation

Hernández-Lobato D (2015) Stochastic Expectation Propagation for Large Scale Gaussian Process Classification

Li Y (2015) Stochastic Expectation Propagation

Alexander A (2016) On sparse variational methods and the Kullback-Leibler divergence between stochastic processes

Gu S (2015) Neural adaptive sequential Monte Carlo

Gu S (2015) Neural Adaptive Sequential Monte Carlo

Turner R (2012) Decomposing signals into a sum of amplitude and frequency modulated sinusoids using probabilistic inference

Key Findings
Impact Summary
Collaboration


Description	We have developed new approaches to a number of classic and widely used signal processing techniques using modern machine learning methods. The new approaches have two key advantages. First, they adapt to the input signal, meaning that they can devote their limited representational resources to where they are most useful. Second, they automatically handle uncertainty arising from noise and other sources. As a result, the new representations are of higher quality than those produced by the old approaches and are more robust to noise, both of which make them suitable for real-world applications. We have demonstrated their usefulness in a range of engineering, scientific and medical applications, including the removal of noise from speech and the analysis of brain recordings.
Exploitation Route	The engineering applications of these techniques include speech recognition, speaker identification, retrieval of audio for search systems, and pitch detection and manipulation. The medical applications of these methods include analysis of brain recordings from patients and processing for cochlear implants and hearing aids, and we are currently developing a method to relieve tinnitus.
Sectors	Digital/Communication/Information Technologies (including Software), Healthcare
URL	http://cbl.eng.cam.ac.uk/Public/Turner/WebHome
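The findings above concern new, probabilistic versions of classic signal processing techniques. One such classic technique is amplitude demodulation, which splits a signal into a slowly varying envelope (the modulator) and a fast carrier; the 2012 publication listed above concerns decomposing signals into amplitude- and frequency-modulated components using probabilistic inference. For comparison only, the sketch below implements the classical, non-probabilistic baseline: the Hilbert envelope, computed from an FFT-based analytic signal. It is not the project's method; it has no model of noise and reports no uncertainty. The test signal (a 500 Hz carrier modulated at 4 Hz, sampled at 8 kHz) is an illustrative assumption.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal of a real 1-D array (FFT method)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    # Analytic-signal multiplier: keep DC (and Nyquist for even n),
    # double positive frequencies, zero negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    return np.abs(analytic)


fs = 8000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)   # slow, positive envelope
carrier = np.sin(2 * np.pi * 500 * t)                # fast carrier
signal = modulator * carrier
envelope = hilbert_envelope(signal)
```

On this idealised, noise-free signal the Hilbert envelope recovers the modulator almost exactly. With additive noise or overlapping sources its estimate degrades, and it gives no indication of how reliable that estimate is.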


Description	The results from this grant have been used in the following ways: -- formed the basis for an EPSRC First Grant -- formed the basis for an industrial partnership with Google on audio recognition and restoration -- are being applied to improving cochlear implants
First Year Of Impact	2011
Sector	Digital/Communication/Information Technologies (including Software),Healthcare
Impact Types	Societal, Economic


Description	Collaboration with Dr. Carlyon at the MRC CBSU
Organisation	Medical Research Council (MRC)
Department	MRC Cognition and Brain Sciences Unit
Country	United Kingdom
Sector	Academic/University
PI Contribution	-- design of stimuli for improved fitting of cochlear implants -- design of stimuli for tinnitus audio therapy
Collaborator Contribution	-- provided new learning algorithms that produced stimuli for the experiments -- experiments were carried out by Dr. Carlyon's research group
Impact	-- received a £17,000 grant from Trinity College to continue the research (used to fund a Research Assistant who gathered pilot data for the project) -- received a £100,000 donation from Advanced Bionics to develop a test for fitting cochlear implants (will be used to fund a postdoc starting in May)
Start Year	2012
