Probabilistic Auditory Scene Analysis

Lead Research Organisation: University of Cambridge
Department Name: Engineering

Abstract

Auditory environments are typically very complicated. For example, a cocktail party comprises many sources: the chinking of glasses, the chattering of the many guests, the sound of background music. Nevertheless, our auditory system can make sense of such a scene; it can work out how many acoustic sources there are and determine the individual contribution of each to the scene. Remarkably, it can do this using the information from a single microphone. A major goal of auditory neuroscience is to understand how the auditory system achieves this feat.

Broadly speaking, auditory scene analysis is thought to proceed in three stages. The first stage, which is well understood physiologically, converts the incoming sound into a time-frequency representation. This reveals the local energy in a frequency band at a particular time. In the second stage, psychophysical evidence suggests that primitive grouping principles are used to group local regions of spectral-temporal energy arising from a common source. Using simple stimuli, like tones and noise, researchers have elucidated a long list of primitive grouping principles. For example, the principle of good continuation identifies smoothly varying features with a single source and treats abrupt changes as a signature of separate sources. In the final stage of auditory scene analysis, called schema-based grouping, higher-level knowledge, like the structure of music or speech, is used to bind the groups of spectral-temporal energy into streams so that there is one stream for each source.

There are many outstanding questions with this framework. One important open question is the role that auditory cortex plays in auditory scene analysis, which is not well established. Another concerns the generality and completeness of the established list of primitive grouping rules: although the principles successfully characterise perception of simple sounds, it is unclear how successful and relevant the description is for natural sounds.
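The first stage described above, turning a waveform into "the local energy in a frequency band at a particular time", can be illustrated with a short-time Fourier transform. This is a minimal sketch, assuming a Hann-windowed STFT with uniform frequency bands; the frame length, hop size and test tones are illustrative choices, and the cochlea's frequency analysis is not a uniform STFT.

```python
import numpy as np


def spectrogram(x, fs, frame_len=512, hop=256):
    """Return frame times (s), band frequencies (Hz) and local energy |STFT|^2."""
    window = np.hanning(frame_len)                 # taper each frame to reduce leakage
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])  # shape (time, samples)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape (time, frequency)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centres
    return times, freqs, power


# Toy "scene" (illustrative): a 1 kHz tone for 0.5 s, then a 3 kHz tone for 0.5 s.
fs = 16000
t = np.arange(fs) / fs
x = np.where(t < 0.5, np.sin(2 * np.pi * 1000 * t), np.sin(2 * np.pi * 3000 * t))

times, freqs, power = spectrogram(x, fs)
print(power.shape)                 # (61, 257)
print(freqs[np.argmax(power[0])])  # energy peaks near 1000 Hz early on
print(freqs[np.argmax(power[-1])]) # and near 3000 Hz at the end
```

Each row of `power` is one moment in time and each column one frequency band, so the switch from the low to the high tone shows up as energy moving from one band to another, which is the kind of local spectral-temporal structure the later grouping stages operate on.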
This project aims to resolve these questions through modelling work, psychophysics experiments and neural recording experiments. The new idea is to view the primitive grouping principles as arising from inference in a latent variable model of auditory scenes. A latent variable model is a description of how an auditory scene, like that encountered at a cocktail party, is composed of latent auditory sources, like the chinking glasses and chattering guests. It also includes a description of the statistics of these sources, such as the fact that the chinking glasses tend to be isolated, high-frequency events, whilst the chattering is more constant and lower in frequency. The idea is that the brain tries to infer these latent sources using prior knowledge of their statistics. New tools of probabilistic inference can make these intuitions concrete.

This new perspective, called probabilistic scene analysis, has two main advantages: one practical and one theoretical. The practical advantage is that a statistical characterisation of sounds can be used to produce stimuli with complicated, but controlled, structure for use in experiments. The theoretical benefit is that the list of primitive grouping rules, and the manner in which they trade off, are now derived from the statistics of sounds, so a heuristic implementation is no longer required. This enables us to predict the results of the experiments. In particular, the psychophysics experiments aim to resolve both how auditory grouping operates in synthetic auditory textures (e.g. rain, wind and water) and whether this is consistent with the probabilistic account. Furthermore, the neural recording experiments will investigate the role of auditory cortex in auditory scene analysis, and the hypothesis that it represents high-level statistics of sounds, such as slowly varying modulatory components.
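The latent variable idea can be made concrete with a deliberately simple toy, which is not the project's actual model. Here each hypothetical source is assumed to be an independent complex Gaussian in every frequency bin, with a prior power spectrum encoding its statistics (low-frequency "chatter", high-frequency "chinking"), and the scene is their sum. Sampling from this model produces stimuli with controlled statistics; for this linear-Gaussian case, exact inference of each source's posterior mean reduces to the Wiener filter. The function names, Gaussian-bump priors and all numbers are illustrative assumptions.

```python
import numpy as np


def prior_power(freqs, centre, width, floor=1e-6):
    """Hypothetical prior power spectrum of one source: a Gaussian bump in frequency."""
    return np.exp(-0.5 * ((freqs - centre) / width) ** 2) + floor


def sample_scene(prior_powers, rng):
    """Generative model: each latent source is an independent zero-mean complex
    Gaussian in every frequency bin, with variance set by its prior power;
    the observed scene is the sum of the sources."""
    sources = np.array([
        np.sqrt(p / 2) * (rng.standard_normal(p.shape) + 1j * rng.standard_normal(p.shape))
        for p in prior_powers
    ])
    return sources, sources.sum(axis=0)


def infer_sources(mixture, prior_powers):
    """Exact posterior mean of each source given the mixture. For this
    linear-Gaussian model it is the Wiener filter: each bin of the mixture is
    shared out in proportion to each source's prior power."""
    prior_powers = np.asarray(prior_powers)
    return prior_powers / prior_powers.sum(axis=0) * mixture


rng = np.random.default_rng(0)
freqs = np.linspace(0, 8000, 257)
priors = [prior_power(freqs, 500, 300),    # "chatter": low frequency (assumed)
          prior_power(freqs, 5000, 800)]   # "chinking glasses": high frequency (assumed)
sources, mixture = sample_scene(priors, rng)
estimates = infer_sources(mixture, priors)

# Compare the posterior-mean error with the naive guess "source = whole mixture".
for k in range(2):
    err_inferred = np.sum(np.abs(estimates[k] - sources[k]) ** 2)
    err_naive = np.sum(np.abs(mixture - sources[k]) ** 2)
    print(k, err_inferred, err_naive)
```

Because the priors barely overlap, inference assigns almost every bin to the right source; making the priors overlap more would make the posterior more uncertain, which is where the trade-offs between grouping cues would appear in a richer model.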

Publications

Archer-Boyd AW (2018) Development and validation of a spectro-temporal processing test for cochlear-implant listeners. In The Journal of the Acoustical Society of America

Bui TD (2016) Deep Gaussian processes for regression using approximate expectation propagation. In 33rd International Conference on Machine Learning, ICML 2016

Schlittenlacher J (2018) Audiogram estimation using Bayesian active learning. In The Journal of the Acoustical Society of America

Turner R (2014) Time-Frequency Analysis as Probabilistic Inference. In IEEE Transactions on Signal Processing

Turner R (2011) Demodulation as Probabilistic Inference. In IEEE Transactions on Audio, Speech, and Language Processing

 
Description We have developed new approaches to a number of classic and widely used signal processing techniques using modern machine learning methods. The new approaches have two key advantages. First, they adapt to the input signal, meaning that they can devote their limited representational resources to where they are most useful. Second, they automatically handle uncertainty arising from noise and other sources. As a result, the new representations are of higher quality than those produced by the older approaches and are more robust to noise, both of which make them suitable for real-world applications. We have demonstrated their usefulness in a range of engineering, scientific and medical applications, including the removal of noise from speech and the analysis of brain recordings.
Exploitation Route The engineering applications of these techniques include speech recognition, speaker identification, audio retrieval for search systems, and pitch detection and manipulation.



The medical applications of these methods include the analysis of brain recordings from patients and processing for cochlear implants and hearing aids; we are also currently developing a method to relieve tinnitus.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare

URL http://cbl.eng.cam.ac.uk/Public/Turner/WebHome
 
Description The results from this grant have been used in the following ways:
-- formed the basis for an EPSRC First Grant
-- formed the basis for an industrial partnership with Google on audio recognition and restoration
-- being applied to improving cochlear implants
First Year Of Impact 2011
Sector Digital/Communication/Information Technologies (including Software), Healthcare
Impact Types Societal, Economic

 
Description Collaboration with Dr. Carlyon at the MRC CBSU 
Organisation Medical Research Council (MRC)
Department MRC Cognition and Brain Sciences Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution
-- design of stimuli for improved fitting of cochlear implants
-- design of stimuli for tinnitus audio therapy
Collaborator Contribution
-- provided new learning algorithms that produced stimuli for the experiments
-- experiments were carried out by Dr. Carlyon's research group
Impact
-- received a £17,000 grant from Trinity College to continue the research (used to fund a Research Assistant who gathered pilot data for the project)
-- received a £100,000 donation from Advanced Bionics to develop a test for fitting cochlear implants (will be used to fund a postdoc starting in May)
Start Year 2012