Probabilistic Auditory Scene Analysis
Lead Research Organisation:
University of Cambridge
Department Name: Engineering
Abstract
Auditory environments are typically very complicated. For example, the cocktail party comprises many sources: the chinking of glasses, the chattering of the many guests, and the sound of background music. Nevertheless, our auditory system can make sense of such a scene; it can work out how many acoustic sources there are and determine the individual contributions to the scene from each. Remarkably, it can do this using the information from a single microphone. A major goal of auditory neuroscience is to understand how the auditory system achieves this feat.
Broadly speaking, it is thought that there are three stages to auditory scene analysis. The first stage, which is well understood physiologically, converts the incoming sound into a time-frequency representation. This reveals the local energy in a frequency band at a particular time. In the second stage, psychophysical evidence suggests that primitive grouping principles are used to group local regions of spectral-temporal energy arising from a common source. By using simple stimuli, like tones and noise, a long list of primitive grouping principles has been elucidated. For example, the principle of good continuation identifies smoothly varying features with a single source and abrupt changes as a signature of separate sources. In the final stage of auditory scene analysis, called schema-based grouping, higher level knowledge, like the structure of music or speech, is used to bind the groups of spectral-temporal energy into streams so that there is one stream for each source.
There are many outstanding questions with this framework. One important open question is the role that auditory cortex plays in auditory scene analysis, which is not well established. Another concerns the generality and completeness of the established list of primitive grouping rules: although the principles successfully characterise perception of simple sounds, it is unclear how successful and relevant the description is for natural sounds.
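As a rough illustration of the first stage described above, the following minimal sketch computes the local energy in each frequency band at each time using a windowed short-time Fourier transform. It is not the project's own front end: the function name `spectrogram`, the frame length, the hop size, the sample rate and the 1 kHz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return local energy per (time frame, frequency bin).

    Illustrative short-time Fourier analysis; frame_len and hop are
    assumed values, not taken from the project.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Squared magnitude of the one-sided FFT gives the energy in each band.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

fs = 8000                                # assumed sample rate in Hz
t = np.arange(fs) / fs                   # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)      # hypothetical 1 kHz test tone

energy = spectrogram(tone)
peak_bin = int(np.argmax(energy.mean(axis=0)))
peak_hz = peak_bin * fs / 256            # convert bin index to frequency
print(energy.shape, peak_hz)
```

For a steady tone, the energy concentrates in a single frequency band across all frames; richer scenes spread energy over many time-frequency regions, which is what the later grouping stages operate on.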
This project aims to resolve these questions through modelling work, psychophysics experiments and neural recording experiments. The new idea is to view the primitive grouping principles as arising from inference in a latent variable model of auditory scenes. A latent variable model is a description of how an auditory scene, like that encountered at a cocktail party, is composed of latent auditory sources, like the chinking glasses and chattering guests. It also includes a description of the statistics of these sources, like the fact that the chinking glasses tend to be isolated, high frequency events whilst the chattering is rather more constant and lower in frequency. The idea is that the brain is trying to infer these latent sources using prior knowledge of their statistics. New tools of probabilistic inference can make these intuitions concrete.
This new perspective, called probabilistic scene analysis, has two main advantages: one practical and one theoretical. The practical advantage is that a statistical characterisation of sounds can be used to produce stimuli with complicated, but controlled, structure for use in experiments. The theoretical benefit is that the list of primitive grouping rules, and the manner in which they trade off, are now derived from the statistics of sounds; heuristic implementation is no longer required. This enables us to predict the results of the experiments. In particular, the psychophysics experiments are aimed at resolving both how auditory grouping operates in synthetic auditory textures (e.g. rain, wind, water etc.) and whether this is consistent with the probabilistic account. Furthermore, the neural recording experiments will investigate the role of auditory cortex in auditory scene analysis, and the hypothesis that it is representing high level statistics of sounds like slowly varying modulatory components.
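The generative side of the latent variable view described above can be sketched as a toy simulation: two latent sources with different statistics are sampled and summed into a single-microphone scene. This covers only the forward (generative) direction, not the inference step, and it is not the project's model. The function names, event rate, frequencies, decay time and modulation depth are all hypothetical values chosen to mimic "isolated, high frequency" chinks and "more constant, lower frequency" chatter.

```python
import numpy as np

fs = 8000                         # assumed sample rate in Hz
n = fs                            # one second of audio
t = np.arange(n) / fs
rng = np.random.default_rng(0)    # fixed seed so the sketch is repeatable

def chink_source(rng, rate_per_s=3.0, freq=3000.0, decay=0.01):
    """Sparse latent source: isolated, decaying high-frequency events."""
    onsets = rng.random(n) < rate_per_s / fs          # rare random onsets
    kernel_t = np.arange(int(5 * decay * fs)) / fs
    kernel = np.exp(-kernel_t / decay) * np.sin(2 * np.pi * freq * kernel_t)
    return np.convolve(onsets.astype(float), kernel)[:n]

def chatter_source(rng, freq=300.0, mod_rate=4.0):
    """Steady latent source: low-frequency carrier with a slow envelope."""
    phase = rng.uniform(0, 2 * np.pi)
    envelope = 0.5 * (1 + 0.5 * np.sin(2 * np.pi * mod_rate * t + phase))
    return envelope * np.sin(2 * np.pi * freq * t)

def active_fraction(x, threshold=0.1):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return float(np.mean(np.abs(x) > threshold))

sources = {"chinks": chink_source(rng), "chatter": chatter_source(rng)}
# The observed scene is the sum of the latent sources (single microphone).
scene = sources["chinks"] + sources["chatter"]

for name, x in sources.items():
    print(name, round(active_fraction(x), 3))
```

Under these assumed settings the chinks are active for a much smaller fraction of the time than the chatter. Differences in statistics of this kind are what an inference procedure could exploit to attribute energy in the mixed scene to each source.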
People
Richard Turner (Principal Investigator)
Publications
Hernández-Lobato D
(2015)
Stochastic Expectation Propagation for Large Scale Gaussian Process Classification
Li Y
(2015)
Stochastic Expectation Propagation
Richard Turner (Author)
(2011)
Probabilistic amplitude and frequency demodulation
Gu S
(2015)
Neural adaptive sequential Monte Carlo
Description | We have provided new approaches to a number of classic and widely used signal processing techniques using modern machine learning methods. The new approaches have two key advantages. First, they adapt to the input signal, meaning that they can devote their limited representational resources to where they are most useful. Second, they handle uncertainty, which arises from noise and other sources, in an automatic way. This means that the new representations are of higher quality than the old approaches and more robust to noise, both of which make them suitable for real world application. We have demonstrated their usefulness on a range of engineering, scientific, and medical applications, including the removal of noise from speech and the analysis of brain recordings. |
Exploitation Route | The engineering applications of these techniques include speech recognition, speaker identification, retrieval of audio for search systems, and pitch detection and manipulation. The medical applications of these methods include analysis of brain recordings from patients and processing for cochlear implants and hearing aids; we are also currently developing a method to relieve tinnitus. |
Sectors | Digital/Communication/Information Technologies (including Software), Healthcare |
URL | http://cbl.eng.cam.ac.uk/Public/Turner/WebHome |
Description | The results from this grant have been used in the following ways: -- formed the basis for an EPSRC First Grant -- formed the basis for an industrial partnership with Google on audio recognition and restoration -- being applied to improving cochlear implants |
First Year Of Impact | 2011 |
Sector | Digital/Communication/Information Technologies (including Software), Healthcare |
Impact Types | Societal, Economic |
Description | Collaboration with Dr. Carlyon at the MRC CBSU |
Organisation | Medical Research Council (MRC) |
Department | MRC Cognition and Brain Sciences Unit |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | -- design of stimuli for improved fitting of cochlear implants -- design of stimuli for tinnitus audio therapy |
Collaborator Contribution | -- provided new learning algorithms that produced stimuli for the experiments -- experiments were carried out by Dr. Carlyon's research group |
Impact | -- received £17,000 grant from Trinity College to continue research (used to fund a Research Assistant who gathered pilot data for the project) -- received £100,000 donation from Advanced Bionics to develop a test for fitting cochlear implants (will be used to fund a postdoc starting in May) |
Start Year | 2012 |