Perceptual constancy in real-room listening by humans and machines

Lead Research Organisation: University of Sheffield
Department Name: Computer Science

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Publications

Description This project investigated perceptual compensation for the effects of room reverberation on speech recognition, using experiments with both human listeners and machine systems (auditory models and automatic speech recognition (ASR) systems). Our key findings were:



Perceptual data on compensation for the effects of reverberation in a 'sir-stir' identification task was closely matched by a computer model of auditory efferent processing. In the computer model, the peripheral auditory response was suppressed by efferent activity, which in turn was determined by the dynamic range of the preceding auditory nerve response. The model therefore suggests that perceptual compensation for the effects of reverberation is underlain, at least in part, by auditory mechanisms responsible for the monitoring and control of dynamic range.
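The mechanism described above can be illustrated with a minimal sketch. The mapping from context dynamic range to efferent attenuation, and all constants (`dr_ref`, `max_atten`), are illustrative assumptions for exposition, not the published model's parameters; the key property preserved is that a smaller dynamic range in the preceding context (typical of reverberant speech) produces stronger efferent suppression of the peripheral response.

```python
import math

def dynamic_range_db(envelope, floor=1e-6):
    # Dynamic range (dB) of a simulated auditory-nerve envelope.
    peak = max(envelope)
    trough = max(min(envelope), floor)
    return 20.0 * math.log10(peak / trough)

def efferent_attenuation_db(context_env, dr_ref=40.0, max_atten=10.0):
    # Map the context's dynamic range to an efferent attenuation (dB).
    # Smaller dynamic range -> stronger suppression. The linear mapping
    # and the constants dr_ref/max_atten are illustrative assumptions.
    dr = dynamic_range_db(context_env)
    atten = max_atten * max(0.0, 1.0 - dr / dr_ref)
    return min(atten, max_atten)

def apply_efferent(test_env, atten_db):
    # Suppress the peripheral response to the test word.
    gain = 10.0 ** (-atten_db / 20.0)
    return [x * gain for x in test_env]

# Reverberation fills the gaps between speech sounds (raised troughs),
# so the reverberant context has a smaller dynamic range and therefore
# drives more efferent suppression than the dry context:
dry_context = [0.9, 0.01, 0.8, 0.02, 0.95]
reverb_context = [0.9, 0.3, 0.85, 0.35, 0.95]
a_dry = efferent_attenuation_db(dry_context)
a_rev = efferent_attenuation_db(reverb_context)
compensated_test = apply_efferent([0.5, 0.4, 0.6], a_rev)
```

This also captures the time-reversal result described below: reversing the room impulse response raises the dynamic range of the context, which in this sketch reduces the attenuation and hence the compensation.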



The computer model was able to explain Watkins' finding that perceptual compensation in the 'sir-stir' task was not affected by time-reversing the speech preceding the test word, but that compensation was abolished when the room impulse response was reversed. Time reversal of the speech preceding the test word (the 'context') did not substantially affect the dynamic range of the auditory nerve response, so our model made the same predictions in the forward-speech and reverse-speech conditions. For these experimental stimuli, however, reversing the room impulse response increased the dynamic range of the context prior to the test word. This caused a reduction in the efferent response and removed the compensation effect.



We have demonstrated that perceptual compensation for reverberation was apparent for human listeners in a consonant identification task using naturalistic speech. Test words covered a wider range of consonants (/t/, /k/, /p/) and six vowels, and were spoken by both male and female voices in realistically variable utterances. Compensation was apparent in the pattern of confusions made by listeners, as quantified in terms of the relative information transmitted. When more reverberation was added to the test word only, listeners confused many consonants. These confusions were largely resolved when similar reverberation was added to the speech preceding the test word.



Using this (/t/, /k/, /p/) consonant identification task, we confirmed Watkins' finding that perceptual compensation is abolished when the room impulse response is time-reversed. We also found that the reverberation tail of the test word contributes to perceptual compensation under certain conditions. Compensation was reduced when the reverberation tail of the test word was partially removed by gating, but this effect was only apparent when the context preceding the test word did not itself promote compensation. Further, we investigated the time course of perceptual compensation using reverberant contexts of gradually increasing duration, and found that compensation increased with context duration up to 500 ms, the longest duration tested.



A reverberation-robust ASR system was developed in which different statistical models of speech were selected according to the dynamic range of the speech signal. The system's confusion patterns closely matched those made by human listeners in our perceptual data.
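The selection step can be sketched as follows. The model bank, the dB thresholds, and the model names are hypothetical placeholders, not the project's actual configuration; the point is only the decision rule: stronger reverberation fills the gaps between speech sounds, lowering the signal's dynamic range, which steers recognition towards models trained under matching conditions.

```python
import math

def signal_dynamic_range_db(frame_energies, floor=1e-8):
    # Dynamic range (dB) across frame energies of the input signal.
    peak = max(frame_energies)
    trough = max(min(frame_energies), floor)
    return 10.0 * math.log10(peak / trough)

# Hypothetical bank of acoustic model sets, each trained on speech with
# a different amount of reverberation; thresholds are illustrative.
MODEL_BANK = [
    (40.0, "anechoic_models"),        # high dynamic range: little reverb
    (25.0, "moderate_reverb_models"),
    (0.0,  "strong_reverb_models"),   # low dynamic range: strong reverb
]

def select_models(frame_energies):
    # Pick the first model set whose dynamic-range threshold the
    # measured signal meets or exceeds.
    dr = signal_dynamic_range_db(frame_energies)
    for threshold, name in MODEL_BANK:
        if dr >= threshold:
            return name
    return MODEL_BANK[-1][1]
```

For example, frame energies spanning five orders of magnitude select the anechoic models, while a nearly flat energy profile selects the strong-reverberation models.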



Finally, principles derived from our perceptual experiments (within-band processing and the role of reverberation tails) were implemented in a 'missing feature' ASR system. In this approach, a number of acoustic cues (including those derived from a model of binaural processing) were employed to identify time-frequency regions of speech that were relatively uncorrupted by reverberation. The 'clean' regions were used directly for recognition, whereas the true values of the corrupted regions were imputed (reconstructed) from statistical models of speech. The ASR system gave a good performance on the CHiME challenge, an evaluation corpus in which the speech is corrupted by reverberation and environmental noise.
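The imputation step of a missing-feature system can be illustrated with a minimal sketch. Here the "statistical model of speech" is stood in for by per-channel clean-speech means, which is a deliberate simplification of the model-based reconstruction described above; the reliability mask is taken as given, whereas in the actual system it would be derived from acoustic cues such as binaural features.

```python
def impute_unreliable(features, reliable_mask, clean_means):
    # Missing-feature reconstruction sketch: keep time-frequency cells
    # judged uncorrupted; replace corrupted cells with values drawn
    # from a clean-speech model (here, per-channel means as a stand-in
    # for model-based imputation). Corrupted cells are also bounded
    # above by the observed value, since additive corruption can only
    # raise the observed energy.
    out = []
    for t, frame in enumerate(features):
        row = []
        for f, obs in enumerate(frame):
            if reliable_mask[t][f]:
                row.append(obs)               # clean region: use directly
            else:
                row.append(min(obs, clean_means[f]))  # bounded imputation
        out.append(row)
    return out

# Two frames x two channels; the mask flags one cell per frame as
# corrupted by reverberation or noise:
feats = [[1.0, 5.0], [2.0, 0.5]]
mask = [[True, False], [False, True]]
reconstructed = impute_unreliable(feats, mask, clean_means=[1.5, 2.0])
```

The bounded-imputation rule (taking the minimum of the observation and the model value) reflects a standard missing-data assumption for energy features under additive corruption.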
Exploitation Route Hearing-impaired listeners have particular problems understanding speech in reverberant environments. Our computer models and psychophysical data could be used to improve the performance of hearing aids and cochlear implants in reverberant conditions. The results of this research will be of interest to researchers developing reverberation-robust automatic speech recognition systems who wish to include human-like processing in their algorithms. Our computer model makes predictions that can be tested by workers in auditory neurophysiology.
Sectors Healthcare

URL http://staffwww.dcs.shef.ac.uk/people/G.Brown/constancy/index.htm
 
Description Modelling the role of the auditory efferent system in the recognition of noisy and reverberant speech 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk given at the Universities of Oldenburg and Magdeburg, Germany, reviewing our work on noise-robust and reverberation-robust front-end processors for automatic speech recognition, based on models of auditory efferent function.
Year(s) Of Engagement Activity 2011