A Model Based Approach Towards Practical Blind Enhancement of Audio Signals Acquired in Real Acoustic Environments

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Engineering


This proposal concerns enhancing the quality and intelligibility of audio.
The ubiquity of digital audio in broadcasting, storage, and multimedia applications, each offering crystal clear sound quality, has raised awareness and expectations of the performance achievable in applications involving audio signals: digital hearing aids should outperform their analogue counterparts in concert halls, speech recognition software should achieve high recognition rates in office environments, and hands-free telephones must produce intelligible speech when used in car cabins.
The quality and intelligibility of speech obtained in these scenarios are constrained not just by the reproduction quality of the hardware itself; they also depend on the acoustical properties of the environment in which the audio is acquired.
Specifically, audio signals in confined acoustic environments exhibit reverberation, which causes problems in two major classes of signal processing applications. The first is automatic speech recognition, in which reverberant speech is more difficult to recognise than closely coupled speech. As a result, hands-free interaction is not possible unless the user accepts the undesirable constraint of carrying a microphone close to their mouth. The second class involves the desire to improve speech quality and intelligibility from devices such as mobile and hands-free telephones and next generation digital hearing aids. In each scenario, reverberation should be reduced to acceptable levels by a robust speech enhancement algorithm that can be applied in any acoustic environment and that does not rely on the acoustic properties being known a priori.
Since neither the acoustic impulse response (AIR) nor the source audio is known in this situation, the process of removing the effects of reverberation is known as blind dereverberation.
Previously, blind dereverberation has often been approached by assuming that the AIR between the source and sensor is time-invariant. This might be appropriate in scenarios where the source-sensor geometry is not rapidly varying: for example, a hands-free kit in a car cabin, in which the driver and the microphone are approximately fixed relative to one another, or a work environment where a user is seated in front of a computer terminal in roughly the same configuration. However, there are many applications where the source-sensor geometry is subject to change; the wearer of a hearing aid will typically wish to move around a room, as might users of hands-free conference telephony equipment.
Moreover, the acoustics of the room itself may vary; the changing state of doors and windows, or items being moved in the room, will influence the room dynamics, as will a moving person. Consequently, in order to develop a blind dereverberation algorithm that is suitable for practical applications, it is important to account for source-sensor movement and for possible changes in the acoustical properties of the room.
This proposal uses model-based signal processing, robust Bayesian statistical parameter estimation, and numerical optimisation methods to obtain practical algorithms that tackle this problem. Model-based signal processing depends fundamentally on the availability of realistic, tractable, and extensible models that reflect the underlying processes and systems involved. This proposal focuses on developing, implementing, testing, and applying a number of models that have not previously been investigated in blind speech dereverberation.
These include:
- a complete speech model that accounts for both voiced and unvoiced speech;
- a more realistic room acoustic model;
- subband methods for dealing with the complicated acoustic responses that occur in realistic acoustic environments;
- models that can account for varying source-sensor geometries;
- models that can be estimated using batch and sequential Monte Carlo methods.
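
As a purely illustrative aside, the reverberation problem described above can be sketched as a convolution of the source signal with the AIR. The snippet below is a minimal sketch and not part of the proposed work: it assumes a time-invariant AIR, and the helper names `synthetic_air` and `convolve`, the exponentially decaying noise model, and all parameter values are hypothetical choices made only for this example.

```python
import math
import random


def synthetic_air(length, decay, seed=0):
    """Hypothetical AIR: a unit direct path followed by exponentially
    decaying random reflections (an assumption for illustration only)."""
    rng = random.Random(seed)
    air = [1.0]
    for n in range(1, length):
        air.append(rng.gauss(0.0, 0.3) * math.exp(-decay * n))
    return air


def convolve(source, air):
    """Direct-form linear convolution: observed[n] = sum_k air[k] * source[n - k].
    The output has len(source) + len(air) - 1 samples."""
    out = [0.0] * (len(source) + len(air) - 1)
    for i, s in enumerate(source):
        for j, h in enumerate(air):
            out[i + j] += s * h
    return out


# A single click as the source, so the observation is the AIR itself
# followed by zeros.
source = [1.0] + [0.0] * 7
air = synthetic_air(length=16, decay=0.2)
observed = convolve(source, air)
```

In the blind setting only `observed` would be available; both `source` and `air` would have to be inferred, and a moving source or sensor would additionally make `air` change over time.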