Feature-Combination for Noise Robust Speech Pattern Processing

Lead Research Organisation: University of Birmingham
Department Name: Electronic, Electrical and Computer Eng

Abstract

Current systems for automatic speech recognition by computer can achieve acceptable performance in carefully controlled environments. In real-world situations, however, the speech signal is usually contaminated by background acoustic noise. While humans are highly robust to noise, the performance of current automatic speech recognition systems degrades rapidly, even for a simple task such as digit recognition.

A speech signal may be represented by multiple features, obtained either from different sources of information or by applying different processing techniques to a single source. In a given set of features, some may be corrupted by noise. Ideally, the features dominated by noise should be excluded from recognition, but this requires a-priori knowledge of which features are noisy. Unfortunately, locating the corrupted features can itself be difficult when there is no prior information about the noise. Thus, to exploit the potential of the unaffected features, we face the problem of how to combine the features without any knowledge of the noise.

In our previous work, we developed a feature-combination model that removes the need to identify the noisy features. A key result of those studies is that, when the noise is partial in frequency or time (affecting only some frequency bands or time segments), this model, using no information about the noisy features, achieved recognition performance similar to that of a model given full a-priori knowledge of the noisy features.

Our previous study addressed the general problem of combining features so as to eliminate the effect of noisy features, under the assumption of no knowledge about the noise. This provides a good basis for developing more powerful feature-combination models that exploit the inherent properties of speech signals.
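
To make the idea of combining features "without knowledge of the noise" concrete, here is a minimal sketch of one way it can be done, assuming the recogniser produces one log-likelihood per feature (for example, per frequency sub-band) for each candidate class, and assuming that at most `num_corrupted` of the N features are noisy, without knowing which ones. Each subset of N - `num_corrupted` features is scored by the product of its likelihoods, and the subset scores are summed, so the subset that avoids the corrupted features dominates. This follows the general "union of subsets" idea; it is an illustrative assumption, not necessarily the exact model developed in this project. The function names are hypothetical.

```python
import itertools
import math
from typing import Sequence, Set


def union_log_score(log_likes: Sequence[float], num_corrupted: int) -> float:
    """Combine per-feature log-likelihoods without knowing which are noisy.

    Assumes (hypothetically) that at most `num_corrupted` of the N features
    are dominated by noise. Every subset of N - num_corrupted features is
    scored by the product of its likelihoods (a sum in the log domain), and
    the subset scores are added together (log-sum-exp), so a subset that
    avoids the corrupted features dominates the total.
    """
    n = len(log_likes)
    keep = n - num_corrupted
    if not 0 < keep <= n:
        raise ValueError("num_corrupted must satisfy 0 <= num_corrupted < N")
    subset_scores = [
        sum(log_likes[i] for i in subset)
        for subset in itertools.combinations(range(n), keep)
    ]
    best = max(subset_scores)
    # Log-sum-exp, shifted by the best subset score for numerical stability.
    return best + math.log(sum(math.exp(s - best) for s in subset_scores))


def oracle_log_score(log_likes: Sequence[float], corrupted: Set[int]) -> float:
    """Reference score with full a-priori knowledge: drop the known noisy features."""
    return sum(x for i, x in enumerate(log_likes) if i not in corrupted)
```

For example, with per-band log-likelihoods [-1.0, -1.2, -0.9, -30.0] for one class (its fourth band swamped by noise) and [-5.0, -5.5, -4.8, -6.0] for another, the plain sum (`num_corrupted=0`) prefers the second class (-21.3 vs. -33.1), whereas `union_log_score(..., 1)` prefers the first (about -3.1 vs. about -14.5), essentially matching the oracle score that discards the known corrupted band. Note that the number of subsets grows as C(N, num_corrupted), so this direct enumeration is only practical for a modest number of features.
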
Our proposed research aims to develop feature-combination models that incorporate: (1) the fact that, in a wide-band noisy environment, the valleys of the spectrum are easily corrupted by noise while the peaks are often little affected; and (2) any information about the reliability of features, which may often be available by exploiting properties of speech signals. Moreover, the proposed investigation into modelling speech signals by modelling the filter and source information separately can be incorporated into the feature-combination models. Such models will be tailored to speech pattern processing and should therefore improve recognition performance. Our final goal is to demonstrate competitive performance in speech and speaker recognition; we aim to achieve significant performance improvements on standard datasets (TIDIGITS, TIMIT, Resource Management, and Switchboard).
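
As a rough illustration of points (1) and (2), the sketch below derives per-band reliability weights from a log spectrum, so that bands near the spectral peak count fully and bands deep in a valley count little, and then uses the weights in a weighted combination of per-band log-likelihoods. The linear ramp, the 30 dB floor, and the function names are all hypothetical choices made for this example; the abstract does not specify how reliability information would be computed or incorporated.

```python
from typing import List, Sequence


def peak_weights(band_energies_db: Sequence[float], floor_db: float = 30.0) -> List[float]:
    """Hypothetical per-band reliability weights from a log spectrum.

    Bands at the spectral peak get weight 1.0; the weight falls linearly to
    0.0 for bands `floor_db` dB or more below the peak, reflecting the
    assumption that spectral valleys are the first to be masked by
    wide-band noise.
    """
    peak = max(band_energies_db)
    return [min(1.0, max(0.0, 1.0 - (peak - e) / floor_db)) for e in band_energies_db]


def weighted_log_score(log_likes: Sequence[float], weights: Sequence[float]) -> float:
    """Reliability-weighted sum of per-band log-likelihoods."""
    if len(log_likes) != len(weights):
        raise ValueError("log_likes and weights must have the same length")
    return sum(w * x for w, x in zip(weights, log_likes))
```

For band energies of [60, 45, 30, 55] dB, `peak_weights` returns weights of 1.0, 0.5, 0.0 and about 0.83, so a very poor match in the 30 dB valley band contributes nothing to the weighted score. Soft weights like these could, in principle, be combined with a subset-based combination rather than replacing it.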
