Feature-Combination for Noise Robust Speech Pattern Processing
Lead Research Organisation: University of Birmingham
Department Name: Electronic, Electrical and Computer Eng
Abstract
Current automatic speech recognition systems can achieve acceptable performance in carefully controlled environments. In real-world situations, however, the speech signal is usually contaminated by background acoustic noise. While humans are highly robust to noise, the performance of current automatic speech recognition systems degrades rapidly, even on a simple task such as digit recognition.
A speech signal may be represented by multiple features, obtained either from different sources of information or by applying different processing techniques to a single source. In any given set of features, some may be corrupted by noise. Ideally, the features dominated by noise should be excluded from recognition, but this requires a-priori knowledge of which features are noisy. Unfortunately, locating the corrupted features can itself be a difficult task when there is no prior information about the noise. Thus, to exploit the potential of the unaffected features, we face the problem of how to combine the features while assuming no knowledge about the noise.
In our previous work, we developed a feature-combination model that attempts to remove the need to identify the noisy features. A key result of these studies is that, when the noise has a partial frequency or temporal character, this model, which uses no information about the noisy features, achieved recognition performance similar to that of a model with full a-priori knowledge of the noisy features. That work dealt with the general problem of combining features so as to eliminate the effect of noisy features under the assumption of no knowledge about the noise. It provides a good basis for the development of more powerful feature-combination models capable of exploiting the inherent properties of speech signals.
Our proposed research aims to develop feature-combination models that incorporate: (1) the fact that, in a wide-band noisy environment, the valleys of the spectrum are easily corrupted by noise while the peaks are often little affected; (2) any information about the reliability of features, which may often be available by exploiting properties of speech signals. Moreover, the proposed investigation into modelling speech signals by treating the filter and source information separately can be incorporated into the feature-combination models. Such models will be tailored for speech pattern processing and thus should provide improved recognition performance. Our final goal is to demonstrate competitive performance in speech and speaker recognition; we aim to achieve significant performance improvements on standard datasets (TIDIGITS, TIMIT, Resource Management, and Switchboard).
People
Peter Jancovic (Principal Investigator)
Publications
Jančovič P
(2009)
Improving automatic phoneme alignment under noisy conditions by incorporating spectral voicing information
in Electronics Letters
Jancovic P
(2009)
Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments
in Speech Communication
Jancovic P
(2007)
Estimation of Voicing-Character of Speech Spectra Based on Spectral Shape
in IEEE Signal Processing Letters
Jancovic P
(2007)
Fast Algorithm for Calculation of the Union-Based Probability
in IEEE Transactions on Audio, Speech and Language Processing
Zou X
(2008)
Speech Signal Enhancement Based on MAP Algorithm in the ICA Space
in IEEE Transactions on Signal Processing