Feature-Combination for Noise Robust Speech Pattern Processing

Lead Research Organisation: University of Birmingham

Department Name: Electronic, Electrical and Computer Eng

Abstract

Current automatic speech recognition systems can achieve acceptable performance in carefully controlled environments. In real-world situations, however, the speech signal is usually contaminated by background noise. While humans show strong robustness to noise, the performance of current automatic speech recognition systems degrades rapidly, even for a simple task such as digit recognition.
A speech signal may be represented by multiple features, obtained either from different sources of information or by applying different processing techniques to a single source. Within a given set of features, some may be corrupted by noise. Ideally, the features dominated by noise should be excluded from recognition, but this requires a priori knowledge of which features are noisy. Unfortunately, locating the corrupted features can itself be a difficult task when there is no prior information about the noise. Thus, to exploit the potential of the unaffected features, we face the problem of how to combine the features while assuming no knowledge about the noise.
In our previous work, we developed a feature-combination model that attempts to remove the need to identify the noisy features. A key result of that work is that, when the noise is partial in frequency or time (that is, it corrupts only some frequency bands or time segments), this model, using no information about the noisy features, achieved recognition performance similar to that of a model using full a priori knowledge of the noisy features.
Our previous study dealt with the general problem of combining features so as to eliminate the effect of noisy features under the assumption of no knowledge about the noise. This provides a good basis for the development of more powerful feature-combination models capable of exploiting the inherent properties of speech signals.
Our proposed research aims to develop feature-combination models that incorporate: (1) the fact that, in a wide-band noisy environment, the valleys of the spectrum are easily corrupted by noise while the peaks are often little affected; and (2) any information about the reliability of features, which may often be obtained by exploiting properties of speech signals. Moreover, the proposed investigation into modelling speech signals by modelling the filter and source information separately can be incorporated into the feature-combination models. Such models will be tailored for speech pattern processing and should therefore provide improved recognition performance. Our ultimate goal is to demonstrate competitive performance in speech and speaker recognition; we aim to achieve significant performance improvements on standard datasets (TIDIGITS, TIMIT, Resource Management, and Switchboard).
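The contrast drawn in the abstract, between excluding noisy features with a priori knowledge and combining features with no such knowledge, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not specify the combination rule, so the sum-over-subsets rule below is only one plausible union-style choice, and the function names, the four feature streams, and the log-likelihood values are hypothetical.

```python
import math
from itertools import combinations


def oracle_log_likelihood(stream_loglikes, noisy):
    """Combine streams when the noisy ones are known a priori:
    simply leave them out of the product (sum in the log domain)."""
    return sum(ll for i, ll in enumerate(stream_loglikes) if i not in noisy)


def union_log_likelihood(stream_loglikes, order):
    """Illustrative union-style combination (an assumption, not the
    project's confirmed model): sum, over every subset that drops
    `order` streams, the product of the remaining stream likelihoods.
    Only the assumed NUMBER of noisy streams is needed, not which ones."""
    n = len(stream_loglikes)
    if not 0 <= order < n:
        raise ValueError("order must satisfy 0 <= order < number of streams")
    terms = [
        sum(stream_loglikes[i] for i in subset)
        for subset in combinations(range(n), n - order)
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def pick_class(class_scores, combine):
    """Return the class whose combined score is highest."""
    return max(class_scores, key=lambda name: combine(class_scores[name]))


# Hypothetical per-stream log-likelihoods for two classes. Class "A" is
# the true class, but stream 2 is corrupted and scores it very poorly.
scores = {
    "A": [-1.0, -1.0, -20.0, -1.0],
    "B": [-3.0, -3.0, -2.0, -3.0],
}

full_product = pick_class(scores, sum)
with_oracle = pick_class(scores, lambda s: oracle_log_likelihood(s, {2}))
without_knowledge = pick_class(scores, lambda s: union_log_likelihood(s, 1))
print(full_product, with_oracle, without_knowledge)
```

With these made-up scores, the plain product over all streams is dominated by the corrupted stream and selects the wrong class, whereas both the oracle rule and the subset-sum rule select the true class. The subset-sum rule gets there without being told which stream is noisy, because the subset that happens to omit the corrupted stream dominates the sum.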

Funded Value:

£116,442

Funded Period:

May 06 - Jul 08

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/D033659/1

Principal Investigator:

Peter Jancovic

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Human Communication in ICT (100%)

Organisations

University of Birmingham (Lead Research Organisation)

People
Peter Jancovic (Principal Investigator)

Publications

Jančovič P (2009) Improving automatic phoneme alignment under noisy conditions by incorporating spectral voicing information in Electronics Letters

Jancovic P (2009) Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments in Speech Communication

Jancovic P (2007) Estimation of Voicing-Character of Speech Spectra Based on Spectral Shape in IEEE Signal Processing Letters

Jancovic P (2007) Fast Algorithm for Calculation of the Union-Based Probability in IEEE Transactions on Audio, Speech and Language Processing

Zou X (2007) ICA-Based MAP Algorithm for Speech Signal Enhancement

Zou X (2008) Speech Signal Enhancement Based on MAP Algorithm in the ICA Space in IEEE Transactions on Signal Processing
