SHARES - System-on-chip Heterogeneous Architecture Recognition Engine for Speech

Lead Research Organisation: Queen's University Belfast
Department Name: Sch of Electronics, Elec Eng & Comp Sci

Abstract

The availability of viable, robust speech recognition systems has the potential to revolutionise the way that people interact with mobile technology. This implies moving beyond simple 'call home' type commands to being able to dictate arbitrary, extensive e-mails to a mobile device and to reliably and efficiently access its increasingly complex features using natural speech. This would unlock the potential of next-generation portable technology for the widest range of users in many important application scenarios, e.g. emergency services and military environments, as well as time-efficient business and consumer usage. The current issue, however, is that the increasing algorithmic complexity needed to meet user expectations for naturalness and robustness far exceeds the processing and power capabilities forecast for current embedded processor technology. New architectures are therefore needed to radically advance the pace of state-of-the-art recognition technology for mobile and embedded devices.

Commercial speech recognition engines for mobile applications are typically small-footprint versions of desktop solutions, with the recognition functionality for acceptable quality highly constrained by the processing and power budget available on any given embedded platform. Applications are typically limited to a few commands and name or song lists. In comparison, state-of-the-art research systems for natural, unconstrained speech run up to 200 times slower than real time on 2.8 GHz Xeon processors. In addition, algorithmic research to maintain recognition accuracy in acoustically noisy operating environments, considered essential to widespread adoption of recognition technology, points towards even greater complexity. The gap between algorithmic requirements and the processing and power capability of conventional processor platforms is thus growing even further.

For large vocabulary continuous speech recognition (LVCSR) engines, decoding the most likely sequence of words is essentially an extremely large-scale search problem over all possible word combinations. To cope with the huge size of the potential search space, search networks created dynamically during decoding were, until recently, considered the only viable approach to realising large vocabulary recognition; static networks were too big for all but more constrained vocabulary tasks. However, in a significant departure from accepted wisdom, full expansion of large vocabulary static search networks prior to decoding has been demonstrated using the Weighted Finite State Transducer (WFST). The WFST structure creates considerable potential for achieving efficient, regularised decoding architectures, which we intend to exploit. To our knowledge, we would be the first to specifically exploit the WFST network decoding framework in novel hardware architectures for low-power, large-complexity speech recognition.
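
To illustrate the kind of decoding the WFST framework enables, the following is a minimal, hypothetical Python sketch of frame-synchronous token passing over a precompiled static search network with beam pruning. The Arc and Token structures, the acoustic_cost callback, and the beam value are illustrative assumptions rather than the project's implementation, and epsilon-arc handling is omitted for brevity.

# Illustrative sketch only: token passing over a static (precompiled) WFST.
# All names are hypothetical; acoustic scores come from an assumed acoustic model.
from dataclasses import dataclass

@dataclass
class Arc:
    next_state: int      # destination state in the static network
    input_label: int     # acoustic unit id consumed on this arc
    output_label: int    # word id emitted on this arc, 0 = no word
    weight: float        # precompiled language/pronunciation cost

@dataclass
class Token:
    state: int           # current WFST state
    cost: float          # accumulated negative log-probability
    words: tuple         # word ids emitted so far

def decode(wfst_arcs, start_state, frames, acoustic_cost, beam=10.0):
    """wfst_arcs: dict state -> list[Arc]; frames: iterable of feature vectors;
    acoustic_cost(frame, label): per-frame cost from the acoustic model."""
    tokens = {start_state: Token(start_state, 0.0, ())}
    for frame in frames:
        new_tokens = {}
        for tok in tokens.values():
            for arc in wfst_arcs.get(tok.state, []):
                cost = tok.cost + arc.weight + acoustic_cost(frame, arc.input_label)
                words = tok.words + ((arc.output_label,) if arc.output_label else ())
                best = new_tokens.get(arc.next_state)
                if best is None or cost < best.cost:   # keep best token per state
                    new_tokens[arc.next_state] = Token(arc.next_state, cost, words)
        if not new_tokens:
            break
        # Beam pruning: drop tokens whose cost exceeds the best cost plus the beam.
        best_cost = min(t.cost for t in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t.cost <= best_cost + beam}
    return min(tokens.values(), key=lambda t: t.cost).words

Because the network is fully expanded off-line, the inner loop reduces to regular table look-ups and cost accumulation, which is what makes the structure attractive for dedicated low-power hardware.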
 
Description The result of the project was the development of a software core that is unique in providing desktop-level performance for large vocabulary speech recognition on embedded platforms with limited resources. This was made possible by adopting a novel approach in which most of the computation is performed off-line rather than on-the-fly, as is the case in traditional approaches.
Exploitation Route The topic formed the basis of the Queen's University Impact document, which has been formally published, and an Impact talk given to the public entitled "Local Talent, Global Impact" on 29th May 2013. A direct Proof of Concept (PoC132) was funded by Invest NI, resulting in a patent application and a detailed commercialisation study. Follow-on contacts have been made with a number of companies. The project won the 2011 HiTech award at the Northern Ireland Science Park (NISP) £25k awards for the most promising ideas.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://www.ecit.qub.ac.uk/Research/WirelessCommunicationSystems/Projects/SHARES-System-on-chipHeterogeneousArchitectureRecognitionEngineforSpeech/
 
Description The research has been directly employed in a start-up company proposal called MVR (Mobile Voice Recognition). The company won the Northern Ireland Science Park (NISP) £25k HiTech competition.
First Year Of Impact 2010
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Demonstration system for real-time large vocabulary continuous speech recognition
Amount £80,000 (GBP)
Funding ID PoC132 
Organisation Invest Northern Ireland 
Sector Public
Country United Kingdom
Start 01/2010 
End 12/2010
 
Title PATTERN RECOGNITION 
Description The present disclosure relates to improvements in or relating to pattern recognition, and in particular to novel systems and methods for pattern recognition, and to devices incorporating such systems and methods. A spectral analysis is performed on a candidate pattern to obtain a set of observation vectors. The observation vectors are decoded to produce a candidate match between the observation vectors and entries from a knowledge source. Decoding the vectors comprises modelling the knowledge source as a weighted finite state transducer (WFST) and modelling the propagation of tokens through nodes of the WFST for observation vectors derived from successive frames of the candidate pattern. For each successive propagation, the list of tokens is sorted in order of their associated cost, and a pruning step removes tokens from the sorted list that go beyond a pre-determined cost threshold before the next iteration is performed (a minimal sketch of this sort-and-prune step is given after this record).
IP Reference WO2012076895 
Protection Patent application published
Year Protection Granted 2012
Licensed No
Impact The work became part of a business proposition for the company MVR Ltd.
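
The sort-and-prune step described in the patent abstract above can be read as a standard beam-pruning pass over the active token list. The following is a minimal, hypothetical Python sketch under that reading; the Token objects, their cost field, and the interpretation of the cost threshold as a margin relative to the best token are assumptions for illustration, not the patent's exact specification.

def sort_and_prune(tokens, cost_threshold):
    """Sort active tokens by accumulated cost (lower is better) and drop those
    whose cost exceeds the best cost by more than cost_threshold, before the
    next propagation iteration. 'tokens' is any list of objects exposing a
    numeric .cost attribute (assumed for illustration)."""
    ranked = sorted(tokens, key=lambda t: t.cost)
    if not ranked:
        return ranked
    limit = ranked[0].cost + cost_threshold
    return [t for t in ranked if t.cost <= limit]

Keeping the list sorted would also make it straightforward to cap the number of live tokens, which may matter under the memory budget of an embedded platform.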