SHARES - System-on-chip Heterogeneous Architecture Recognition Engine for Speech

Lead Research Organisation: Queen's University Belfast


The availability of viable, robust speech recognition systems has the potential to revolutionise the way that people interact with mobile technology. This implies moving beyond simple 'call home'-style commands, to being able to dictate arbitrary, extensive e-mails to your mobile device and to reliably and efficiently access its increasingly complex features using natural speech. This will unlock the potential of next-generation portable technology for the widest range of potential users in many important application scenarios, e.g. for emergency services and military environments as well as time-efficient business and consumer usage. The current issue, however, is that the increasing algorithmic complexity needed to meet user expectations for naturalness and robustness far exceeds the processing and power capabilities forecast for current embedded processor technology. New architectures are therefore needed to radically advance the pace of state-of-the-art recognition technology for mobile and embedded devices.

Commercial speech recognition engines for mobile applications are typically small-footprint versions of desktop solutions, with the recognition functionality for acceptable quality highly constrained by the processing and power budget available on any given embedded platform. Applications are typically limited to a few commands and name or song lists. In comparison, state-of-the-art research systems for natural unconstrained speech run up to 200 times slower than real time on 2.8 GHz Xeon processors. In addition, algorithmic research to maintain recognition accuracy in acoustically noisy operating environments, considered essential to widespread adoption of recognition technology, points towards even greater complexity.
The gap between algorithmic requirements and the processing and power capability of conventional processor platforms is thus growing even further.

For large vocabulary continuous speech recognition (LVCSR) engines, decoding the most likely sequence of words is essentially an extremely large-scale search problem over all possible word combinations. To cope with the huge size of the potential search space, search networks created dynamically during decoding were, until recently, considered the only viable approach to large vocabulary recognition; static networks were too big for all but more constrained vocabulary tasks. However, in a significant departure from accepted wisdom, full expansion of large vocabulary static search networks prior to decoding has been demonstrated using the Weighted Finite State Transducer (WFST). The WFST structure creates considerable potential for efficient, regularised decoding architectures, which we intend to exploit. To our knowledge, we would be the first to specifically exploit the WFST network decoding framework in novel hardware architectures for low-power, large-complexity speech recognition.
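The contrast between dynamic and static search networks can be illustrated with a toy sketch: a small lexicon fully expanded offline into a shared prefix-tree network that the decoder then only traverses at run time. The words, pronunciations, and function names below are invented for illustration; a real WFST network would additionally compose lexicon, grammar, and context-dependency transducers and carry weights on its arcs.

```python
# Toy lexicon of phone sequences; entries are invented for the example.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "can": ["k", "ae", "n"],
    "dog": ["d", "ao", "g"],
}

def build_static_network(lexicon):
    """Expand all pronunciations offline into a shared prefix-tree network.

    Returns (arcs, finals): arcs[state] maps a phone label to the next
    state; finals maps accepting states to the word they emit.
    """
    arcs = {0: {}}
    finals = {}
    next_state = 1
    for word, phones in lexicon.items():
        state = 0
        for phone in phones:
            if phone not in arcs[state]:
                arcs[state][phone] = next_state
                arcs[next_state] = {}
                next_state += 1
            state = arcs[state][phone]
        finals[state] = word
    return arcs, finals

def lookup(arcs, finals, phones):
    """Traverse the precompiled network; no expansion happens here."""
    state = 0
    for phone in phones:
        state = arcs[state].get(phone)
        if state is None:
            return None
    return finals.get(state)
```

Because 'cat' and 'can' share the prefix k-ae, the expanded network stores those arcs once; all the structural work is done before decoding starts, which is the property that makes static expansion attractive for regular, low-power hardware architectures.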
Description The result of the project was the development of a software core that is unique in that it provides desktop-class performance for large vocabulary speech recognition on embedded platforms with limited resources. This has been made possible by adopting a novel approach where most of the computation is done off-line rather than on-the-fly, as is the case in traditional approaches.
Exploitation Route The topic formed the basis of a Queen's University Impact document, which has been formally published, and an Impact talk given to the public entitled "Local Talent, Global Impact" on 29th May 2013. A direct Proof of Concept (PoC132) was funded by InvestNI, resulting in a patent application and a detailed commercialisation study. Follow-on contacts have been made with a number of companies. The project won the 2011 HiTech award at the Northern Ireland Science Park (NISP) £25k Awards for the most promising ideas.
Sectors Digital/Communication/Information Technologies (including Software)

Description The research has been directly employed in a start-up company proposal called MVR (Mobile Voice Recognition). The company won the Northern Ireland Science Park (NISP) £25k HiTech competition.
First Year Of Impact 2010
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

Description Demonstration system for real-time large vocabulary continuous speech recognition
Amount £80,000 (GBP)
Funding ID PoC132 
Organisation Invest Northern Ireland 
Sector Public
Country United Kingdom
Start 01/2010 
End 12/2010
Description The present disclosure relates to improvements in or relating to pattern recognition, and in particular to novel systems and methods for pattern recognition, and to devices incorporating such systems and methods. In particular, a spectral analysis is performed on a candidate pattern to obtain a set of observation vectors. The observation vectors are decoded to produce a candidate match between the observation vectors and entries from a knowledge source. The step of decoding the vectors comprises modelling the knowledge source as a weighted finite state transducer (WFST), and modelling the propagation of tokens through nodes of the WFST for observation vectors derived from successive frames of the candidate pattern. For each successive propagation, a list of tokens is sorted in order of their associated cost, and a pruning step is applied to remove tokens from the sorted list which go beyond a pre-determined cost threshold, before performing the next iteration.
IP Reference WO2012076895 
Protection Patent application published
Year Protection Granted 2012
Licensed No
Impact The work became part of a business proposition for a company, MVR Ltd.
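The decoding loop described in the patent abstract above (frame-by-frame token propagation through a WFST, sorting tokens by accumulated cost, and pruning those beyond a cost threshold) can be sketched minimally as follows. The toy graph, its weights, and the 0/1 match cost standing in for a real acoustic model are all hypothetical, not taken from the patent.

```python
# Toy WFST: ARCS[state] -> list of (next_state, input_label, weight).
# The topology and weights are invented for illustration only.
ARCS = {
    0: [(1, "a", 0.5), (2, "b", 1.0)],
    1: [(3, "b", 0.2)],
    2: [(3, "a", 0.8)],
    3: [],
}
FINAL = {3}

def decode(observations, beam=2.0):
    """Frame-synchronous token passing with sorted-list beam pruning.

    Returns the best accumulated cost reaching a final state, or None
    if no token survives.
    """
    tokens = {0: 0.0}  # state -> accumulated cost of the best token
    for obs in observations:
        nxt = {}
        for state, cost in tokens.items():
            for dest, label, weight in ARCS[state]:
                # A real system would query an acoustic model here; a
                # hypothetical 0/1 match cost stands in for it.
                acoustic = 0.0 if obs == label else 1.0
                c = cost + weight + acoustic
                if dest not in nxt or c < nxt[dest]:
                    nxt[dest] = c
        # Sort surviving tokens by cost and prune any token whose cost
        # exceeds the best cost plus the beam threshold, before the
        # next frame is processed.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1])
        if not ranked:
            return None
        best = ranked[0][1]
        tokens = {s: c for s, c in ranked if c <= best + beam}
    finals = {s: c for s, c in tokens.items() if s in FINAL}
    return min(finals.values()) if finals else None
```

Sorting the token list once per frame makes the pruning step a simple cut-off on an ordered array, a regular access pattern that maps well onto the kind of hardware decoding architectures the project targeted.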