SpeechWave
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Informatics
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Publications
Bell P
(2021)
Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview
in IEEE Open Journal of Signal Processing
Fainberg J
(2019)
Acoustic Model Adaptation from Raw Waveforms with Sincnet
Loweimi E
(2021)
Speech Acoustic Modelling from Raw Phase Spectrum
Loweimi E
(2023)
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform
in IEEE/ACM Transactions on Audio, Speech, and Language Processing
Loweimi E
(2020)
Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling
Loweimi E
(2021)
Speech Acoustic Modelling Using Raw Source and Filter Components
Loweimi E
(2023)
Phonetic Error Analysis Beyond Phone Error Rate
in IEEE/ACM Transactions on Audio, Speech, and Language Processing
Description | We are currently halfway through this three-year project. The main objectives of the project are to explore approaches to speech recognition using the raw waveform, and to develop a deeper theoretical understanding of such approaches. The key findings so far include: 1/ Development of state-of-the-art baseline systems for waveform-based speech recognition using the SincNet architecture, which enables signal processing algorithms to be learned from data. 2/ Development of a windowed attention model for end-to-end speech recognition. 3/ Theoretical analysis of the statistical normalisation of bottleneck features for speech recognition. 4/ Development of an automatic adaptation approach for waveform-based speech recognition, which demonstrated the ability to adapt a system trained on adult speech to successfully recognise children's speech, using a limited amount of child data. 5/ Detailed theoretical and experimental investigation of different learnable filters in waveform-based speech recognition. 6/ Development of a dynamic subsampling approach for end-to-end speech recognition, enabling the model to skip redundant data. |
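Finding 1/ refers to the SincNet idea: each convolutional filter in the first layer is a band-pass filter parameterised only by two learnable cut-off frequencies, rather than by free filter taps. A minimal NumPy sketch of one such filter follows; the function name, filter length, and normalisation are illustrative assumptions, not taken from the project's code.

```python
import numpy as np

def sinc_bandpass(f1, f2, length=101, fs=16000):
    """Band-pass FIR filter defined only by its cut-off frequencies
    f1 < f2 (in Hz): the difference of two low-pass sinc filters,
    smoothed with a Hamming window, as in the SincNet first layer."""
    # Symmetric time axis in seconds around the filter centre.
    t = np.arange(-(length // 2), length // 2 + 1) / fs
    # np.sinc is the normalised sinc, sin(pi x) / (pi x).
    h = 2 * f2 * np.sinc(2 * f2 * t) - 2 * f1 * np.sinc(2 * f1 * t)
    h *= np.hamming(length)
    # Simple gain normalisation (illustrative choice).
    return h / np.sum(np.abs(h))

# Example: a filter passing roughly the 300-3400 Hz telephone band.
h = sinc_bandpass(300.0, 3400.0)
```

In a trained system, f1 and f2 for each filter would be gradient-updated alongside the rest of the network, so the filterbank itself is learned from data.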
Exploitation Route | Our systems are being released as open source software and may be applied to speech recognition problems. |
Sectors | Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Healthcare; Culture, Heritage, Museums and Collections |
Description | Within SpeechWave we have taken measures to maximise the impact of our research. These will be in two main areas: 1. Broadcast and Media (via project partners BBC and Quorate). The focus of this work is to develop robust media transcription prototypes, able to cope with the diverse range of broadcast media. Media transcription has direct benefits (for example supporting accessibility through automatic subtitling), as well as enabling intelligent processing of broadcast media through natural language processing and text analytics. 2. Distant Speech Recognition (via project partner Emotech). The focus of this work is to develop prototype software for speech recognition in personal robots. Speech is perhaps the most natural communication modality for such robots, but the acoustic conditions can be extremely challenging due to reverberation and competing acoustic sources. Improving speech recognition accuracy for such devices in challenging environments is likely to have a significant impact on their usability and uptake. We also plan to enhance the global impact of our research through project partner SRI International who have a specific R&D interest in speech recognition in highly challenging acoustic environments. We have also begun new collaborations with Toshiba and with Samsung in the area of end-to-end speech recognition. |
First Year Of Impact | 2019 |
Sector | Creative Economy; Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections |
Impact Types | Cultural, Economic |
Description | Adapting end-to-end speech recognition systems (year 1) |
Amount | £137,365 (GBP) |
Organisation | Samsung |
Sector | Private |
Country | Korea, Republic of |
Start | 12/2018 |
End | 11/2019 |
Description | Adapting end-to-end speech recognition systems (year 2) |
Amount | £113,989 (GBP) |
Organisation | Samsung |
Sector | Private |
Country | Korea, Republic of |
Start | 12/2019 |
End | 11/2020 |
Description | BBC Data Science Partnership |
Organisation | British Broadcasting Corporation (BBC) |
Department | BBC Research & Development |
Country | United Kingdom |
Sector | Public |
PI Contribution | Development of speech and language technology applied to broadcasting and media production |
Collaborator Contribution | R&D work from BBC researchers; data sharing. |
Impact | MGB Challenge; iCASE studentships; EPSRC SCRIPT Project |
Start Year | 2017 |
Description | Emotech |
Organisation | EmoTech Ltd |
Country | United Kingdom |
Sector | Private |
PI Contribution | We are developing models and algorithms for raw-waveform based speech recognition with the aim of significantly improving robustness to acoustic conditions. |
Collaborator Contribution | We shall work with Emotech on evaluating our models and algorithms using data collected by Emotech and made available to the project researchers. Furthermore, we plan to conduct experiments using Emotech's Olly platform, and to this end Emotech will donate two devices to the project along with the required software development platform. Through the collaboration with Emotech we shall be able to evaluate the novel contributions provided by SpeechWave against the current state-of-the-art in realistic circumstances. |
Impact | 1/ Development, analysis, and evaluation of convolutional and recurrent network speech recognition systems 2/ Development of end-to-end speech recognition systems, including the development of novel algorithms for windowed attention |
Start Year | 2018 |
Description | Quorate |
Organisation | Quorate Technology |
Country | United Kingdom |
Sector | Private |
PI Contribution | We are developing models and algorithms for raw-waveform based speech recognition with the aim of significantly improving robustness to acoustic conditions. |
Collaborator Contribution | Quorate has a state-of-the-art product for multi-genre media transcription, and we are working with them to explore the use of the approaches developed in the project in the context of broadcast speech recognition. Quorate are currently jointly supporting a PhD student at Edinburgh, in the area of robust transcription of broadcast speech, and there are strong synergies between that project and SpeechWave. |
Impact | 1/ Development, analysis, and evaluation of convolutional and recurrent network speech recognition systems 2/ Development of end-to-end speech recognition systems, including the development of novel algorithms for windowed attention |
Start Year | 2018 |
Description | SRI |
Organisation | SRI International (inc) |
Country | United States |
Sector | Charity/Non Profit |
PI Contribution | We are developing models and algorithms for raw-waveform based speech recognition with the aim of significantly improving robustness to acoustic conditions. |
Collaborator Contribution | SRI is concerned with the development of robust speech recognition within the DARPA RATS program, and this provides a platform for the evaluation of the technology developed in this project. |
Impact | 1/ Development, analysis, and evaluation of convolutional and recurrent network speech recognition systems 2/ Development of end-to-end speech recognition systems, including the development of novel algorithms for windowed attention |
Start Year | 2018 |