SpeechWave

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
 
Description The main objectives of the project were to explore approaches to speech recognition using the raw waveform, and to develop a deeper theoretical understanding of such approaches.

The key findings were:

1/ Development of state-of-the-art baseline systems for waveform-based speech recognition using the SincNet architecture, which enables signal processing algorithms to be learned from data (see the filterbank sketch after this list)

2/ Development of a windowed attention model for end-to-end speech recognition (see the attention sketch after this list)

3/ Theoretical analysis of the statistical normalisation of bottleneck features for speech recognition

4/ Development of an automatic adaptation approach for waveform-based speech recognition, which demonstrated the ability to adapt a system trained on adult speech to successfully recognise children's speech, using a limited amount of child data.

5/ Detailed theoretical and experimental investigation of different learnable filters in waveform-based speech recognition.

6/ Development of a dynamic subsampling approach for end-to-end speech recognition, enabling the model to skip redundant data.

7/ Development of a novel non-contrastive self-supervised approach for speech representation learning from raw waveforms

8/ Extension of raw-waveform methods to speech emotion recognition

9/ Development of interpretable raw-waveform models based on convolutional architectures

10/ Comprehensive phonetic error analysis for speech recognition
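
The finding on learnable filters above can be made concrete with a short sketch. The code below (assuming PyTorch; the filter count, kernel size, initialisation and window choice are illustrative assumptions rather than the project's exact implementation) shows a SincNet-style convolution layer in which each filter is parametrised only by a learnable low cutoff and bandwidth, so that a band-pass filterbank is learned directly from the raw waveform.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SincConv(nn.Module):
    """Minimal SincNet-style layer: learnable band-pass filters over raw audio."""

    def __init__(self, n_filters=40, kernel_size=101, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Learnable cutoffs (Hz): low edge and bandwidth of each filter.
        low = torch.linspace(30.0, sample_rate / 2 - 200.0, n_filters)
        self.low_hz = nn.Parameter(low.unsqueeze(1))                  # (F, 1)
        self.band_hz = nn.Parameter(torch.full((n_filters, 1), 100.0))
        # Fixed pieces: symmetric time axis (in seconds) and a Hamming window.
        n = torch.arange(kernel_size) - (kernel_size - 1) / 2
        self.register_buffer("t", (n / sample_rate).unsqueeze(0))     # (1, K)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):                                             # x: (B, 1, T)
        low = torch.abs(self.low_hz)
        high = torch.clamp(low + torch.abs(self.band_hz), max=self.sample_rate / 2)
        # Ideal band-pass response = difference of two sinc low-pass responses
        # (torch.sinc is the normalised sinc, matching the SincNet formulation).
        band_pass = (2 * high * torch.sinc(2 * high * self.t)
                     - 2 * low * torch.sinc(2 * low * self.t))        # (F, K)
        filters = (band_pass * self.window).unsqueeze(1)              # (F, 1, K)
        return F.conv1d(x, filters, padding=self.kernel_size // 2)    # (B, F, T)
```

In a full system such a layer would typically be followed by conventional convolutional or recurrent layers and trained end-to-end with the rest of the acoustic model.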
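
Similarly, the windowed attention finding can be illustrated with a simplified formulation (an assumption for illustration, not the project's exact model): a dot-product attention restricted to a fixed-width window of encoder frames around a centre position, so that each decoder step attends locally rather than over the whole utterance.

```python
import torch
import torch.nn.functional as F


def windowed_attention(query, keys, values, centre, width=8):
    """query: (B, D); keys, values: (B, T, D); centre: (B,) encoder frame indices."""
    scores = torch.einsum("bd,btd->bt", query, keys)                      # (B, T)
    # Mask out encoder frames outside [centre - width, centre + width].
    frames = torch.arange(keys.size(1), device=keys.device).unsqueeze(0)  # (1, T)
    mask = (frames - centre.unsqueeze(1)).abs() > width
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.einsum("bt,btd->bd", weights, values)                    # context: (B, D)
```
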
Exploitation Route Our systems are being released as open source software and may be applied to speech recognition problems.

Since the grant began, raw-waveform systems have become commonplace and are accepted as a standard approach in both academia and industry. The most popular example is the "wav2vec" family of models that have been open-sourced by Facebook/Meta.
Sectors Aerospace, Defence and Marine

Digital/Communication/Information Technologies (including Software)

Healthcare

Culture, Heritage, Museums and Collections

 
Description There were impacts in the following areas:

1. Broadcast and Media (via project partners BBC and Quorate). The focus of this work was to develop robust media transcription prototypes, able to cope with the diverse range of broadcast media. Media transcription has direct benefits (for example, supporting accessibility through automatic subtitling), as well as enabling intelligent processing of broadcast media through natural language processing and text analytics. We directly provided speech recognition models to the BBC for use on a number of languages important to them, including Russian, Persian and Ukrainian. Quorate's speech technology was used on an extended trial basis by Hansard, the UK's parliamentary record, for the purposes of recording and automatically transcribing proceedings in the Houses of Parliament. Quorate, an Edinburgh University spinout, was acquired by London Stock Exchange Group (LSEG) in 2021. We have recently begun a partnership with LSEG to further improve their speech technology, building on the SpeechWave outcomes.

2. Distant Speech Recognition (via project partner Emotech). The focus of this work was to develop prototype software for speech recognition in personal robots. In the end, Emotech moved away from the robot domain, but entered into a collaboration with Huawei to provide Virtual Education services to rural areas across China. Huawei publicly cited Emotech's "advanced technology in voice and multi-modal AI" as being key to the collaboration, and the platform has been hailed by UNESCO as "very important technology" and the "key for human beings to learn new things and skills faster". As well as China, the platform has been sold for use in the Middle East, South America and South Africa.
First Year Of Impact 2019
Sector Creative Economy, Digital/Communication/Information Technologies (including Software), Education, Government, Democracy and Justice
Impact Types Cultural

Economic

Policy & public services

 
Description Adapting end-to-end speech recognition systems (year 1)
Amount £137,365 (GBP)
Organisation Samsung 
Sector Private
Country Korea, Republic of
Start 12/2018 
End 11/2019
 
Description Adapting end-to-end speech recognition systems (year 2)
Amount £113,989 (GBP)
Organisation Samsung 
Sector Private
Country Korea, Republic of
Start 12/2019 
End 11/2020
 
Description BBC Data Science Partnership 
Organisation British Broadcasting Corporation (BBC)
Department BBC Research & Development
Country United Kingdom 
Sector Public 
PI Contribution Development of speech and language technology applied to broadcasting and media production
Collaborator Contribution R&D work from BBC researchers; data sharing.
Impact MGB Challenge; iCASE studentships; EPSRC SCRIPT Project
Start Year 2017
 
Description Emotech 
Organisation EmoTech Ltd
Country United Kingdom 
Sector Private 
PI Contribution We are developing models and algorithms for raw-waveform based speech recognition with the aim of significantly improving robustness to acoustic conditions.
Collaborator Contribution We shall work with Emotech on evaluating our models and algorithms using data collected by Emotech and made available to the project researchers. Furthermore, we plan to conduct experiments using Emotech's Olly platform, and to this end Emotech will donate two devices to the project along with the required software development platform. Through the collaboration with Emotech we shall be able to evaluate the novel contributions provided by SpeechWave against the current state-of-the-art in realistic circumstances.
Impact 1/ Development, analysis, and evaluation of convolutional and recurrent network speech recognition systems 2/ Development of end-to-end speech recognition systems, including the development of novel algorithms for windowed attention
Start Year 2018
 
Description Quorate 
Organisation Quorate Technology
Country United Kingdom 
Sector Private 
PI Contribution We are developing models and algorithms for raw-waveform based speech recognition with the aim of significantly improving robustness to acoustic conditions.
Collaborator Contribution Quorate has a state-of-the-art product for multi-genre media transcription, and we are working with them to explore the use of the approaches developed in the project in the context of broadcast speech recognition. Quorate are currently jointly supporting a PhD student at Edinburgh, in the area of robust transcription of broadcast speech, and there are strong synergies between that project and SpeechWave.
Impact 1/ Development, analysis, and evaluation of convolutional and recurrent network speech recognition systems 2/ Development of end-to-end speech recognition systems, including the development of novel algorithms for windowed attention
Start Year 2018
 
Description SRI 
Organisation SRI International (inc)
Country United States 
Sector Charity/Non Profit 
PI Contribution We are developing models and algorithms for raw-waveform based speech recognition with the aim of significantly improving robustness to acoustic conditions.
Collaborator Contribution SRI is concerned with the development of robust speech recognition within the DARPA RATS program, and this provides a platform for the evaluation of the technology developed in this project.
Impact 1/ Development, analysis, and evaluation of convolutional and recurrent network speech recognition systems 2/ Development of end-to-end speech recognition systems, including the development of novel algorithms for windowed attention
Start Year 2018