
An AI-based Visualisation Feedback System for Speech Training

Lead Research Organisation: Durham University
Department Name: Computer Science

Abstract

Effective individualised feedback enables language learners to control their learning experience and to practise at their own pace. However, in the language learning domain, learners often require additional training or human intervention to be able to interpret the feedback. One objective is to provide detailed real-time feedback for public speaking training; the system will therefore explore to what extent learners should imitate native speakers or retain their own style in order to engage the audience. Later, the system will be adapted to support learners with special educational needs, especially those who have difficulty conveying emotion.
Emotion classification in speech is a challenging task because emotion usually changes only subtly between sentences, making it difficult to assign individual sentences to single emotions. To expand the current system, reinforcement learning techniques will be used for emotion recognition and feedback, allowing experts to manually correct the automatic feedback given by the model and thereby update the agent's policy. Moreover, we will use state-of-the-art graph neural networks (GNNs) for emotion classification, trained with a self-supervised method that requires little or no labelled data. Relationships between speech segments can be represented as graphs and used for conversational emotion analysis. Further manipulation of the speaker's audio, adapting not only the emotional tone but also pitch accent and pronunciation, will be considered. In addition, the detection of confidence and uncertainty in speech, and the combination of speech and gesture analysis, will be studied. Feedback will also be provided to improve learners' writing skills.
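The graph-based classification idea can be sketched in miniature: speech segments become graph nodes, edges link consecutive utterances, and one round of neighbour averaging shares emotional context between adjacent segments before each is classified. The feature values, weights, and three-emotion label set below are illustrative placeholders, not the project's actual model.

```python
# Minimal sketch of GNN-style emotion classification over speech segments.
# All features, weights, and labels are hypothetical, for illustration only.

EMOTIONS = ["neutral", "happy", "sad"]

def mean_neighbour_aggregate(features, edges):
    """One message-passing step: each segment's representation becomes the
    average of itself and its neighbours, so the emotional context of
    adjacent utterances is shared across the graph."""
    n = len(features)
    dim = len(features[0])
    neighbours = {i: [i] for i in range(n)}  # self-loop on every node
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return [[sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
             for d in range(dim)]
            for i in range(n)]

def classify(vector, weights):
    """Score each emotion with a dot product and return the best match."""
    scores = {e: sum(w * x for w, x in zip(weights[e], vector))
              for e in EMOTIONS}
    return max(scores, key=scores.get)

# Three consecutive speech segments as toy 2-D acoustic features
# (e.g. pitch and energy deviation from the speaker's baseline),
# linked in conversational order.
features = [[0.9, 0.8], [0.7, 0.6], [-0.8, -0.9]]
edges = [(0, 1), (1, 2)]

# Hand-picked illustrative weight vectors, one per emotion class.
weights = {"neutral": [0.1, 0.1], "happy": [1.0, 1.0], "sad": [-1.0, -1.0]}

smoothed = mean_neighbour_aggregate(features, edges)
labels = [classify(v, weights) for v in smoothed]
print(labels)
```

In this toy run the third segment keeps its "sad" label even after averaging with a positive neighbour, showing how message passing blends, rather than erases, per-segment evidence; a real GNN would learn the aggregation and classification weights rather than use fixed ones.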
In summary, a method has been proposed in which, for the first time, public speaking training is supported by detailed feedback on a visual dashboard that includes not only transcription and pitch information but also emotion information. Learners can use the transcription to identify pronunciation issues and can view how their pitch and emotion vary throughout a speech, helping them improve their speaking skills more effectively. Emotion classification will be provided using state-of-the-art reinforcement learning networks and GNNs, which incorporate manual feedback from multiple experts to achieve high classification accuracy.
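As one illustration of where the dashboard's pitch information could come from, the sketch below estimates the fundamental frequency of a short audio frame by simple autocorrelation. This is a generic textbook method assumed for illustration; it is not taken from the project, which does not specify its pitch-tracking approach.

```python
# Hypothetical sketch: per-frame pitch estimation via autocorrelation,
# the kind of value a speech-training dashboard could plot over time.
import math

def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Return an F0 estimate (Hz) for one voiced frame by finding the
    lag (within the plausible speaking range) that maximises the
    autocorrelation of the signal with itself."""
    lag_min = int(sample_rate / fmax)  # shortest period to consider
    lag_max = int(sample_rate / fmin)  # longest period to consider
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i - lag]
                   for i in range(lag, len(samples)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Synthetic 100 ms frame: a pure 200 Hz tone at an 8 kHz sample rate.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 10)]
f0 = estimate_pitch(tone, sr)
print(round(f0))  # the 200 Hz tone is recovered
```

Running the estimator over successive frames of a recording would yield the pitch contour shown on the dashboard; production systems typically use more robust estimators (e.g. YIN-family algorithms) that handle noise and octave errors.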

People

Adam Wynn (Student)

Publications


Studentship Projects

Project Reference: EP/W524426/1 (Start: 30/09/2022, End: 29/09/2028)
Project Reference: 2717184, Relationship: Studentship, Related To: EP/W524426/1 (Start: 30/09/2022, End: 30/03/2026, Student Name: Adam Wynn)