Unmute: Opening Spoken Language Interaction to the Currently Unheard

Lead Research Organisation: University of Edinburgh

Department Name: Centre for Speech Technology Research

Abstract

Every day, millions of users talk to machines, be they voice assistants on their mobile phones or standalone devices - such as Alexa or Google Home - in their homes and workplaces. By using Natural Language Processing (NLP) for speech input and output in an intelligent user interface, people are able to access a plethora of content and services. These systems rely on several factors: i) automatic speech recognition (ASR) that can convert the speech signal into usable language tokens; ii) the availability of knowledge and resources in the system to address the user's need; and iii) an interaction design for the intelligent user interface that fits the contexts and capabilities of the end user. While state-of-the-art systems are beginning to address the needs of "conventional" users (i.e. those who speak a widely spoken and written language, and who have relatively high degrees of literacy, exposure to digital interactions and other resources), many hundreds of millions of people globally are being excluded. Paradoxically, these users with resource constraints (such as low digital and textual literacy) could be the ones who benefit most from advances in spoken NLP systems, which could open up economic, social and educational possibilities that are currently unmet.

This project addresses the limitations of today's approaches in order to open up intelligent interfaces to the currently digitally 'unheard'. The challenges we will address are threefold. Firstly, and crucially, there is a need to explore highly innovative ASR techniques that can cope with languages that have limited or even no textual resources. Conventionally, ASR systems rely on vast amounts of transcribed speech to develop and train models. Our focus is on languages where little such data exists, and indeed on languages that have no established written form. For the systems to be useful to the communities we target, they must, of course, make relevant content available in the language these users speak; so the second challenge is to establish infrastructures and user involvement that provide ways to generate such content. In doing so we hope to produce a blueprint and toolkit that can be used by many other low- or zero-resource language communities. Finally, the user communities we will work with to develop these new approaches have a different perspective from "conventional" users, and the third challenge is to surface their needs and values when interacting with an intelligent interface for content and services, so that the underlying algorithms and the interaction devices and styles are appropriate and effective. Prior work by our team has shown that assumptions about what works in speech assistants deployed in conventional settings break down when these systems are exposed to groups in informal settlements in India and townships in South Africa. By taking this approach, we expect to innovate on and disrupt the interface styles and interaction devices currently used in intelligent speech and language interfaces, not only addressing the needs of currently excluded users but also offering up new possibilities for the rest of the world.

The work brings together two world-leading groups in a new collaboration: Edinburgh's CSTR, with its long track record of speech technology innovation; and the FIT Lab at Swansea's Computational Foundry, which has pioneered interaction innovation for and with emergent users for over a decade, developing and advocating responsible innovation with communities in rural, peri-urban and urban developing-world contexts. These groups are joined by existing and new collaborators among NGO, spin-out and international academic stakeholders, who will help shape the work and ensure it has direct and sustainable impact.

Planned Impact

We are focussed on providing a voice to hundreds of millions of people who are currently 'unheard'. This exclusion takes two forms: i) pragmatically, they cannot use their voice to speak to intelligent assistants because the ASR/NLP and information interaction technology neither accommodates the low- or zero-resource language they speak nor addresses their context of use; and ii) more philosophically, they are not involved in the 'future making' of new technologies. That is, most technology is "designed in California" (as you can see by turning over many mobile devices, which carry this brand and boast). As a core element of responsible innovation in this project, we aim to produce technology that these excluded users can benefit from, and to involve them directly in designing not just their own future but the wider world's, as we uncover non-Californian perspectives and possibilities.

Voice assistants and related technology may offer a highly appropriate and effective way for the currently unheard to create and participate in socially and economically beneficial information interaction. From education, to health, to entrepreneurial services and advertising, the use of the platforms created in this project could lead to sustained and profound impacts. We would expect - and during the lifetime of the project will endeavour to develop - strong interest in the innovations from NGOs and governmental organisations that wish to engage digitally excluded groups. Commercial opportunities are also likely, as companies (including ones in the UK) aim to provide services that go beyond a conventional consumer base that looks set to be saturated within the next 5 years. A similar trend has been seen, for example, in smartphone deployment, with companies now highly focussed on how to reach "the next billion" given the slowdown in market growth amongst "developed" world economies.

By acting as a beacon for highly impactful, ambitious and human-centred innovation, the work will attract talent to the UK. The country already has a strong reputation in AI and language processing, and bringing this together with purposefully driven, human-centred methods and interface technologies will help set the UK apart. There is growing disquiet globally about what the coming world of intelligent interfaces, AI, big data etc. will do to people's jobs, social cohesion and even identity. This project will show how, by working directly with end-users, we can drive adventurous science that amplifies human capabilities rather than reducing them or making them redundant. Our work will also contribute to the important and timely debate on how to use advances in machine learning and AI responsibly, ethically and positively.

Funded Value:

£970,668

Funded Period:

Dec 20 - Jul 24

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/T024976/1

Principal Investigator:

Peter Bell

Research Subject:

Info. & commun. Technol. (75%)

Linguistics (25%)

Research Topic:

Artificial Intelligence (25%)

Computational Linguistics (25%)

Human-Computer Interactions (50%)

Organisations

People	ORCID iD
Peter Bell (Principal Investigator)	http://orcid.org/0000-0002-9597-9615
Sharon Goldwater (Co-Investigator)
Matt Jones (Co-Investigator)
Jennifer Pearson (Co-Investigator)
Simon Robinson (Co-Investigator)	http://orcid.org/0000-0001-9228-006X
Steve Renals (Co-Investigator)

Publications


Klejch O (2021) The CSTR System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages

Klejch O (2022) Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Klejch O (2021) Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Markl N (2023) Automatic Transcription and (De)Standardisation

Pearson J (2022) Can't Touch This: Rethinking Public Technology in a COVID-19 Era

Reitmaier T (2023) Situating Automatic Speech Recognition Development within Communities of Under-heard Language Speakers

Reitmaier T (2022) Opportunities and Challenges of Automatic Speech Recognition Systems for Low-Resource Language Speakers

Wallington E (2021) On the Learning Dynamics of Semi-Supervised Training for ASR

Software and Technical Products

Title	The UnMute Toolkit
Description	The UnMute Toolkit is a collection of tools, methodologies and pipelines tailored to minority language technology design and development. It contains components to engage community members, collect spoken language recordings, train information retrieval models and deploy those models in community contexts.
Type Of Technology	Software
Year Produced	2024
Open Source License?	Yes
Impact	The toolkit has helped gather data, train models and quickly deploy speech-driven services in community contexts.
URL	https://unmute.tech/toolkit/

Engagement Activities

Description	UnMute toolkit launch, IIT Guwahati
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	51 people from a mix of industry, academia and the third sector attended a day-long workshop to learn about the UnMute toolkit. Attendees reported a positive change in their views, and were highly complimentary about the event and the toolkit itself.
Year(s) Of Engagement Activity	2024
URL	https://unmute.tech/toolkit-event/
