Unmute: Opening Spoken Language Interaction to the Currently Unheard

Lead Research Organisation: University of Edinburgh
Department Name: Centre for Speech Technology Research

Abstract

Every day, millions of users talk to machines, be they voice assistants on their mobile phones or standalone devices - such as Alexa or Google Home - in their homes and workplaces. By using Natural Language Processing for input and output through the medium of speech in an intelligent user interface, people are able to access a plethora of content and services. These systems rely on several factors: i) the use of automatic speech recognition (ASR) that can convert the speech signal into usable language tokens; ii) the availability of knowledge and resources in the system to address the user's need; and iii) an intelligent user interface interaction design that fits the contexts and capabilities of the end user. While state-of-the-art systems are beginning to address the needs of "conventional" users (i.e. those who speak a widely spoken and written language, and who have relatively high degrees of literacy, exposure to digital interactions and other resources), there are many hundreds of millions of people globally who are being excluded. Paradoxically, these users, who have resource constraints (such as low digital and textual literacy), could be the ones to benefit most from advances in spoken NLP systems, which could open up economic, social and educational possibilities that are currently unmet.

This project addresses the limitations of today's approaches to open up intelligent interfaces to the currently digitally 'unheard'. The challenges we will address are threefold. Firstly, and crucially, there is a need to explore highly innovative ASR techniques that can cope with languages that have limited or even no textual resources. Conventionally, ASR systems rely on vast amounts of transcribed speech to develop and train models. Our focus is on languages where there is little of this data and indeed on languages where there is no established written form. For the systems to be useful to the sorts of communities we target, they must of course make available relevant content in the language these users speak; so the second challenge is to establish infrastructures and user involvement that provide ways to generate such content. In doing so we hope to produce a blueprint and toolkit that can be used by many other low or zero-resource language communities. Finally, the user communities we will work with to develop these new approaches have a different perspective to "conventional" users, and the third challenge is to surface their needs and values when interacting with an intelligent interface for content and services, so that the underlying algorithms and the interaction devices and styles are appropriate and effective. Prior work by our team has shown that assumptions about what works in speech assistants deployed in conventional settings break down when these systems are exposed to groups in informal settlements in India and townships in South Africa. By taking this approach, we expect to innovate on and disrupt the interface styles and interaction devices currently used in intelligent speech and language interfaces, addressing the needs of not just currently excluded users but offering up new possibilities for the rest of the world too.

The work brings together two world-leading groups in a new collaboration - Edinburgh's CSTR, with its long track record of speech technology innovation, and the FIT Lab at Swansea's Computational Foundry, which has pioneered interaction innovation for and with emergent users for over a decade, developing and advocating responsible innovation with communities in rural, peri-urban and urban developing world contexts. These groups are joined by both existing and new collaborations with NGO, spin-out and international academic stakeholders who will help shape the work and ensure it has direct and sustainable impact.

Planned Impact

We are focussed on providing a voice to hundreds of millions of people who are currently 'unheard'. This exclusion comes in two forms: i) pragmatically, they cannot use their voice to speak to intelligent assistants because the ASR/NLP and information interaction technology neither accommodates the low or zero-resource language they speak nor addresses their context of use; and ii) more philosophically, they are not involved in the 'future making' of new technologies. That is, most technology is "designed in California" (a brand and boast you can see printed on the back of many mobile devices). As a core element of the responsible innovation in this project, we are aiming to produce technology that these excluded users can benefit from, and to involve them directly in the design of not just their future but the wider world's, as we uncover non-Californian perspectives and possibilities.

Voice assistants and related technology may offer a highly appropriate and effective way for the currently unheard to create and participate in socially and economically beneficial information interaction. From education, to health, to entrepreneurial services and advertising, the use of the platforms created in this project could lead to sustained and profound impacts. We would expect - and during the lifetime of the project will endeavour to develop - strong interest in the innovations from NGOs and governmental organisations that wish to engage digitally excluded groups. Commercial opportunities are also likely as companies - including ones in the UK - aim to provide services that go beyond the conventional consumer base, which looks set to be saturated within the next 5 years (a similar trend has been seen, for example, in smartphone deployment, with companies now highly focussed on how to reach "the next billion" given the slowdown in market growth amongst "developed" world economies).

By acting as a beacon for highly impactful, ambitious and human-centred innovation, the work will act as an attractor of talent to the UK. The country already has a strong reputation in AI/language processing, and bringing this together with clearly and purposefully driven human methods and interface technologies will help set the UK apart. There is a growing disquiet globally about what the coming world of intelligent interfaces, AI, big data etc. will do to people's jobs, social cohesion and even identity. This project will showcase ways in which, by working directly with end-users, we can drive adventurous science that amplifies human capabilities and abilities rather than reducing them or making them redundant. Our work will then also contribute to the important and timely debate on how to use advances in machine learning and AI responsibly, ethically and positively.