ExTOL: End to End Translation of British Sign Language

Lead Research Organisation: University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP

Abstract

British Sign Language (BSL) is the natural language of the British Deaf community and is as rich and expressive as any spoken language. However, BSL is not just English words converted into hand motions. It is a language in its own right, with its own grammar, very different from English. BSL also uses different parts of the body simultaneously: not just the movement and shape of the hands, but also the body, face, mouth and the space around the signer are all used to convey meaning.

Linguistic study of sign languages is quite new compared to that of spoken languages, having begun only in the 1960s. Linguists are very interested in sign languages because of what they can reveal about the possibilities of human language that do not rely on sound at all. One of the problems is that studying sign languages involves analysing video footage, and because sign languages lack any standard writing or transcription system, this is extremely labour-intensive. This project will develop computer vision tools to assist with video analysis. This will in turn help linguists increase their knowledge of the language, with the long-term ambition of creating the world's first machine-readable dataset of a sign language, a goal that was achieved for large text corpora of spoken languages in the 1970s.

The ultimate goal of this project is to take the annotated data and understanding from linguistic study and use them to build a system that is capable of watching a human signing and turning this into written English. This will be a world first and an important landmark for deaf-hearing communication. To achieve this, the computer must be able to recognise not only hand motion and shape but also the facial expression and body posture of the signer. It must also understand how these aspects are put together into phrases and how these can be translated into written/spoken language.

Although there have been some recent advances in sign language recognition via data gloves and motion capture systems like Kinect, part of the problem is that most computer scientists in this research area do not have the required in-depth knowledge of sign language. This project is therefore a strategic collaboration between leading experts in British Sign Language linguistics and software engineers who specialise in computer vision and machine learning, with the aim of building the world's first British Sign Language to English Translation system and the first practically functional machine translation system for any sign language.

Planned Impact

User beneficiaries of this project include those in the following groups:

Deaf people in society: Machine translation from sign language (SL) to written/spoken language will contribute to the status of deaf people in modern society & enhanced hearing-deaf communication, bringing SLs up to par with machine translation between spoken languages. It also meets the requirements of the UN Convention on the Rights of Persons with Disabilities (UNCRPD), which was ratified by the UK in 2009 & by the EU in 2011. The UNCRPD sets a framework for deaf people's rights, mentioning SL seven times in five different articles. Additionally, the research will take us one step closer towards achieving the first fully machine-readable SL corpora, a goal achieved for text corpora of spoken languages in the 1970s. This is important for deaf communities as it validates their linguistic/cultural heritage & enables wider access to archives.

Education: SL teachers & their students will benefit from machine translation technology as it will provide new, faster ways of translating, annotating & manipulating videos that include SL. Also, it paves the way for automated analyses in the assessment of second language acquisition of SLs and/or non-verbal behaviour in spoken language.

Deaf Researchers: We will aim to attract deaf applicants to the research posts. Deaf people often do not see higher education (HE) employment as a viable option due to communication challenges. This project will enable us to train & mentor more young deaf researchers, contributing to co-creation & capacity building. The project will increase the participation of deaf people, which will be ensured in three ways: priority-setting in collaboration with the deaf community, capacity building through the training & employment of deaf researchers, & ensuring that the native SL skills of deaf researchers are used.

Researchers in linguistics & ICT: This project will be of benefit to linguists working on analysing visual language videos by providing tools to assist in a) faster annotation, given that slow annotation has precluded progress in SL corpus research, and b) richer annotation of visual language data than is currently feasible, especially concerning facial expression. This will benefit computer scientists working on recognition/synthesis of SL, gesture, multi-modal interaction, non-verbal communication, human-machine interaction, & affective computing. Additionally, low-level phonetic transcription of manual & non-manual features in face-to-face communication will contribute to better movement models needed for natural-looking synthesis.

Researchers in arts, social science & medicine: The project will benefit a wide group of researchers by providing tools for the analysis of video data of human interaction: those studying multi-modal communication including linguistics, psychology, sociology, economics, & education; those concerned with gesture studies & features of audio-visual interaction; researchers of online video & social media; those studying developmental & acquired language & communication impairments in spoken & signed languages, including studies of therapeutic discourse; anthropologists & ethnologists. More generally, the technology could also be used for studies of human movement beyond language & communication.

Commercial private sector: The tools from this project will be of interest to businesses in the area of computer vision, as they will provide new marketable techniques & therefore new opportunities for revenue. Automated subtitling from SLs to meet accessibility requirements for broadcast video, video on social media, etc. are obvious application areas but, as highlighted above, the application areas go far beyond SL.

In summary, the strategic interdisciplinary partnership in this project between experts in linguistics & computer vision has direct reciprocal benefits not only for those communities but also for social science, ICT and other fields more generally.
 
Description The project is developing tools to provide automatic translation of sign language
Exploitation Route The research is still ongoing but should have important implications for both sign linguistics and automatic translation.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://cvssp.org/projects/extol/
 
Description (EASIER) - Intelligent Automatic Sign Language Translation
Amount € 3,991,591 (EUR)
Funding ID 101016982 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2021 
End 12/2023
 
Description University of Oxford 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Open collaboration on sign language recognition and translation
Collaborator Contribution Collaboration on sign language recognition and translation
Impact See awards outcomes
 
Title Watch, read and lookup: learning to spot signs from multiple supervisors 
Description The focus of this work is sign spotting: given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles (readily available translations of the signed content), which provide additional weak supervision; (3) looking up words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact "Watch, read and lookup: learning to spot signs from multiple supervisors", Liliane Momeni*, Gül Varol*, Samuel Albanie*, Triantafyllos Afouras, Andrew Zisserman. Visual Geometry Group (VGG), University of Oxford. Best Application Paper, ACCV 2020 
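The sign spotting task described above ("whether and where" an isolated sign occurs in continuous video) can be illustrated with a minimal sketch. It assumes that some model has already turned the isolated dictionary sign into a single embedding vector and the continuous video into per-frame feature vectors. The helper names (cosine, mean_pool, spot_sign), the window length and the threshold are hypothetical, and sliding-window cosine matching is a simple stand-in, not the method of the paper above, which learns its embeddings from the three kinds of supervision listed.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(frames: List[Vector]) -> Vector:
    """Average a window of per-frame features into one window embedding."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def spot_sign(query: Vector, video: List[Vector], window: int,
              threshold: float) -> Optional[Tuple[int, float]]:
    """Hypothetical sign spotter: slide a fixed-length window over the
    continuous video and compare each pooled window with the query sign.

    Returns (start_frame, score) for the best-matching window ("where"),
    or None if no window reaches the threshold ("whether": sign absent).
    """
    best: Optional[Tuple[int, float]] = None
    for start in range(0, len(video) - window + 1):
        score = cosine(query, mean_pool(video[start:start + window]))
        if best is None or score > best[1]:
            best = (start, score)
    if best is None or best[1] < threshold:
        return None
    return best
```

For example, with a toy two-dimensional video in which only frames 3 and 4 resemble the query, spot_sign([0.0, 1.0], video, 2, 0.9) returns (3, 1.0); if no window reaches the threshold it returns None, meaning the sign is judged not to have been signed.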
 
Description DCAL: New progress in sign-to-text technology 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact "DCAL: New progress in sign-to-text technology," featuring ExTOL project. Limping Chicken (world-leading deaf blog), 13 September 2018. https://limpingchicken.com/2018/09/13/dcal-new-progress-in-sign-to-text-technology/
Year(s) Of Engagement Activity 2018
URL https://limpingchicken.com/2018/09/13/dcal-new-progress-in-sign-to-text-technology/
 
Description SLRTP: Sign Language Recognition, Translation and Production Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims are to deepen the computer vision community's linguistic understanding of the problems that need solving, to identify the strengths and limitations of current work and to cultivate future collaborations.
Year(s) Of Engagement Activity 2020
URL http://www.slrtp.com/
 
Description Sign Language Recognition, Translation & Production (SLRTP) Workshop at the European Conference on Computer Vision 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The ExTOL partners organised a sign language workshop at ECCV 2020.

This is the description of the workshop: The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims are to increase the linguistic understanding of sign languages within the computer vision community, and also to identify the strengths and limitations of current work and the problems that need solving. Finally, we hope that the workshop will cultivate future collaborations.

Recent developments in image captioning, visual question answering and visual dialogue have stimulated significant interest in approaches that fuse visual and linguistic modelling. As spatio-temporal linguistic constructs, sign languages represent a unique challenge where vision and language meet. Computer vision researchers have been studying sign languages in isolated recognition scenarios for the last three decades. However, now that large-scale continuous corpora are becoming available, research has moved towards continuous sign language recognition. More recently, the new frontier has become sign language translation and production, where new developments in generative models are enabling translation between spoken/written language and continuous sign language videos, and vice versa. In this workshop, we propose to bring together researchers to discuss the open challenges that lie at the intersection of sign language and computer vision.
Year(s) Of Engagement Activity 2020
URL https://slrtp.com/
 
Description Sign Language Recognition, Translation and Production workshop, part of ECCV 2020 conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop in August 2020 brought together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims were to increase the linguistic understanding of sign languages within the computer vision community, and also to identify the strengths and limitations of current work and the problems that need solving.
The event was co-organised by DCAL/UCL, University of Surrey and Oxford University.
Year(s) Of Engagement Activity 2020
URL https://www.slrtp.com