ExTOL: End to End Translation of British Sign Language
Lead Research Organisation:
University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP
Abstract
British Sign Language (BSL) is the natural language of the British Deaf community and is as rich and expressive as any spoken language. However, BSL is not just English words converted into hand motions. It is a language in its own right, with its own grammar, very different from English. Moreover, BSL uses different elements of the body simultaneously: not just the movement and shape of the hands, but also the body, face, mouth and the space around the signer are used to convey meaning.
Linguistic study of sign languages is quite new compared to spoken languages, having begun only in the 1960s. Linguists are very interested in sign languages because of what they can reveal about the possibilities of human language that don't rely at all on sound. One of the problems is that studying sign languages involves analysing video footage - and because sign languages lack any standard writing or transcription system, this is extremely labour-intensive. This project will develop computer vision tools to assist with video analysis. This will in turn help linguists increase their knowledge of the language with a long term ambition of creating the world's first machine readable dataset of a sign language, a goal that was achieved for large amounts of text of spoken language in the 1970s.
The ultimate goal of this project is to take the annotated data and understanding from linguistic study and to use this to build a system that is capable of watching a human signing and turning this into written English. This will be a world first and an important landmark for deaf-hearing communication. To achieve this the computer must be able to recognise not only hand motion and shape but the facial expression and body posture of the signer. It must also understand how these aspects are put together into phrases and how these can be translated into written/spoken language.
Although there have been some recent advances in sign language recognition via data gloves and motion capture systems like Kinect, part of the problem is that most computer scientists in this research area do not have the required in-depth knowledge of sign language. This project is therefore a strategic collaboration between leading experts in British Sign Language linguistics and software engineers who specialise in computer vision and machine learning, with the aim of building the world's first British Sign Language to English Translation system and the first practically functional machine translation system for any sign language.
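The recognise-then-translate process described above can be sketched in code. This is a minimal illustrative sketch only: the class and function names are hypothetical, and the project's actual system would use learned computer-vision and sequence-to-sequence models rather than these toy stand-ins.

```python
# Illustrative two-stage sketch of a sign-to-English pipeline.
# All names here are hypothetical placeholders, not the project's design.

from dataclasses import dataclass
from typing import List

@dataclass
class FrameFeatures:
    """Per-frame visual cues a recogniser would extract."""
    hand_shape: str
    hand_motion: str
    facial_expression: str
    body_posture: str

def recognise(frames: List[FrameFeatures]) -> List[str]:
    """Stage 1: map visual cues to a sequence of sign glosses.
    Trivial stand-in: emit one gloss per run of identical hand shapes."""
    glosses: List[str] = []
    for f in frames:
        if not glosses or glosses[-1] != f.hand_shape:
            glosses.append(f.hand_shape)
    return glosses

def translate(glosses: List[str]) -> str:
    """Stage 2: render glosses as written English.
    BSL grammar differs from English, so a real system would learn
    reordering and translation; this stand-in just joins the glosses."""
    return " ".join(g.lower() for g in glosses).capitalize() + "."

frames = [
    FrameFeatures("BOOK", "hold", "neutral", "upright"),
    FrameFeatures("BOOK", "hold", "neutral", "upright"),
    FrameFeatures("READ", "circular", "focused", "leaning"),
]
print(translate(recognise(frames)))  # prints "Book read."
```

The two-stage split (recognition into glosses, then translation into English) mirrors the distinction drawn in the abstract between recognising the signer's articulators and understanding how they combine into translatable phrases.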
Planned Impact
User beneficiaries of this project include those in the following groups:
Deaf people in society: Machine translation from sign language (SL) to written/spoken language will contribute to the status of deaf people in modern society & enhanced hearing-deaf communication, bringing SLs up to par with machine translation between spoken languages. It also meets the requirements of the UN Convention on the Rights of Persons with Disabilities (UNCRPD) which was ratified by the UK in 2009 & by the EU in 2011. The UNCRPD sets a framework for deaf people's rights, mentioning SL seven times in five different articles. Additionally, the research will take us one step closer towards achieving the first fully machine-readable SL corpora - a goal achieved for text corpora of spoken languages in the 1970s. This is important for deaf communities as validation of their linguistic/cultural heritage & enabling wider access to archives.
Education: SL teachers & their students will benefit from machine translation technology as it will provide new, faster ways of translating, annotating & manipulating videos that include SL. Also, it paves the way for automated analyses in the assessment of second language acquisition of SLs and/or non-verbal behaviour in spoken language.
Deaf Researchers: We will aim to attract deaf applicants to the research posts. Deaf people often do not see HE employment as a viable option due to communication challenges. This project will enable us to train & mentor more young deaf researchers, contributing to co-creation & capacity building. The project will lead to increased participation of deaf people which will be ensured in three ways: priority-setting in collaboration with the deaf community, capacity building through the training & employment of deaf researchers, & ensuring native SL skills of deaf researchers are used.
Researchers in linguistics & ICT: This project will be of benefit to linguists working on analysing visual language videos by providing tools to assist in a) faster annotation, given that slow annotation has precluded progress in SL corpus research, and b) richer annotation of visual language data than is currently feasible, especially concerning facial expression. This will benefit computer scientists working on recognition/synthesis of SL, gesture, multi-modal interaction, non-verbal communication, human-machine interaction, & affective computing. Additionally, low-level phonetic transcription of manual & non-manual features in face-to-face communication will contribute to better movement models needed for natural-looking synthesis.
Researchers in arts, social science & medicine: The project will benefit a wide group of researchers by providing tools for the analysis of video data of human interaction: those studying multi-modal communication including linguistics, psychology, sociology, economics, & education; those concerned with gesture studies & features of audio-visual interaction; researchers of online video & social media; those studying developmental & acquired language & communication impairments in spoken & signed languages, including studies of therapeutic discourse; anthropologists & ethnologists. More generally, the technology could also be used for studies of human movement beyond language & communication.
Commercial private sector: The tools from this project will be of interest to businesses in the area of computer vision as they will provide new marketable techniques & therefore new opportunities for revenue. Automated subtitling from SLs to meet accessibility requirements for broadcast video, video on social media, etc. are obvious areas, but as highlighted above, the application areas go far beyond SL.
In summary, the strategic interdisciplinary partnership in this project between experts in linguistics & computer vision also has direct reciprocal benefits not only to those communities but also to social science, ICT and other fields more generally.
Organisations
- University of Surrey (Lead Research Organisation)
- University of Oxford (Collaboration)
- Inter College of Therapeutic Education (Project Partner)
- Universität Hamburg (Project Partner)
- European Union (Project Partner)
- Catholic (Radboud) University Foundation (Project Partner)
- British Broadcasting Corporation (United Kingdom) (Project Partner)
Publications
Albanie S
(2021)
SeeHear: Signer diarisation and a new dataset
Camgoz N
(2021)
Content4All Open Research Sign Language Translation Datasets
Cihan Camgoz N
(2020)
Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation
Fox N
(2023)
Best practices for sign language technology research
in Universal Access in the Information Society
Ivashechkin M
(2023)
Improving 3D Pose Estimation For Sign Language
K R Prajwal
(2022)
Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
Koller O
(2020)
Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos.
in IEEE transactions on pattern analysis and machine intelligence
Liu Y.
(2020)
Use what you have: Video retrieval using representations from collaborative experts
in 30th British Machine Vision Conference 2019, BMVC 2019
Momeni L.
(2020)
Seeing wake words: Audio-Visual Keyword Spotting
Renz K.
(2021)
Sign Segmentation with Temporal Convolutional Networks
Rochette G
(2023)
Novel View Synthesis of Humans Using Differentiable Rendering
in IEEE Transactions on Biometrics, Behavior, and Identity Science
Saunders B
(2021)
Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks
in International Journal of Computer Vision
Saunders B.
(2022)
Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production
in 7th Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual Challenges and Perspectives, SLTAT 2022 - as part of the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Schembri A
(2022)
Signed Language Corpora
Stoll S
(2020)
Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks
in International Journal of Computer Vision
Varol G
(2022)
Scaling Up Sign Spotting Through Sign Language Dictionaries
in International Journal of Computer Vision
Varol G.
(2021)
Read and Attend: Temporal Localisation in Sign Language Videos
Vowels M
(2020)
NestedVAE: Isolating Common Factors via Weak Supervision
Walsh H
(2023)
Gloss Alignment using Word Embeddings
Walsh H.
(2022)
Changing the Representation: Examining Language Representation for Neural Sign Language Production
in 7th Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual Challenges and Perspectives, SLTAT 2022 - as part of the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Woll B.
(2022)
Segmentation of Signs for Research Purposes: Comparing Humans and Machines
in 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Description | The project is developing tools to provide automatic translation of sign language |
Exploitation Route | The research is still ongoing but should have important implications for both sign linguistics and automatic translation. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | https://cvssp.org/projects/extol/ |
Description | IP transferred into new spin off venture company. Company incorporated Feb 22 with pre-seed funding. Currently closing its seed round. |
First Year Of Impact | 2022 |
Sector | Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Transport |
Impact Types | Cultural, Societal, Economic, Policy & public services |
Description | (EASIER) - Intelligent Automatic Sign Language Translation |
Amount | € 3,991,591 (EUR) |
Funding ID | 101016982 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2021 |
End | 12/2023 |
Title | BOBSL: BBC-Oxford British Sign Language Dataset |
Description | BOBSL is a large-scale dataset of British Sign Language (BSL). It comprises 1,962 episodes (approximately 1,400 hours) of BSL-interpreted BBC broadcast footage accompanied by written English subtitles. From horror, period and medical dramas, history, nature and science documentaries, sitcoms, children's shows and programs covering cooking, beauty, business and travel, BOBSL covers a wide range of topics. The dataset features a total of 39 signers. Distinct signers appear in the training, validation and test sets for signer-independent evaluation. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | No |
Impact | S. Albanie*, G. Varol*, L. Momeni*, H. Bull*, T. Afouras, H. Chowdhury, N. Fox, B. Woll, R. Cooper, A. McParland, A. Zisserman. BBC-Oxford British Sign Language Dataset; S. Albanie*, G. Varol*, L. Momeni, T. Afouras, J.S. Chung, N. Fox, B. Woll, A. Zisserman. BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues |
URL | https://www.robots.ox.ac.uk/~vgg/data/bobsl/ |
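The BOBSL entry above notes that distinct signers appear in the training, validation and test sets for signer-independent evaluation. That property simply means the signer sets of the three splits are pairwise disjoint, which can be checked as below; the signer IDs are invented for illustration and do not come from the dataset.

```python
# Sketch of the signer-independent split property described for BOBSL:
# no signer may appear in more than one of train/validation/test.
# Signer IDs below are invented placeholders.

def splits_are_signer_independent(*splits: set) -> bool:
    """Return True if no signer appears in more than one split."""
    seen: set = set()
    for split in splits:
        if seen & split:   # overlap with an earlier split
            return False
        seen |= split
    return True

train_signers = {"S01", "S02", "S03"}
val_signers = {"S04", "S05"}
test_signers = {"S06", "S07"}

print(splits_are_signer_independent(train_signers, val_signers, test_signers))  # prints True
```

Signer-independent splits matter because a model evaluated on signers it has seen during training can exploit signer-specific appearance rather than genuinely recognising the signs.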
Description | University of Oxford |
Organisation | University of Oxford |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Open collaboration on sign language recognition and translation |
Collaborator Contribution | Collaboration on sign language recognition and translation |
Impact | See awards outcomes |
Title | Manual Annotation Tools for Sign Language Annotation |
Description | The VIA suite of tools and List Annotator (LISA) tools are being used for manual annotation of several sign language video datasets. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | The software was used to prepare the BOBSL dataset for ExTol. |
Title | Watch, read and lookup: learning to spot signs from multiple supervisors |
Description | The focus of this work is sign spotting-given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles (readily available translations of the signed content) which provide additional weak-supervision; (3) looking up words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | "Watch, read and lookup: learning to spot signs from multiple supervisors" Liliane Momeni*, Gül Varol*, Samuel Albanie*, Triantafyllos Afouras, Andrew Zisserman, Visual Geometry Group (VGG), University of Oxford. Best Application Paper, ACCV 2020 |
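At inference time, the sign-spotting task described above (deciding whether and where an isolated sign occurs in continuous video) reduces to scoring a query sign's embedding against embeddings of sliding windows of the continuous video. The sketch below shows that matching step with cosine similarity on toy vectors; the embeddings, the threshold value and the function names are assumptions for illustration, not the published method's actual components.

```python
# Toy sketch of sign spotting at inference: score each window of a
# continuous-video embedding sequence against the query sign's
# embedding. Vectors and threshold are invented for illustration.

import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def spot(query, windows, threshold=0.9):
    """Return (best_index, best_score) if some window matches the
    query sign above the threshold, else None (sign not present)."""
    scores = [cosine(query, w) for w in windows]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return (best, scores[best]) if scores[best] >= threshold else None

query = [1.0, 0.0, 0.2]
windows = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.25], [0.2, 0.8, 0.1]]
print(spot(query, windows))  # best match is at window index 1
```

The published approach trains the embedding model from the three supervision sources listed in the record (sparse labels, subtitles, and dictionary look-ups); only the final matching step is sketched here.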
Company Name | Signapse |
Description | Signapse develops artificial intelligence-powered sign language translation software that is designed to improve multimedia accessibility for the deaf community. |
Year Established | 2022 |
Impact | First use case is rolling out BSL translation to UK train stations |
Website | https://www.signapse.ai/ |
Description | DCAL: New progress in sign-to-text technology |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | "DCAL: New progress in sign-to-text technology," featuring ExTOL project. Limping Chicken (world-leading deaf blog), 13 September 2018. https://limpingchicken.com/2018/09/13/dcal-new-progress-in-sign-to-text-technology/ |
Year(s) Of Engagement Activity | 2018 |
URL | https://limpingchicken.com/2018/09/13/dcal-new-progress-in-sign-to-text-technology/ |
Description | SLRTP: Sign Language Recognition, Translation and Production Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aim is to increase the linguistic understanding of the computer vision community of the problems that need solving, to identify the strengths and limitations of current work and to cultivate future collaborations. |
Year(s) Of Engagement Activity | 2020 |
URL | http://www.slrtp.com/ |
Description | Sign Language Recognition, Translation & Production (SLRTP) Workshop at the European Conference on Computer Vision |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | The ExTol partners organized a Sign Language workshop at ECCV 2020. This is the description of the workshop: The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims are to increase the linguistic understanding of sign languages within the computer vision community, and also to identify the strengths and limitations of current work and the problems that need solving. Finally, we hope that the workshop will cultivate future collaborations. Recent developments in image captioning, visual question answering and visual dialogue have stimulated significant interest in approaches that fuse visual and linguistic modelling. As spatio-temporal linguistic constructs, sign languages represent a unique challenge where vision and language meet. Computer vision researchers have been studying sign languages in isolated recognition scenarios for the last three decades. However, now that large scale continuous corpora are beginning to become available, research has moved towards continuous sign language recognition. More recently, the new frontier has become sign language translation and production where new developments in generative models are enabling translation between spoken/written language and continuous sign language videos, and vice versa. In this workshop, we propose to bring together researchers to discuss the open challenges that lie at the intersection of sign language and computer vision. |
Year(s) Of Engagement Activity | 2020 |
URL | https://slrtp.com/ |
Description | Sign Language Recognition, Translation & Production (SLRTP) Workshop at the European Conference on Computer Vision 2022 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | The ExTol partners organized a Sign Language workshop at ECCV 2022. This is the description of the workshop: The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The focus of this workshop is to broaden participation in sign language research from the computer vision community. We hope to identify important future research directions, and to cultivate collaborations. The workshop will consist of invited talks and also a challenge with three tracks: individual sign recognition; English sentence to sign sequence alignment; and sign spotting. |
Year(s) Of Engagement Activity | 2022 |
URL | https://slrtp-2022.github.io/ |
Description | Sign Language Recognition, Translation and Production workshop, part of ECCV 2020 conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop in August 2020 brought together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims were to increase the linguistic understanding of sign languages within the computer vision community, and also to identify the strengths and limitations of current work and the problems that need solving. The event was co-organised by DCAL/UCL, University of Surrey and Oxford University. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.slrtp.com |