ExTOL: End to End Translation of British Sign Language
Lead Research Organisation:
University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP
Abstract
British Sign Language (BSL) is the natural language of the British Deaf community and is as rich and expressive as any spoken language. However, BSL is not just English words converted into hand motions. It is a language in its own right, with its own grammar, very different from English. Moreover, BSL uses different elements of the body simultaneously: not just the movement and shape of the hands, but also the body, face, mouth and the space around the signer are used to convey meaning.
Linguistic study of sign languages is quite new compared to spoken languages, having begun only in the 1960s. Linguists are very interested in sign languages because of what they can reveal about the possibilities of human language that don't rely at all on sound. One of the problems is that studying sign languages involves analysing video footage - and because sign languages lack any standard writing or transcription system, this is extremely labour-intensive. This project will develop computer vision tools to assist with video analysis. This will in turn help linguists increase their knowledge of the language with a long term ambition of creating the world's first machine readable dataset of a sign language, a goal that was achieved for large amounts of text of spoken language in the 1970s.
The ultimate goal of this project is to take the annotated data and understanding from linguistic study and to use this to build a system that is capable of watching a human signing and turning this into written English. This will be a world first and an important landmark for deaf-hearing communication. To achieve this the computer must be able to recognise not only hand motion and shape but the facial expression and body posture of the signer. It must also understand how these aspects are put together into phrases and how these can be translated into written/spoken language.
Although there have been some recent advances in sign language recognition via data gloves and motion capture systems like Kinect, part of the problem is that most computer scientists in this research area do not have the required in-depth knowledge of sign language. This project is therefore a strategic collaboration between leading experts in British Sign Language linguistics and software engineers who specialise in computer vision and machine learning, with the aim of building the world's first British Sign Language to English Translation system and the first practically functional machine translation system for any sign language.
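The recognise-then-translate process described above can be sketched in code. This is a minimal illustrative sketch only: the class and function names are hypothetical, and the project's actual system would use learned computer-vision and sequence-to-sequence models rather than these toy stand-ins.

```python
# Illustrative two-stage sketch of a sign-to-English pipeline.
# All names here are hypothetical placeholders, not the project's design.

from dataclasses import dataclass
from typing import List

@dataclass
class FrameFeatures:
    """Per-frame visual cues a recogniser would extract."""
    hand_shape: str
    hand_motion: str
    facial_expression: str
    body_posture: str

def recognise(frames: List[FrameFeatures]) -> List[str]:
    """Stage 1: map visual cues to a sequence of sign glosses.
    Trivial stand-in: emit one gloss per run of identical hand shapes."""
    glosses: List[str] = []
    for f in frames:
        if not glosses or glosses[-1] != f.hand_shape:
            glosses.append(f.hand_shape)
    return glosses

def translate(glosses: List[str]) -> str:
    """Stage 2: render glosses as written English.
    BSL grammar differs from English, so a real system would learn
    reordering and translation; this stand-in just joins the glosses."""
    return " ".join(g.lower() for g in glosses).capitalize() + "."

frames = [
    FrameFeatures("BOOK", "hold", "neutral", "upright"),
    FrameFeatures("BOOK", "hold", "neutral", "upright"),
    FrameFeatures("READ", "circular", "focused", "leaning"),
]
print(translate(recognise(frames)))  # prints "Book read."
```

The two-stage split (recognition into glosses, then translation into English) mirrors the distinction drawn in the abstract between recognising the signer's articulators and understanding how they combine into translatable phrases.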
Planned Impact
User beneficiaries of this project include those in the following groups:
Deaf people in society: Machine translation from sign language (SL) to written/spoken language will contribute to the status of deaf people in modern society & enhanced hearing-deaf communication, bringing SLs up to par with machine translation between spoken languages. It also meets the requirements of the UN Convention on the Rights of Persons with Disabilities (UNCRPD) which was ratified by the UK in 2009 & by the EU in 2011. The UNCRPD sets a framework for deaf people's rights, mentioning SL seven times in five different articles. Additionally, the research will take us one step closer towards achieving the first fully machine-readable SL corpora - a goal achieved for text corpora of spoken languages in the 1970s. This is important for deaf communities as validation of their linguistic/cultural heritage & enabling wider access to archives.
Education: SL teachers & their students will benefit from machine translation technology as it will provide new, faster ways of translating, annotating & manipulating videos that include SL. Also, it paves the way for automated analyses in the assessment of second language acquisition of SLs and/or non-verbal behaviour in spoken language.
Deaf Researchers: We will aim to attract deaf applicants to the research posts. Deaf people often do not see HE employment as a viable option due to communication challenges. This project will enable us to train & mentor more young deaf researchers, contributing to co-creation & capacity building. The project will lead to increased participation of deaf people which will be ensured in three ways: priority-setting in collaboration with the deaf community, capacity building through the training & employment of deaf researchers, & ensuring native SL skills of deaf researchers are used.
Researchers in linguistics & ICT: This project will be of benefit to linguists working on analysing visual language videos by providing tools to assist in a) faster annotation, given that slow annotation has precluded progress in SL corpus research, and b) richer annotation of visual language data than is currently feasible, especially concerning facial expression. This will benefit computer scientists working on recognition/synthesis of SL, gesture, multi-modal interaction, non-verbal communication, human-machine interaction, & affective computing. Additionally, low-level phonetic transcription of manual & non-manual features in face-to-face communication will contribute to better movement models needed for natural-looking synthesis.
Researchers in arts, social science & medicine: The project will benefit a wide group of researchers by providing tools for the analysis of video data of human interaction: those studying multi-modal communication including linguistics, psychology, sociology, economics, & education; those concerned with gesture studies & features of audio-visual interaction; researchers of online video & social media; those studying developmental & acquired language & communication impairments in spoken & signed languages, including studies of therapeutic discourse; anthropologists & ethnologists. More generally, the technology could also be used for studies of human movement beyond language & communication.
Commercial private sector: The tools from this project will be of interest to businesses in the area of computer vision as they will provide new marketable techniques & therefore new opportunities for revenue. Automated subtitling from SLs to meet accessibility requirements for broadcast video, video on social media, etc. are obvious areas, but as highlighted above, the application areas go far beyond SL.
In summary, the strategic interdisciplinary partnership in this project between experts in linguistics & computer vision also has direct reciprocal benefits not only to those communities but also to social science, ICT and other fields more generally.
Organisations
- University of Surrey (Lead Research Organisation)
- University of Oxford (Collaboration)
- Inter College of Therapeutic Education (Project Partner)
- Universität Hamburg (Project Partner)
- European Union (Project Partner)
- Catholic (Radboud) University Foundation (Project Partner)
- British Broadcasting Corporation (United Kingdom) (Project Partner)
Publications
Albanie S
(2021)
SeeHear: Signer diarisation and a new dataset
Camgoz N
(2021)
Content4All Open Research Sign Language Translation Datasets
Cihan Camgoz N
(2020)
Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation
Fox N
(2023)
Best practices for sign language technology research
in Universal Access in the Information Society
Ivashechkin M
(2023)
Improving 3D Pose Estimation For Sign Language
K R Prajwal
(2022)
Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
Koller O
(2020)
Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos.
in IEEE transactions on pattern analysis and machine intelligence
Liu Y.
(2020)
Use what you have: Video retrieval using representations from collaborative experts
in 30th British Machine Vision Conference 2019, BMVC 2019
Momeni L.
(2020)
Seeing wake words: Audio-Visual Keyword Spotting
Renz K.
(2021)
Sign Segmentation with Temporal Convolutional Networks
Rochette G
(2023)
Novel View Synthesis of Humans Using Differentiable Rendering
in IEEE Transactions on Biometrics, Behavior, and Identity Science
Saunders B
(2021)
Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks
in International Journal of Computer Vision
Saunders B.
(2022)
Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production
in 7th Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual Challenges and Perspectives, SLTAT 2022 - as part of the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Schembri A
(2022)
Signed Language Corpora
Stoll S
(2020)
Text2Sign: Towards Sign Language Production Using Neural Machine Translation and Generative Adversarial Networks
in International Journal of Computer Vision
Varol G
(2022)
Scaling Up Sign Spotting Through Sign Language Dictionaries
in International Journal of Computer Vision
Varol G.
(2021)
Read and Attend: Temporal Localisation in Sign Language Videos
Vowels M
(2020)
NestedVAE: Isolating Common Factors via Weak Supervision
Walsh H
(2023)
Gloss Alignment using Word Embeddings
Walsh H.
(2022)
Changing the Representation: Examining Language Representation for Neural Sign Language Production
in 7th Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual Challenges and Perspectives, SLTAT 2022 - as part of the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Woll B.
(2022)
Segmentation of Signs for Research Purposes: Comparing Humans and Machines
in 10th Workshop on the Representation and Processing of Sign Languages: Multilingual Sign Language Resources, sign-lang 2022 - held in conjunction with the International Conference on Language Resources and Evaluation, LREC 2022 - Proceedings
Description | The project is developing tools to provide automatic translation of sign language |
Exploitation Route | The research is still ongoing but should have important implications for both sign linguistics and automatic translation. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | https://cvssp.org/projects/extol/ |
Description | IP transferred into new spin off venture company. Company incorporated Feb 22 with pre-seed funding. Currently closing its seed round. |
First Year Of Impact | 2022 |
Sector | Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Transport |
Impact Types | Cultural, Societal, Economic, Policy & public services |
Description | (EASIER) - Intelligent Automatic Sign Language Translation |
Amount | € 3,991,591 (EUR) |
Funding ID | 101016982 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2021 |
End | 12/2023 |
Title | BOBSL: BBC-Oxford British Sign Language Dataset |
Description | BOBSL is a large-scale dataset of British Sign Language (BSL). It comprises 1,962 episodes (approximately 1,400 hours) of BSL-interpreted BBC broadcast footage accompanied by written English subtitles. From horror, period and medical dramas, history, nature and science documentaries, sitcoms, children's shows and programs covering cooking, beauty, business and travel, BOBSL covers a wide range of topics. The dataset features a total of 39 signers. Distinct signers appear in the training, validation and test sets for signer-independent evaluation. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | No |
Impact | S. Albanie*, G. Varol*, L. Momeni*, H. Bull*, T. Afouras, H. Chowdhury, N. Fox, B. Woll, R. Cooper, A. McParland, A. Zisserman. BBC-Oxford British Sign Language Dataset; S. Albanie*, G. Varol*, L. Momeni, T. Afouras, J.S. Chung, N. Fox, B. Woll, A. Zisserman. BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues |
URL | https://www.robots.ox.ac.uk/~vgg/data/bobsl/ |
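The BOBSL entry above notes that distinct signers appear in the training, validation and test sets for signer-independent evaluation. That property simply means the signer sets of the three splits are pairwise disjoint, which can be checked as below; the signer IDs are invented for illustration and do not come from the dataset.

```python
# Sketch of the signer-independent split property described for BOBSL:
# no signer may appear in more than one of train/validation/test.
# Signer IDs below are invented placeholders.

def splits_are_signer_independent(*splits: set) -> bool:
    """Return True if no signer appears in more than one split."""
    seen: set = set()
    for split in splits:
        if seen & split:   # overlap with an earlier split
            return False
        seen |= split
    return True

train_signers = {"S01", "S02", "S03"}
val_signers = {"S04", "S05"}
test_signers = {"S06", "S07"}

print(splits_are_signer_independent(train_signers, val_signers, test_signers))  # prints True
```

Signer-independent splits matter because a model evaluated on signers it has seen during training can exploit signer-specific appearance rather than genuinely recognising the signs.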
Description | University of Oxford |
Organisation | University of Oxford |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Open collaboration on sign language recognition and translation |
Collaborator Contribution | Collaboration on sign language recognition and translation |
Impact | See awards outcomes |
Title | Manual Annotation Tools for Sign Language Annotation |
Description | The VIA suite of tools and List Annotator (LISA) tools are being used for manual annotation of several sign language video datasets. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | The software was used to prepare the BOBSL dataset for ExTol. |
Title | Watch, read and lookup: learning to spot signs from multiple supervisors |
Description | The focus of this work is sign spotting-given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles (readily available translations of the signed content) which provide additional weak-supervision; (3) looking up words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | "Watch, read and lookup: learning to spot signs from multiple supervisors" Liliane Momeni*, Gül Varol*, Samuel Albanie*, Triantafyllos Afouras, Andrew Zisserman, Visual Geometry Group (VGG), University of Oxford. Best Application Paper, ACCV 2020 |
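At inference time, the sign-spotting task described above (deciding whether and where an isolated sign occurs in continuous video) reduces to scoring a query sign's embedding against embeddings of sliding windows of the continuous video. The sketch below shows that matching step with cosine similarity on toy vectors; the embeddings, the threshold value and the function names are assumptions for illustration, not the published method's actual components.

```python
# Toy sketch of sign spotting at inference: score each window of a
# continuous-video embedding sequence against the query sign's
# embedding. Vectors and threshold are invented for illustration.

import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def spot(query, windows, threshold=0.9):
    """Return (best_index, best_score) if some window matches the
    query sign above the threshold, else None (sign not present)."""
    scores = [cosine(query, w) for w in windows]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return (best, scores[best]) if scores[best] >= threshold else None

query = [1.0, 0.0, 0.2]
windows = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.25], [0.2, 0.8, 0.1]]
print(spot(query, windows))  # best match is at window index 1
```

The published approach trains the embedding model from the three supervision sources listed in the record (sparse labels, subtitles, and dictionary look-ups); only the final matching step is sketched here.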
Company Name | Signapse |
Description | Signapse develops artificial intelligence-powered sign language translation software that is designed to improve multimedia accessibility for the deaf community. |
Year Established | 2022 |
Impact | First use case is rolling out BSL translation to UK train stations |
Website | https://www.signapse.ai/ |
Description | DCAL: New progress in sign-to-text technology |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | "DCAL: New progress in sign-to-text technology," featuring ExTOL project. Limping Chicken (world-leading deaf blog), 13 September 2018. https://limpingchicken.com/2018/09/13/dcal-new-progress-in-sign-to-text-technology/ |
Year(s) Of Engagement Activity | 2018 |
URL | https://limpingchicken.com/2018/09/13/dcal-new-progress-in-sign-to-text-technology/ |
Description | SLRTP: Sign Language Recognition, Translation and Production Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aim is to increase the linguistic understanding of the computer vision community of the problems that need solving, to identify the strengths and limitations of current work and to cultivate future collaborations. |
Year(s) Of Engagement Activity | 2020 |
URL | http://www.slrtp.com/ |
Description | Sign Language Recognition, Translation & Production (SLRTP) Workshop at the European Conference on Computer Vision |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | The ExTol partners organized a Sign Language workshop at ECCV 2020. This is the description of the workshop: The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims are to increase the linguistic understanding of sign languages within the computer vision community, and also to identify the strengths and limitations of current work and the problems that need solving. Finally, we hope that the workshop will cultivate future collaborations. Recent developments in image captioning, visual question answering and visual dialogue have stimulated significant interest in approaches that fuse visual and linguistic modelling. As spatio-temporal linguistic constructs, sign languages represent a unique challenge where vision and language meet. Computer vision researchers have been studying sign languages in isolated recognition scenarios for the last three decades. However, now that large scale continuous corpora are beginning to become available, research has moved towards continuous sign language recognition. More recently, the new frontier has become sign language translation and production where new developments in generative models are enabling translation between spoken/written language and continuous sign language videos, and vice versa. In this workshop, we propose to bring together researchers to discuss the open challenges that lie at the intersection of sign language and computer vision. |
Year(s) Of Engagement Activity | 2020 |
URL | https://slrtp.com/ |
Description | Sign Language Recognition, Translation & Production (SLRTP) Workshop at the European Conference on Computer Vision 2022 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | The ExTol partners organized a Sign Language workshop at ECCV 2022. This is the description of the workshop: The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop brings together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The focus of this workshop is to broaden participation in sign language research from the computer vision community. We hope to identify important future research directions, and to cultivate collaborations. The workshop will consist of invited talks and also a challenge with three tracks: individual sign recognition; English sentence to sign sequence alignment; and sign spotting. |
Year(s) Of Engagement Activity | 2022 |
URL | https://slrtp-2022.github.io/ |
Description | Sign Language Recognition, Translation and Production workshop, part of ECCV 2020 conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | The "Sign Language Recognition, Translation & Production" (SLRTP) Workshop in August 2020 brought together researchers working on different aspects of vision-based sign language research (including body posture, hands and face) and sign language linguists. The aims were to increase the linguistic understanding of sign languages within the computer vision community, and also to identify the strengths and limitations of current work and the problems that need solving. The event was co-organised by DCAL/UCL, University of Surrey and Oxford University. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.slrtp.com |