ExTOL: End to End Translation of British Sign Language

Lead Research Organisation: University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP

Abstract

British Sign Language (BSL) is the natural language of the British Deaf community and is as rich and expressive as any spoken language. However, BSL is not simply English words converted into hand motions: it is a language in its own right, with its own grammar that differs markedly from English. BSL also uses different parts of the body simultaneously; not just the movement and shape of the hands, but the body, face, mouth and the space around the signer are all used to convey meaning.

Linguistic study of sign languages is quite new compared to that of spoken languages, having begun only in the 1960s. Linguists are very interested in sign languages because of what they reveal about the possibilities of human language when it does not rely on sound at all. One of the problems is that studying sign languages involves analysing video footage, and because sign languages lack any standard writing or transcription system, this is extremely labour-intensive. This project will develop computer vision tools to assist with video analysis. This will in turn help linguists increase their knowledge of the language, with the long-term ambition of creating the world's first machine-readable dataset of a sign language, a goal that was achieved for large amounts of spoken-language text in the 1970s.

The ultimate goal of this project is to take the annotated data and understanding from linguistic study and to use this to build a system that is capable of watching a human signing and turning this into written English. This will be a world first and an important landmark for deaf-hearing communication. To achieve this, the computer must be able to recognise not only the hand motion and shape but also the facial expression and body posture of the signer. It must also understand how these aspects are put together into phrases and how these can be translated into written/spoken language.
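To make the shape of such a system concrete, the sketch below shows one plausible way to combine the different articulation channels: separate encoders for hand, face and body features whose outputs are fused and then decoded into English words. It is written in PyTorch, and the feature dimensions, vocabulary size and architecture are illustrative assumptions only, not the design the project will actually use.

# Illustrative sketch only: a minimal multi-stream sign-to-text model.
# All dimensions and the architecture are assumptions for illustration.
import torch
import torch.nn as nn

class SignTranslator(nn.Module):
    def __init__(self, hand_dim=63, face_dim=136, body_dim=34,
                 hidden=256, vocab_size=10000):
        super().__init__()
        # One encoder per articulator stream (hands, face, body pose).
        self.hand_enc = nn.GRU(hand_dim, hidden, batch_first=True)
        self.face_enc = nn.GRU(face_dim, hidden, batch_first=True)
        self.body_enc = nn.GRU(body_dim, hidden, batch_first=True)
        # Fuse the per-frame stream features, then decode to English tokens.
        self.fuse = nn.Linear(3 * hidden, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, hands, face, body):
        h, _ = self.hand_enc(hands)   # (batch, frames, hidden)
        f, _ = self.face_enc(face)
        b, _ = self.body_enc(body)
        fused = torch.tanh(self.fuse(torch.cat([h, f, b], dim=-1)))
        dec, _ = self.decoder(fused)
        return self.out(dec)          # per-step scores over the English vocabulary

# Example: 2 clips of 100 frames each, with pre-extracted keypoint features.
model = SignTranslator()
hands = torch.randn(2, 100, 63)   # e.g. 21 hand keypoints x 3 coordinates
face = torch.randn(2, 100, 136)   # e.g. 68 facial landmarks x 2 coordinates
body = torch.randn(2, 100, 34)    # e.g. 17 body joints x 2 coordinates
logits = model(hands, face, body)
print(logits.shape)               # torch.Size([2, 100, 10000])

In practice a translation system of this kind would also need to segment continuous signing into phrases and handle the very different word order of BSL and English; the sketch only illustrates how the separate body, face and hand channels might be brought together in one model.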

Although there have been some recent advances in sign language recognition using data gloves and motion-sensing devices such as the Kinect, part of the problem is that most computer scientists in this research area do not have the required in-depth knowledge of sign language. This project is therefore a strategic collaboration between leading experts in British Sign Language linguistics and software engineers who specialise in computer vision and machine learning, with the aim of building the world's first British Sign Language to English translation system and the first practically functional machine translation system for any sign language.

Planned Impact

User beneficiaries of this project include those in the following groups:

Deaf people in society: Machine translation from sign language (SL) to written/spoken language will contribute to the status of deaf people in modern society & enhance hearing-deaf communication, bringing SLs up to par with machine translation between spoken languages. It also meets the requirements of the UN Convention on the Rights of Persons with Disabilities (UNCRPD), which was ratified by the UK in 2009 & by the EU in 2011. The UNCRPD sets a framework for deaf people's rights, mentioning SL seven times in five different articles. Additionally, the research will take us one step closer towards achieving the first fully machine-readable SL corpora - a goal achieved for text corpora of spoken languages in the 1970s. This is important for deaf communities as a validation of their linguistic/cultural heritage & a means of enabling wider access to archives.

Education: SL teachers & their students will benefit from machine translation technology as it will provide new, faster ways of translating, annotating & manipulating videos that include SL. It also paves the way for automated analyses in the assessment of second-language acquisition of SLs and/or of non-verbal behaviour in spoken language.

Deaf Researchers: We will aim to attract deaf applicants to the research posts. Deaf people often do not see HE employment as a viable option due to communication challenges. This project will enable us to train & mentor more young deaf researchers, contributing to co-creation & capacity building. The project will lead to increased participation of deaf people, which will be ensured in three ways: priority-setting in collaboration with the deaf community, capacity building through the training & employment of deaf researchers, & ensuring the native SL skills of deaf researchers are used.

Researchers in linguistics & ICT: This project will benefit linguists working on analysing visual language videos by providing tools to assist in a) faster annotation, given that slow annotation has held back progress in SL corpus research, and b) richer annotation of visual language data than is currently feasible, especially concerning facial expression. It will also benefit computer scientists working on recognition/synthesis of SL, gesture, multi-modal interaction, non-verbal communication, human-machine interaction, & affective computing. Additionally, low-level phonetic transcription of manual & non-manual features in face-to-face communication will contribute to better movement models needed for natural-looking synthesis.

Researchers in arts, social science & medicine: The project will benefit a wide group of researchers by providing tools for the analysis of video data of human interaction: those studying multi-modal communication including linguistics, psychology, sociology, economics, & education; those concerned with gesture studies & features of audio-visual interaction; researchers of online video & social media; those studying developmental & acquired language & communication impairments in spoken & signed languages, including studies of therapeutic discourse; anthropologists & ethnologists. More generally, the technology could also be used for studies of human movement beyond language & communication.

Commercial private sector: The tools from this project will be of interest to businesses in the area of computer vision as they will provide new marketable techniques & therefore new opportunities for revenue. Automated subtitling from SLs to meet accessibility requirements for broadcast video, social media video etc. is an obvious area, but as highlighted above, the applications go far beyond SL.

In summary, the strategic interdisciplinary partnership in this project between experts in linguistics & computer vision has direct reciprocal benefits not only to those communities but also to social science, ICT & other fields more generally.
