LILiR2 - Language Independent Lip Reading

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading, particularly in non-laboratory environments. This project will collect data for lip-reading and use it to build automatic lip-reading systems: machines that convert videos of lip motions into text. To be effective, such systems must accurately track the head over a variety of poses, extract numbers, or features, that describe the lips, and then learn which features correspond to which text. To tackle the problem we will need to use information collected from audio speech, so this project will also investigate how to use the extensive knowledge available about audio speech to recognise visual speech. The project is a collaboration between the University of East Anglia, who have previously developed state-of-the-art speech-reading systems; the University of Surrey, who have built accurate and reliable face and lip trackers; and the Home Office Scientific Branch, who wish to investigate the feasibility of this approach for crime fighting.

Publications

Barry-John Theobald (2010) In pursuit of visemes

Barry-John Theobald (2008) The challenge of multispeaker lip-reading

Barry-John Theobald (2010) Limitations of visual speech recognition

Bear H (2014) Which phoneme-to-viseme maps best improve visual-only computer lip-reading? in Advances in Visual Computing: Proceedings of the 10th International Symposium, ISVC 2014

Bear H (2014) Some observations on computer lip-reading: moving from the dream to the reality in SPIE Proceedings 9253a: Optics and Photonics for Counterterrorism, Crime Fighting and Defence

Bear H (2014) Resolution limits on visual speech recognition in Proceedings of the IEEE International Conference on Image Processing

Bear H (2015) Speaker-independent machine lip-reading with speaker-dependent viseme classifiers in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing

Bear H (2015) Finding phonemes: improving machine lip-reading in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing

Bowden R (2013) Recent developments in automated lip-reading in Optics and Photonics for Counterterrorism and Crime Fighting IX

Bowden R (2012) Is automated conversion of video to text a reality? in Proceedings SPIE 8546, Optics and Photonics for Counterterrorism, Crime Fighting and Defence VIII

 
Description It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading. This project collected new datasets for lip-reading and used these to build automatic lip-reading systems: machines that convert videos of lip motions into text. It also compared human performance against automatic performance on the same dataset.



To be effective at automatic lip-reading, systems must accurately track the head over a variety of poses, extract features that describe the lips, and then learn which features correspond to which text. To this end the project developed a state-of-the-art facial feature tracking system that can track any set of facial features on any person, in any environment, in real time. This tracking system has resulted in high-quality international publications and interest from a variety of industrial sectors, from government through to the post-production industries. The project also developed several feature extraction methods and representations that can be used in the recognition of words, together with a general recognition framework for lip-reading. It made significant advances in person-dependent recognition, with accuracies approaching the level of audio speech recognition, and significant new progress on the more challenging problem of person-independent recognition, i.e. recognising speakers who have never been seen by the system before.
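As an illustration only (this is not the project's code; every name and parameter below is a hypothetical placeholder), a pipeline of this shape, tracked lip landmarks reduced to compact per-frame features and then scored against one hidden Markov model per word, might be sketched in Python as:

import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumes the hmmlearn package is installed

def lip_features(landmarks, n_dims=20):
    # PCA-style reduction: (T, P, 2) tracked lip points -> (T, n_dims) features.
    X = landmarks.reshape(landmarks.shape[0], -1).astype(float)
    X -= X.mean(axis=0)                       # remove per-sequence offset
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_dims].T                  # project onto the top modes

def train_word_models(train_data, n_states=5):
    # Fit one Gaussian HMM per word from lists of (T_i, n_dims) sequences.
    models = {}
    for word, seqs in train_data.items():
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        hmm.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])
        models[word] = hmm
    return models

def recognise(models, features):
    # Pick the word whose model gives the sequence the highest log-likelihood.
    return max(models, key=lambda w: models[w].score(features))

Per-word HMMs over hand-crafted lip features are a standard baseline design for this task; the project's actual trackers, features and classifiers are described in the publications listed above.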



The project also developed approaches to language identification, which allow the language being spoken to be recognised purely from the motion of the lips, and to the recognition of expression and non-verbal communication: the subtle facial expressions that humans use intuitively to supplement what a speaker says, conveying aspects of communication such as their interest in the topic of conversation.
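Again purely as a hedged sketch (not the project's method; all names below are assumptions), language identification from lip motion alone could be framed as classifying summary statistics of the lip-feature sequences:

import numpy as np
from sklearn.linear_model import LogisticRegression  # assumes scikit-learn

def motion_stats(features):
    # Summarise a (T, D) lip-feature sequence as a fixed-length vector:
    # per-dimension means plus mean and spread of frame-to-frame velocities.
    vel = np.diff(features, axis=0)
    return np.concatenate([features.mean(axis=0), vel.mean(axis=0), vel.std(axis=0)])

def train_language_id(sequences, labels):
    # sequences: list of (T_i, D) feature arrays; labels: language of each.
    X = np.stack([motion_stats(s) for s in sequences])
    return LogisticRegression(max_iter=1000).fit(X, labels)

# Usage: trained = train_language_id(seqs, langs)
#        trained.predict(motion_stats(new_seq)[None, :])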
Exploitation Route Our findings have been used as input to at least four subsequent grants on lip-reading.

Through much subsequent work (not funded by EPSRC) we have moved on to more robust tracking and off-axis lip-reading, and have obtained some evidence of person-independence (the key problem not tackled in this grant).
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Healthcare; Government, Democracy and Justice; Retail; Security and Diplomacy

 
Description Our work has formed the basis for a number of public talks and press pieces, and is currently undergoing some proof-of-concept commercialisation.
First Year Of Impact 2010
Sector Aerospace, Defence and Marine; Security and Diplomacy
Impact Types Cultural, Societal

 
Description Home Office
Amount £2,580,000 (GBP)
Organisation Home Office 
Sector Public
Country United Kingdom
Start 01/2011 
End 03/2013
 
Description Home Office
Amount £228,000 (GBP)
Funding ID 7020739 
Organisation Home Office 
Sector Public
Country United Kingdom
Start 06/2010 
End 03/2013