LILiR2 - Language Independent Lip Reading
Lead Research Organisation:
University of East Anglia
Department Name: Computing Sciences
Abstract
It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading, particularly in non-laboratory environments. This project will collect data for lip-reading and use it to build automatic lip-reading systems: machines that convert videos of lip motions into text. To be effective, such systems must accurately track the head over a variety of poses; extract numbers, or features, that describe the lips; and then learn which features correspond to which text. To tackle the problem we will need to use information collected from audio speech, so this project will also investigate how the extensive knowledge of audio speech can be used to recognise visual speech. The project is a collaboration between the University of East Anglia, who have previously developed state-of-the-art speech-reading systems; the University of Surrey, who have built accurate and reliable face and lip trackers; and the Home Office Scientific Branch, who wish to investigate the feasibility of this approach for crime fighting.
Publications
Barry-John Theobald
(2010)
Limitations of visual speech recognition
Barry-John Theobald
(2008)
The challenge of multispeaker lip-reading
Barry-John Theobald
(2010)
In pursuit of visemes
Bear H
(2014)
Some observations on computer lip-reading: moving from the dream to the reality
in SPIE Proceedings 9253a: Optics and Photonics for Counterterrorism, Crime Fighting and Defence
Bear H
(2015)
Finding phonemes: improving machine lip-reading
in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Bear H
(2015)
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Bear H
(2014)
Resolution limits on visual speech recognition
in Proceedings of IEEE International Conference on Image Processing
Bear H
(2014)
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
in Advances in Visual Computing: Proceedings 10th International Symposium, ISVC 2014
Bowden R
(2012)
Is automated conversion of video to text a reality?
in Proceedings SPIE 8546, Optics and Photonics for Counterterrorism, Crime Fighting and Defence VIII
Bowden R
(2013)
Recent developments in automated lip-reading
in Optics and Photonics for Counterterrorism and Crime Fighting IX
Hilder S
(2010)
In Pursuit of Visemes
in Proceedings on Auditory-Visual Speech Processing (AVSP)
Hilder S.
(2009)
Comparison of human and machine-based lip-reading
in Auditory-Visual Speech Processing 2009, AVSP 2009
Lan Y
(2009)
Comparing visual features for lipreading
in Proceedings International Conference on Auditory-Visual Speech Processing
Lan Y
(2010)
Improving visual features for lip-reading
in Proceedings International Conference on Auditory-Visual Speech Processing
Lan Y
(2012)
Insights into machine lip reading
in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012
Lan Y
(2012)
View Independent Computer Lip-Reading
Lan Y.
(2010)
Improving Visual Features for Lip-reading
in Auditory-Visual Speech Processing 2010, AVSP 2010
Newman J
(2010)
Speaker independent visual-only language identification
Newman J
(2012)
Language Identification Using Visual Features
in IEEE Transactions on Audio, Speech, and Language Processing
Newman J.L.
(2010)
Limitations of Visual Speech Recognition
in Auditory-Visual Speech Processing 2010, AVSP 2010
Thangthai K
(2015)
Improving Lip-reading performance for robust audiovisual speech recognition using dynamic neural networks
in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Description | It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading. This project collected new datasets for lip-reading and used these to build automatic lip-reading systems: machines that convert videos of lip motions into text. It also compared human performance against automatic performance on the same dataset. To be effective at automatic lip-reading, systems must accurately track the head over a variety of poses, extract features that describe the lips, and then learn which features correspond to which text. To this end the project developed a state-of-the-art facial feature tracking system that could track any set of facial features on any person, in any environment, in real time. This tracking system has resulted in high-quality international publications and interest from a variety of industrial sectors, from government through to the post-production industries. The project also developed several feature extractions/representations that could be used in the recognition of words, together with a general recognition framework for lip-reading. It made significant advances in person-dependent recognition, with accuracies approaching the level of speech recognition, and significant new progress on the more challenging problem of person-independent recognition, i.e. recognising speakers who have never been seen by the system before. The project also developed approaches to language identification, allowing a language to be recognised purely from the motion of the lips, and to the recognition of expression and non-verbal communication: the subtle facial expressions that humans use intuitively to supplement the information provided by a speaker about aspects of communication such as their interest in a topic of conversation. |
Exploitation Route | Our findings have been used as input to at least four subsequent grants on lip-reading. After much subsequent work (not funded by EPSRC) we have moved to more robust tracking, off-axis lip-reading, and some evidence of person-independence (the key problem not tackled in this grant). |
Sectors | Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Healthcare; Government, Democracy and Justice; Retail; Security and Diplomacy
Description | Our work formed the basis for a number of public talks and press pieces, and is currently undergoing some proof-of-concept commercialisation. |
First Year Of Impact | 2010 |
Sector | Aerospace, Defence and Marine; Security and Diplomacy
Impact Types | Cultural, Societal
Description | Home Office |
Amount | £228,000 (GBP) |
Funding ID | 7020739 |
Organisation | Home Office |
Sector | Public |
Country | United Kingdom |
Start | 05/2010 |
End | 03/2013 |
Description | Home Office |
Amount | £2,580,000 (GBP) |
Organisation | Home Office |
Sector | Public |
Country | United Kingdom |
Start | 01/2011 |
End | 03/2013 |