LILiR2 - Language Independent Lip Reading

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading, particularly in non-laboratory environments. This project will collect data for lip-reading and use it to build automatic lip-reading systems: machines that convert videos of lip motions into text. To be effective, such systems must accurately track the head over a variety of poses, extract numbers, or features, that describe the lips, and then learn which features correspond to which text. To tackle the problem we will need to use information collected from audio speech, so this project will also investigate how to use the extensive knowledge available about audio speech to recognise visual speech. The project is a collaboration between the University of East Anglia, who have previously developed state-of-the-art speech-reading systems; the University of Surrey, who have built accurate and reliable face and lip trackers; and the Home Office Scientific Branch, who wish to investigate the feasibility of this approach for crime fighting.

Publications

Barry-John Theobald (2010) In pursuit of visemes

Barry-John Theobald (2008) The challenge of multispeaker lip-reading

Barry-John Theobald (2010) Limitations of visual speech recognition

Bear H (2014) Which phoneme-to-viseme maps best improve visual-only computer lip-reading? in Advances in Visual Computing: Proceedings of the 10th International Symposium, ISVC 2014

Bear H (2014) Some observations on computer lip-reading: moving from the dream to the reality in SPIE Proceedings 9253a: Optics and Photonics for Counterterrorism, Crime Fighting and Defence

Bear H (2014) Resolution limits on visual speech recognition in Proceedings of the IEEE International Conference on Image Processing

Bear H (2015) Speaker-independent machine lip-reading with speaker-dependent viseme classifiers in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing

Bear H (2015) Finding phonemes: improving machine lip-reading in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing

Bowden R (2013) Recent developments in automated lip-reading in Optics and Photonics for Counterterrorism and Crime Fighting IX

Bowden R (2012) Is automated conversion of video to text a reality? in Proceedings SPIE 8546, Optics and Photonics for Counterterrorism, Crime Fighting and Defence VIII

 
Description It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading. This project collected new datasets for lip-reading and used these to build automatic lip-reading systems: machines that convert videos of lip motions into text. It also compared human performance against automatic performance on the same dataset.



To be effective at automatic lip-reading, systems must accurately track the head over a variety of poses, extract features that describe the lips, and then learn which features correspond to which text. To this end the project developed a state-of-the-art facial feature tracking system that can track any set of facial features on any person, in any environment, in real time. This tracking system has resulted in high-quality international publications and interest from a variety of industrial sectors, from government through to the post-production industries. The project also developed several feature extraction methods and representations that can be used in the recognition of words, together with a general recognition framework for lip-reading. It made significant advances in person-dependent recognition, with accuracies approaching the level of audio speech recognition, and significant new progress on the more challenging problem of person-independent recognition, i.e. recognising speakers who have never been seen by the system before.
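As an illustration only (this is not the project's code; every name and parameter below is a hypothetical placeholder), a pipeline of this shape, tracked lip landmarks reduced to compact per-frame features and then scored against one hidden Markov model per word, might be sketched in Python as:

import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumes the hmmlearn package is installed

def lip_features(landmarks, n_dims=20):
    # PCA-style reduction: (T, P, 2) tracked lip points -> (T, n_dims) features.
    X = landmarks.reshape(landmarks.shape[0], -1).astype(float)
    X -= X.mean(axis=0)                       # remove per-sequence offset
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_dims].T                  # project onto the top modes

def train_word_models(train_data, n_states=5):
    # Fit one Gaussian HMM per word from lists of (T_i, n_dims) sequences.
    models = {}
    for word, seqs in train_data.items():
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        hmm.fit(np.vstack(seqs), lengths=[len(s) for s in seqs])
        models[word] = hmm
    return models

def recognise(models, features):
    # Pick the word whose model gives the sequence the highest log-likelihood.
    return max(models, key=lambda w: models[w].score(features))

Per-word HMMs over hand-crafted lip features are a standard baseline design for this task; the project's actual trackers, features and classifiers are described in the publications listed above.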



The project also developed approaches to language identification, which allow the language being spoken to be recognised purely from the motion of the lips, and to the recognition of expression and non-verbal communication: the subtle facial expressions that humans use intuitively to supplement what a speaker says, conveying aspects of communication such as their interest in the topic of conversation.
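Again purely as a hedged sketch (not the project's method; all names below are assumptions), language identification from lip motion alone could be framed as classifying summary statistics of the lip-feature sequences:

import numpy as np
from sklearn.linear_model import LogisticRegression  # assumes scikit-learn

def motion_stats(features):
    # Summarise a (T, D) lip-feature sequence as a fixed-length vector:
    # per-dimension means plus mean and spread of frame-to-frame velocities.
    vel = np.diff(features, axis=0)
    return np.concatenate([features.mean(axis=0), vel.mean(axis=0), vel.std(axis=0)])

def train_language_id(sequences, labels):
    # sequences: list of (T_i, D) feature arrays; labels: language of each.
    X = np.stack([motion_stats(s) for s in sequences])
    return LogisticRegression(max_iter=1000).fit(X, labels)

# Usage: trained = train_language_id(seqs, langs)
#        trained.predict(motion_stats(new_seq)[None, :])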
Exploitation Route Our findings have been used as input to at least four subsequent grants on lip-reading.

Through much subsequent work (not funded by EPSRC) we have moved on to more robust tracking and off-axis lip-reading, and have obtained some evidence of person-independence (the key problem not tackled in this grant).
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Healthcare; Government, Democracy and Justice; Retail; Security and Diplomacy

 
Description Our work has formed the basis for a number of public talks and press pieces, and is currently undergoing some proof-of-concept commercialisation.
First Year Of Impact 2010
Sector Aerospace, Defence and Marine; Security and Diplomacy
Impact Types Cultural, Societal

 
Description Home Office
Amount £2,580,000 (GBP)
Organisation Home Office 
Sector Public
Country United Kingdom
Start 01/2011 
End 03/2013
 
Description Home Office
Amount £228,000 (GBP)
Funding ID 7020739 
Organisation Home Office 
Sector Public
Country United Kingdom
Start 06/2010 
End 03/2013