LILiR2 - Language Independent Lip Reading
Lead Research Organisation:
University of East Anglia
Department Name: Computing Sciences
Abstract
It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading, particularly in non-laboratory environments. This project will collect data for lip-reading and use it to build automatic lip-reading systems: machines that convert videos of lip motions into text. To be effective, such systems must accurately track the head over a variety of poses; extract numbers, or features, that describe the lips; and then learn which features correspond to which text. To tackle the problem we will need to use information collected from audio speech, so this project will also investigate how the extensive knowledge of audio speech can be used to recognise visual speech. The project is a collaboration between the University of East Anglia, who have previously developed state-of-the-art speech-reading systems; the University of Surrey, who have built accurate and reliable face and lip trackers; and the Home Office Scientific Branch, who wish to investigate the feasibility of this approach for crime fighting.
Publications
Barry-John Theobald
(2010)
Limitations of visual speech recognition
Barry-John Theobald
(2008)
The challenge of multispeaker lip-reading
Barry-John Theobald
(2010)
In pursuit of visemes
Bear H
(2014)
Some observations on computer lip-reading: moving from the dream to the reality
in SPIE Proceedings 9253a: Optics and Photonics for Counterterrorism, Crime Fighting and Defence
Bear H
(2015)
Finding phonemes: improving machine lip-reading
in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Bear H
(2015)
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Bear H
(2014)
Resolution limits on visual speech recognition
in Proceedings of IEEE International Conference on Image Processing
Bear H
(2014)
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
in Advances in Visual Computing: Proceedings 10th International Symposium, ISVC 2014
Bowden R
(2012)
Is automated conversion of video to text a reality?
in Proceedings SPIE 8546, Optics and Photonics for Counterterrorism, Crime Fighting and Defence VIII
Bowden R
(2013)
Recent developments in automated lip-reading
in Optics and Photonics for Counterterrorism and Crime Fighting IX
Hilder S
(2010)
In Pursuit of Visemes
in Proceedings on Auditory-Visual Speech Processing (AVSP)
Hilder S.
(2009)
Comparison of human and machine-based lip-reading
in Auditory-Visual Speech Processing 2009, AVSP 2009
Lan Y
(2009)
Comparing visual features for lipreading
in Proceedings International Conference on Auditory-Visual Speech Processing
Lan Y
(2010)
Improving visual features for lip-reading
in Proceedings International Conference on Auditory-Visual Speech Processing
Lan Y
(2012)
Insights into machine lip reading
in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012
Lan Y
(2012)
View Independent Computer Lip-Reading
Lan Y.
(2010)
Improving Visual Features for Lip-reading
in Auditory-Visual Speech Processing 2010, AVSP 2010
Newman J
(2010)
Speaker independent visual-only language identification
Newman J
(2012)
Language Identification Using Visual Features
in IEEE Transactions on Audio, Speech, and Language Processing
Newman J.L.
(2010)
Limitations of Visual Speech Recognition
in Auditory-Visual Speech Processing 2010, AVSP 2010
Thangthai K
(2015)
Improving Lip-reading performance for robust audiovisual speech recognition using dynamic neural networks
in Proceedings of the 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing
Description | It is known that humans can, and do, lip-read, but little is known about exactly what visual information is needed for effective lip-reading. This project collected new datasets for lip-reading and used these to build automatic lip-reading systems: machines that convert videos of lip motions into text. It also compared human performance against automatic performance on the same dataset. To be effective at automatic lip-reading, systems must accurately track the head over a variety of poses, extract features that describe the lips, and then learn which features correspond to which text. To this end the project developed a state-of-the-art facial feature tracking system that could track any set of facial features on any person, in any environment, in real time. This tracking system has resulted in high-quality international publications and interest from a variety of industrial sectors, from government through to the post-production industries. The project also developed several feature extractions/representations that could be used in the recognition of words, together with a general recognition framework for lip-reading. It made significant advances in person-dependent recognition, with accuracies approaching the level of speech recognition, and significant new progress on the more challenging problem of person-independent recognition, i.e. recognising speakers who have never been seen by the system before. The project also developed approaches to language identification, allowing a language to be recognised purely from the motion of the lips, and to the recognition of expression and non-verbal communication: the subtle facial expressions that humans use intuitively to supplement the information provided by a speaker about aspects of communication such as their interest in a topic of conversation. |
Exploitation Route | Our findings have been used as input to at least four subsequent grants on lip-reading. After much subsequent work (not funded by EPSRC) we have moved to more robust tracking, off-axis lip-reading, and some evidence of person-independence (the key problem not tackled in this grant). |
Sectors | Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Healthcare; Government, Democracy and Justice; Retail; Security and Diplomacy
Description | Our work formed the basis for a number of public talks and press pieces, and is currently undergoing some proof-of-concept commercialisation. |
First Year Of Impact | 2010 |
Sector | Aerospace, Defence and Marine; Security and Diplomacy
Impact Types | Cultural, Societal
Description | Home Office |
Amount | £228,000 (GBP) |
Funding ID | 7020739 |
Organisation | Home Office |
Sector | Public |
Country | United Kingdom |
Start | 05/2010 |
End | 03/2013 |
Description | Home Office |
Amount | £2,580,000 (GBP) |
Organisation | Home Office |
Sector | Public |
Country | United Kingdom |
Start | 01/2011 |
End | 03/2013 |