Audio-Visual Media Research Platform

Lead Research Organisation: University of Surrey
Department Name: Vision, Speech and Signal Processing (CVSSP)

Abstract

The strategic objective of this Platform Grant is to underpin Audio-Visual Media Research within the Centre for Vision, Speech and Signal Processing (CVSSP): to pursue fundamental research combining internationally leading expertise in the understanding of real-world audio and visual data, and to transfer this capability to new application domains. Our goal is to pioneer new technologies which impact directly on industry practice in healthcare, sports, retail, communication, entertainment and training. This builds on CVSSP's unique track record of world-leading research in both audio and visual machine perception, which has enabled ground-breaking technology exploited by UK industry. The strategic contribution and international standing of the Centre's research in audio and visual media have been recognised by EPSRC through two previous platform grant awards (2003-14) and two programme grant awards in 2013 and 2015. Platform Grant funding is requested to reinforce the critical mass of expertise and knowledge of specialist facilities required to advance both fundamental understanding and pioneering new technology. In particular, this Platform Grant will catalyse advances in multi-sensory machine perception, building on the Centre's unique strengths in audio and vision. Key experienced post-doctoral researchers have specialist knowledge and practical know-how, which are an important resource for training new researchers and for maintaining cutting-edge research using state-of-the-art facilities. Strategically, the Platform Grant will build on recent independent advances in audio and visual scene analysis to lead multi-sensory understanding and modelling of real-world scenes. Research advances will provide the foundation for UK industry to lead the development of technologies ranging from intelligent sensing for healthcare and assisted living to immersive entertainment production.
Platform Grant funding will also strengthen CVSSP's international collaboration with leading groups worldwide through extended research secondments in the US (Washington, USC), Asia (Tsinghua, Tianjin, Kyoto, Tokyo, KAUST) and Europe (INRIA, MPI, Fraunhofer, ETH, EPFL, KTH, CTU, UPF).

Planned Impact

Platform Grant support for audio-visual media processing will sustain a critical mass of joint research expertise in multi-sensory machine perception. This is critical to achieving machines which can hear and see in order to understand and interact with real-world dynamic scenes. Recent advances in both the audio and vision research communities, with the introduction of deep learning methodologies, have achieved a step change in the capability of machine understanding, enabling for the first time automatic interpretation of audio and visual data of real-world complexity. Our research seeks to capitalise on these advances, developing the next generation of research leaders in multi-sensory machine perception and realising autonomous systems capable of combining sensing modalities to robustly understand and interpret audio-visual media. Research will address the open challenge of machine understanding of complex dynamic real-world scenes, combining the complementary information available from audio and visual sensors to achieve robust interpretation. These research advances are of central interest to both the audio and vision research communities and will bring together advances in machine perception. Joint audio-visual processing is essential to overcome the inherent ambiguities in either sensing modality. In visual sensing, for example, occlusion, limited field of view and uniform appearance commonly occur and can result in failure of visual understanding. Audio cues can overcome these limitations, providing wide-area information and allowing the continuous sensing of objects which are visually obscured. For example, both audio and visual cues are essential for non-contact monitoring of people in healthcare and assisted living applications.

Research in audio-visual signal processing and machine perception enabling robust interpretation of complex real-world dynamic scenes will have wide-spread impact on application domains through collaborative research with UK industry. These include:
- Medical diagnosis: to easily measure and monitor moving people from audio and video sensing for biomechanical analysis and to understand the relationship of body shape to health problems such as obesity and sleep patterns
- Sports performance analysis: to allow non-invasive performance analysis of player movement and team interaction in both training and live matches
- Veterinary science: to monitor animal behaviour for improved welfare and early detection of abnormalities
- Film, games and VR production: to support creativity through interpretation and editing of real-world content and to create digital doubles that appear and behave like a real person based on 4D audio-visual capture of actor performance
- Security and surveillance: to monitor individual human behaviour using visual biometrics based on 4D audio-visual analysis of the face and body in motion
- Robotics: to enable robots to safely navigate domestic environments with people and pets
- Human-computer communication: to understand individual behaviour for natural interaction
- Manufacturing industry: new methods for contactless measurement and analysis of surfaces using joint audio-visual sensing.
- Retail industry: methods to capture representations of both the audio and visual properties of objects to allow customers to experience these remotely.
- Cultural heritage: techniques to digitise artefacts for scientific study and to make them available to a wider audience.

Advances in these areas will be of direct benefit to the general public in healthcare, assisted living, entertainment and training.
Platform Grant funding will support underpinning research and pilot studies to build collaborative research in these domains.

Publications

Mohammadi SM (2018) Sleep Posture Classification using a Convolutional Neural Network. in Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference

Kim H (2018) Multimodal Visual Data Registration for Web-Based Visualization in Media Production in IEEE Transactions on Circuits and Systems for Video Technology

Mustafa A (2019) MSFD: Multi-Scale Segmentation-Based Feature Detection for Wide-Baseline Scene Reconstruction. in IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Coleman P (2018) An Audio-Visual System for Object-Based Audio: From Recording to Listening in IEEE Transactions on Multimedia

Remaggi L (2017) Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Gilbert A (2018) Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation in International Journal of Computer Vision

Francombe J (2018) Qualitative Evaluation of Media Device Orchestration for Immersive Spatial Audio Reproduction in Journal of the Audio Engineering Society

Coleman P (2017) Object-Based Reverberation for Spatial Audio in Journal of the Audio Engineering Society

Wan J (2018) Articulated motion and deformable objects in Pattern Recognition

 
Description Introduction of technologies for audio-visual scene understanding, understanding of human movement and behaviour, autonomous navigation through scene understanding, and audio-visual search. These technologies exploit advances in machine learning for scene understanding from very large databases of audio-visual data. Novel learning methodologies and architectures have been introduced to overcome the limitations of previous approaches and enable operation in complex real-world settings.
Exploitation Route Research advances and technologies are actively being exploited across a wide range of sectors through further collaboration.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Electronics; Environment; Financial Services, and Management Consultancy; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Retail; Security and Diplomacy

URL http://www.surrey.ac.uk/cvssp
 
Description The Platform Grant is supporting research with impact across a wide range of sectors including healthcare, autonomous vehicles, entertainment, security, manufacturing and communications.
Sector Aerospace, Defence and Marine; Agriculture, Food and Drink; Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Electronics; Financial Services, and Management Consultancy; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Retail; Security and Diplomacy; Transport
Impact Types Cultural; Societal; Economic; Policy & public services

 
Description (N00014-16-R-FO05) Semantic Information Pursuit for Multimodal Data Analysis
Amount £980,875 (GBP)
Funding ID EP/R018456/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 01/2018 
End 01/2023
 
Description ExTOL: End to End Translation of British Sign Language
Amount £971,921 (GBP)
Funding ID EP/R03298X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 07/2018 
End 06/2021
 
Description Anthropics Technology Ltd 
Organisation Anthropics Technology
Country United Kingdom 
Sector Private 
Start Year 2005
 
Description BBC Research and Development 
Organisation British Broadcasting Corporation (BBC)
Country United Kingdom 
Sector Public 
PI Contribution Research in computer vision and audio for broadcast production. Technologies for 3D production, free-viewpoint video in sports, stereo production from monocular cameras, and video annotation. Member of the BBC Audio Research Partnership, developing the next generation of broadcast technology.
Collaborator Contribution In-kind contribution (members of Steering/Advisory Boards). Use of the BBC lab and research/development facilities. Studentship (industrial CASE) funding and co-supervision of PhD students.
Impact Multi-disciplinary collaboration involving Computer Vision, Video Analysis, Psychoacoustics, Signal Processing and Spatial Audio.
 
Description Bang and Olufsen 
Organisation Bang & Olufsen
Country Denmark 
Sector Private 
PI Contribution Spatial audio research (POSZ and S3A EPSRC-funded projects).
Collaborator Contribution Scholarships (fees and bursaries) for EU/Home students. In-kind contribution by members of the B&O Research department (Soren Bech, member of Steering/Advisory Boards and co-supervisor of funded students). Use of research facilities at their labs in Denmark.
Impact Publications listed at http://iosr.uk/projects/POSZ/ . Multi-disciplinary collaboration: Signal Processing, Psychoacoustics and Spatial Audio.
 
Description British Broadcasting Corporation 
Organisation British Broadcasting Corporation (BBC)
Country United Kingdom 
Sector Public 
Start Year 2004
 
Description Numerion 
Organisation Numerion Software Limited
Country United Kingdom 
Sector Private 
PI Contribution Validation of physics-based cloth simulation against real cloth measurements
Collaborator Contribution Physics-based cloth simulation expertise
Impact Physically validated cloth simulation tools
Start Year 2014
 
Description Snell & Wilcox Ltd 
Organisation Snell & Wilcox Ltd
Country United Kingdom 
Sector Private 
Start Year 2006
 
Description Sony Broadcast and Professional Europe 
Organisation SONY
Department Sony Broadcast and Professional Europe
Country United Kingdom 
Sector Private 
Start Year 2004