Audio-Visual Media Research Platform

Lead Research Organisation: University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP

Abstract

The strategic objective of this Platform Grant is to underpin Audio-Visual Media Research within the Centre for Vision, Speech and Signal Processing (CVSSP), to pursue fundamental research combining internationally leading expertise in the understanding of real-world audio and visual data, and to transfer this capability to new application domains. Our goal is to pioneer new technologies which impact directly on industry practice in healthcare, sports, retail, communication, entertainment and training. This builds on CVSSP's unique track record of world-leading research in both audio and visual machine perception, which has enabled ground-breaking technology exploited by UK industry. The strategic contribution and international standing of the Centre's research in audio and visual media have been recognised by EPSRC through two previous platform grant awards (2003-14) and two programme grant awards in 2013 and 2015. Platform Grant funding is requested to reinforce the critical mass of expertise and knowledge of specialist facilities required to contribute advances in both fundamental understanding and pioneering new technology. In particular, this Platform Grant will catalyse advances in multi-sensory machine perception, building on the Centre's unique strengths in audio and vision. Key experienced post-doctoral researchers have specialist knowledge and practical know-how, which is an important resource for training new researchers and for maintaining cutting-edge research using state-of-the-art facilities. Strategically, the Platform Grant will build on recent independent advances in audio and visual scene analysis to lead multi-sensory understanding and modelling of real-world scenes. Research advances will provide the foundation for UK industry to lead the development of technologies ranging from intelligent sensing for healthcare and assisted living to immersive entertainment production.
Platform Grant funding will also strengthen CVSSP's international collaboration with leading groups world-wide through extended research secondments in the US (Washington, USC), Asia (Tsinghua, Tianjin, Kyoto, Tokyo, KAUST) and Europe (INRIA, MPI, Fraunhofer, ETH, EPFL, KTH, CTU, UPF).

Planned Impact

Platform Grant support for audio-visual media processing will sustain a critical mass of joint research expertise in multi-sensory machine perception. This is critical to achieving machines which can hear and see in order to understand and interact with real-world dynamic scenes. Recent advances in both the audio and vision research communities, with the introduction of deep learning methodologies, have achieved a step change in the capability of machine understanding, enabling for the first time automatic interpretation of audio and visual data of real-world complexity. Our research seeks to capitalise on these advances, developing the next generation of research leaders in multi-sensory machine perception and realising autonomous systems capable of combining sensing modalities to robustly understand and interpret audio-visual media. Research will address the open challenge of machine understanding of complex dynamic real-world scenes, combining the complementary information available from audio and visual sensors to achieve robust interpretation. These research advances are of central interest to both the audio and vision research communities and will bring together advances in machine perception. Joint audio-visual processing is essential to overcome the inherent ambiguities in either sensing modality, such as occlusion, limited field of view and uniform appearance in visual sensing, which commonly occur and can result in failure of visual understanding. Audio cues can overcome these limitations, providing wide-area information and allowing the continuous sensing of objects which are visually obscured. For example, both audio and visual cues are essential for non-contact monitoring of people in healthcare and assisted living applications.

Research in audio-visual signal processing and machine perception, enabling robust interpretation of complex real-world dynamic scenes, will have widespread impact on application domains through collaborative research with UK industry. These include:
- Medical diagnosis: to easily measure and monitor moving people from audio and video sensing for biomechanical analysis and to understand the relationship of body shape to health problems such as obesity and sleep patterns
- Sports performance analysis: to allow non-invasive performance analysis of player movement and team interaction in both training and live matches
- Veterinary science: to monitor animal behaviour for improved welfare and early detection of abnormalities
- Film, games and VR production: to support creativity through interpretation and editing of real-world content and to create digital doubles that appear and behave like a real person based on 4D audio-visual capture of actor performance
- Security and surveillance: to monitor individual human behaviour using visual biometrics based on 4D audio-visual analysis of the face and body in motion
- Robotics: to enable robots to safely navigate domestic environments with people and pets
- Human-computer communication: to understand individual behaviour for natural interaction
- Manufacturing industry: new methods for contactless measurement and analysis of surfaces using joint audio-visual sensing
- Retail industry: methods to capture representations of both the audio and visual properties of objects to allow customers to experience these remotely
- Cultural heritage: techniques to digitise artefacts for scientific study and to make them available to a wider audience

Advances in these areas will be of direct benefit to the general public in healthcare, assisted living, entertainment and training.
Platform Grant funding will support underpinning research and pilot studies to build collaborative research in these domains.

Publications

Liu Y (2020) Audio-Visual Particle Flow SMC-PHD Filtering for Multi-Speaker Tracking in IEEE Transactions on Multimedia

Malleson C (2019) Hybrid Modeling of Non-Rigid Scenes From RGBD Cameras in IEEE Transactions on Circuits and Systems for Video Technology

Malleson C (2019) Real-Time Multi-person Motion Capture from Multi-view Video and IMUs in International Journal of Computer Vision

Kusner MJ (2017) Counterfactual Fairness

Moeslund T (2017) Computer Vision in Sports in Computer Vision and Image Understanding

Mohammadi SM (2018) Sleep Posture Classification using a Convolutional Neural Network in Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Mohammadi SM (2021) Transfer Learning for Clinical Sleep Pose Detection Using a Single 2D IR Camera in IEEE Transactions on Neural Systems and Rehabilitation Engineering

Mohammadi SM (2019) Two-Step Deep Learning for Estimating Human Sleep Pose Occluded by Bed Covers in Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Mustafa A (2022) 4D Temporally Coherent Multi-Person Semantic Reconstruction and Segmentation in International Journal of Computer Vision

Mustafa A (2019) Semantically Coherent 4D Scene Flow of Dynamic Scenes in International Journal of Computer Vision

Mustafa A (2019) MSFD: Multi-Scale Segmentation-Based Feature Detection for Wide-Baseline Scene Reconstruction in IEEE Transactions on Image Processing

Mustafa A (2020) Temporally Coherent General Dynamic Scene Reconstruction in International Journal of Computer Vision

Pansari P (2019) Linear programming-based submodular extensions for marginal estimation in Computer Vision and Image Understanding

Remaggi L (2017) Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Thomas G (2017) Computer vision for sports: Current applications and research topics in Computer Vision and Image Understanding

Wan J (2018) Articulated motion and deformable objects in Pattern Recognition

Zhu Q (2017) Robust reproduction of sound zones with local sound orientation in The Journal of the Acoustical Society of America

 
Description Introduction of technologies for audio-visual scene understanding, understanding of human movement and behaviour, autonomous navigation through scene understanding and audio-visual search. These technologies are exploiting advances in machine learning for scene understanding from huge databases of audio-visual data. Novel learning methodologies and architectures have been introduced to overcome the limitations of previous approaches and enable operation in complex real-world settings.

Research in the first two years of the platform has seen a step change in joint audio-visual machine learning for automatic understanding of dynamic scenes from video. Novel AI/machine learning methods and architectures have been introduced. Research has resulted in world-leading technologies for both audio and visual scene recognition, demonstrated through awards in international competitions.
Exploitation Route Research advances and technologies are actively being exploited across a wide range of sectors through further collaboration.

Google '8M YouTube video annotation challenge' Gold Medal - 7th out of 650 entries;
Google 'Landmark retrieval challenge' 1st Place out of >150 entries;
DCASE 'Audio Recognition Challenge' 1st Place
Sectors Aerospace, Defence and Marine,Agriculture, Food and Drink,Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Environment,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Retail,Security and Diplomacy

URL http://www.surrey.ac.uk/cvssp
 
Description The platform grant is supporting research with impact across a wide range of sectors including healthcare, autonomous vehicles, entertainment, security, manufacturing and communications. In the first two years of the platform there has been significant impact both on methodologies for audio-visual machine learning, which have demonstrated world-leading performance, and on the deployment of machine learning technologies for application in healthcare, security, entertainment, communication, robotics and autonomous systems.
First Year Of Impact 2018
Sector Aerospace, Defence and Marine,Agriculture, Food and Drink,Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Retail,Security and Diplomacy,Transport
Impact Types Cultural,Societal,Economic,Policy & public services

 
Description Ofcom Object-based Media Working Group
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Influence on media communication and service regulation
URL https://www.ofcom.org.uk
 
Description (N00014-16-R-FO05) Semantic Information Pursuit for Multimodal Data Analysis
Amount £980,875 (GBP)
Funding ID EP/R018456/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2018 
End 01/2023
 
Description AI for Sound
Amount £2,120,276 (GBP)
Funding ID EP/T019751/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 05/2020 
End 04/2025
 
Description ARCHANGEL - Trusted Archives of Digital Public Records
Amount £487,428 (GBP)
Funding ID EP/P03151X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2017 
End 06/2019
 
Description BBC Prosperity Partnership: Future Personalised Object-based Media Experiences Delivered at Scale Anywhere
Amount £8,500,000 (GBP)
Funding ID EP/V038087/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2021 
End 06/2026
 
Description EPSRC UK Acoustics Network Plus
Amount £1,418,894 (GBP)
Funding ID EP/V007866/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2020 
End 10/2024
 
Description ExTOL: End to End Translation of British Sign Language
Amount £971,921 (GBP)
Funding ID EP/R03298X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2021
 
Description Multimodal Video Search by Examples (MVSE)
Amount £863,564 (GBP)
Funding ID EP/V002856/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2020 
End 09/2023
 
Description Next Stage Digital Economy Centre in the Decentralised Digital Economy (DECaDE)
Amount £3,816,713 (GBP)
Funding ID EP/T022485/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2020 
End 09/2025
 
Description Polymersive: Immersive Video Production Tools for Studio and Live Events
Amount £726,251 (GBP)
Funding ID 105168 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 03/2019 
End 09/2020
 
Description RetinaScan: AI-enabled automated image assessment system for diabetic retinopathy screening
Amount £791,559 (GBP)
Funding ID 104184 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 03/2018 
End 08/2020
 
Description TAPESTRY: Trust, Authentication and Privacy over a DeCentralised Social Registry
Amount £854,808 (GBP)
Funding ID EP/N02799X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2017 
End 09/2020
 
Title Audio-visual production dataset for weather forecaster personalisation use case 
Description Audio and video recordings of presenters for production research
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Use in production demonstrators January 2023 
URL http://cvssp.org
 
Description Anthropics Technology Ltd 
Organisation Anthropics Technology
Country United Kingdom 
Sector Private 
Start Year 2005
 
Description BBC Research and Development 
Organisation British Broadcasting Corporation (BBC)
Country United Kingdom 
Sector Public 
PI Contribution Research in Computer Vision and Audio for broadcast production. Technologies for 3D production, free-viewpoint video in sports, stereo production from monocular cameras, and video annotation. Member of the BBC Audio Research Partnership, developing the next generation of broadcast technology.
Collaborator Contribution In-kind contribution (members of Steering/Advisory Boards). Use of the BBC lab and research/development facilities. Studentship (industrial CASE) funding and co-supervision of PhD students.
Impact Multi-disciplinary collaboration involves Computer Vision, Video Analysis, Psychoacoustics, Signal Processing and Spatial Audio
 
Description BT 
Organisation BT Group
Country United Kingdom 
Sector Private 
PI Contribution research collaboration in object-based/personalised media delivery
Collaborator Contribution support of research activities
Impact ongoing collaboration
Start Year 2021
 
Description Bang and Olufsen 
Organisation Bang & Olufsen
Country Denmark 
Sector Private 
PI Contribution Spatial audio research (POSZ and S3A EPSRC funded projects)
Collaborator Contribution Scholarships (fees and bursaries) for EU/Home students. In-kind contribution by members of the B&O Research department (Soren Bech, member of Steering/Advisory Boards and co-supervisor of funded students). Use of research facilities at their labs in Denmark.
Impact Publications (listed at http://iosr.uk/projects/POSZ/). Multi-disciplinary collaboration: Signal Processing, Psychoacoustics and Spatial Audio
 
Description British Broadcasting Corporation 
Organisation British Broadcasting Corporation (BBC)
Country United Kingdom 
Sector Public 
Start Year 2004
 
Description Figment Productions 
Organisation Figment Productions Limited
Country United Kingdom 
Sector Private 
PI Contribution Advice/collaboration on personalisation in XR experiences
Collaborator Contribution research collaboration and attendance at industry meetings
Impact ongoing research
Start Year 2021
 
Description Foundry 
Organisation Foundry
Country United Kingdom 
Sector Private 
PI Contribution Collaboration on Computer Vision and AI tools for film and VFX production
Collaborator Contribution Novel computer vision methods for segmentation, reconstruction, tracking and representation of people from video. Used for actor performance capture and visual effects production.
Impact Novel Computer vision methods for video analysis of actor performance
Start Year 2008
 
Description Imagineer Systems 
Organisation Boris FX
Country United Kingdom 
Sector Private 
PI Contribution Personalised media production tools
Collaborator Contribution Advice/collaboration on production tools
Impact ongoing collaboration
Start Year 2021
 
Description Intel 
Organisation Intel Corporation
Country United States 
Sector Private 
PI Contribution Research collaboration in personalised media
Collaborator Contribution collaboration/PhD Sponsorship
Impact ongoing collaboration
Start Year 2021
 
Description Mirriad 
Organisation Mirriad
Country United Kingdom 
Sector Private 
PI Contribution Personalisation in advertising advice
Collaborator Contribution Industry partner briefing on requirements for personalisation
Impact Ongoing collaboration
Start Year 2021
 
Description Numerion 
Organisation Numerion Software Limited
Country United Kingdom 
Sector Private 
PI Contribution Validation of physics based cloth simulation against real cloth measurements
Collaborator Contribution Physics based cloth simulation expertise
Impact Physically validated cloth simulation tools
Start Year 2014
 
Description Snell & Wilcox Ltd 
Organisation Snell & Wilcox Ltd
Country United Kingdom 
Sector Private 
Start Year 2006
 
Description Sony Broadcast and Professional Europe 
Organisation SONY
Department Sony Broadcast and Professional Europe
Country United Kingdom 
Sector Private 
Start Year 2004
 
Description BBC Sounds Amazing 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Industry/academic forum for research and production industry professionals in audio and sound hosted by the BBC
Year(s) Of Engagement Activity 2021,2022
URL https://www.bbc.co.uk/academy/events/sounds-amazing-2022/
 
Description Barbican AI more than Human Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Barbican demonstration and talk as part of the AI: More than human festival. Live demonstration of AI technology for human movement capture.
Year(s) Of Engagement Activity 2018
URL https://www.barbican.org.uk/whats-on/2019/event/ai-more-than-human
 
Description CVPR 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Primary international forum for computer vision and AI/machine learning research in audio-visual media. Research dissemination through papers, key-note invited talks and workshop organisation
Year(s) Of Engagement Activity 2021,2022,2023
URL https://cvpr2023.thecvf.com
 
Description CVSSP 30th Anniversary Celebration 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Centre for Vision, Speech and Signal Processing (CVSSP) 30th Anniversary Celebration with over 500 participants from industry, government and alumni. The event, themed on 'Can Machines Think', included a series of keynote talks from alumni who are international leaders in academia and industry, over 30 live demos of current research, and an open house at the Centre for both industry and guests. There was also a VIP dinner hosted by the Vice-Chancellor of the University.
Year(s) Of Engagement Activity 2019
URL http://surrey.ac.uk/cvssp
 
Description European Conference on Visual Media Production 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation of research advances at industry-academic forum
Year(s) Of Engagement Activity 2021,2022
URL https://www.cvmp-conference.org/
 
Description Invited talk to the Association of Noise Consultants, Aug 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited talk on "Artificial Intelligence for Sound" to the Association of Noise Consultants, 27 Aug 2020 (Video meeting).
Year(s) Of Engagement Activity 2020
URL https://www.linkedin.com/feed/update/urn:li:activity:6697179064635150337/
 
Description Royal Society 'You and AI' 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Royal Society "You and AI" event hosted by Brian Cox, held at the Barbican with an audience of 2000 and streamed via the Royal Society to a national and international audience. Live demonstration and discussion of AI technology for human motion capture from video.
Year(s) Of Engagement Activity 2018
URL https://www.barbican.org.uk/whats-on/2018/event/you-and-ai-with-brian-cox