Audio-Visual Media Research Platform
Lead Research Organisation:
University of Surrey
Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)
Abstract
The strategic objective of this Platform Grant is to underpin audio-visual media research within the Centre for Vision, Speech and Signal Processing (CVSSP), pursuing fundamental research that combines internationally leading expertise in the understanding of real-world audio and visual data, and transferring this capability to new application domains. Our goal is to pioneer new technologies that directly impact industry practice in healthcare, sports, retail, communication, entertainment and training. This builds on CVSSP's unique track record of world-leading research in both audio and visual machine perception, which has enabled ground-breaking technology exploited by UK industry. The strategic contribution and international standing of the centre's research in audio and visual media has been recognised by EPSRC through two previous Platform Grant awards (2003-14) and two Programme Grant awards in 2013 and 2015. Platform Grant funding is requested to reinforce the critical mass of expertise and knowledge of specialist facilities required to advance both fundamental understanding and the pioneering of new technology. In particular, this Platform Grant will catalyse advances in multi-sensory machine perception, building on the Centre's unique strengths in audio and vision. Key experienced post-doctoral researchers hold specialist knowledge and practical know-how, an important resource for training new researchers and for maintaining cutting-edge research using state-of-the-art facilities. Strategically, the Platform Grant will build on recent independent advances in audio and visual scene analysis to lead multi-sensory understanding and modelling of real-world scenes. These research advances will provide the foundation for UK industry to lead the development of technologies ranging from intelligent sensing for healthcare and assisted living to immersive entertainment production.
Platform Grant funding will also strengthen CVSSP's international collaboration with leading groups world-wide through extended research secondments to the US (Washington, USC), Asia (Tsinghua, Tianjin, Kyoto, Tokyo, KAUST) and Europe (INRIA, MPI, Fraunhofer, ETH, EPFL, KTH, CTU, UPF).
Planned Impact
Platform Grant support for audio-visual media processing will sustain a critical mass of joint research expertise in multi-sensory machine perception. This is critical to achieving machines that can hear and see in order to understand and interact with real-world dynamic scenes. Recent advances in both the audio and vision research communities, driven by the introduction of deep learning methodologies, have achieved a step change in the capability of machine understanding, enabling for the first time automatic interpretation of audio and visual data of real-world complexity. Our research seeks to capitalise on these advances, both developing the next generation of research leaders in multi-sensory machine perception and realising autonomous systems capable of combining sensing modalities to robustly understand and interpret audio-visual media. Research will address the open challenge of machine understanding of complex, dynamic real-world scenes, combining the complementary information available from audio and visual sensors to achieve robust interpretation. These research advances are of central interest to both the audio and vision research communities and will bring together advances in machine perception. Joint audio-visual processing is essential to overcome the inherent ambiguities of either sensing modality, such as occlusion, limited field of view and uniform appearance in visual sensing, which commonly occur and can cause visual understanding to fail. Audio cues can overcome these limitations, providing wide-area information and allowing continuous sensing of objects that are visually obscured. For example, both audio and visual cues are essential for non-contact monitoring of people in healthcare and assisted-living applications.
Research in audio-visual signal processing and machine perception enabling robust interpretation of complex real-world dynamic scenes will have wide-spread impact on application domains through collaborative research with UK industry. These include:
- Medical diagnosis: to easily measure and monitor moving people from audio and video sensing for biomechanical analysis and to understand the relationship of body shape to health problems such as obesity and sleep patterns
- Sports performance analysis: to allow non-invasive performance analysis of player movement and team interaction in both training and live matches
- Veterinary science: to monitor animal behaviour for improved welfare and early detection of abnormalities
- Film, games and VR production: to support creativity through interpretation and editing of real-world content and to create digital doubles that appear and behave like a real person based on 4D audio-visual capture of actor performance
- Security and surveillance: to monitor individual human behaviour using visual biometrics based on 4D audio-visual analysis of the face and body in motion
- Robotics: to enable robots to safely navigate domestic environments with people and pets
- Human-computer communication: to understand individual behaviour for natural interaction
- Manufacturing industry: new methods for contactless measurement and analysis of surfaces using joint audio-visual sensing.
- Retail industry: methods to capture representations of both the audio and visual properties of objects to allow customers to experience these remotely.
- Cultural heritage: techniques to digitise artefacts for scientific study and to make them available to a wider audience.
Advances in these areas will be of direct benefit to the general public in healthcare, assisted living, entertainment and training.
Platform Grant funding will support underpinning research and pilot studies to build collaborative research in these domains.
Organisations
- University of Surrey (Lead Research Organisation)
- Anthropics Technology (Collaboration)
- SONY (Collaboration)
- Numerion Software Limited (Collaboration)
- Intel (United States) (Collaboration)
- BT Group (Collaboration)
- Boris FX (Collaboration)
- Snell & Wilcox Ltd (Collaboration)
- Bang & Olufsen (Collaboration)
- FIGMENT PRODUCTIONS LIMITED (Collaboration)
- Foundry (Collaboration)
- British Broadcasting Corporation (BBC) (Collaboration)
- Mirriad (Collaboration)
- Fraunhofer Society (Project Partner)
- Google (United States) (Project Partner)
- DoubleMe (Project Partner)
- Vicon (United Kingdom) (Project Partner)
- Bang & Olufsen (Denmark) (Project Partner)
- Audio Analytic (United Kingdom) (Project Partner)
- British Library (Project Partner)
- Cedar Audio Ltd (Project Partner)
- Foundry (United Kingdom) (Project Partner)
- Boris FX (United Kingdom) (Project Partner)
- Imaginarium (Project Partner)
- Supermassive Games (Project Partner)
- British Broadcasting Corporation (United Kingdom) (Project Partner)
- Sony (United Kingdom) (Project Partner)
Publications
Addari G
(2023)
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance
in International Journal of Computer Vision
Akash Srivastava
(2017)
VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning
Bailey M
(2022)
Finite Aperture Stereo
in International Journal of Computer Vision
Bailey M
(2021)
Finite Aperture Stereo: 3D Reconstruction of Macro-Scale Scenes
Bridgeman L
(2021)
Dynamic Appearance Modelling from Minimal Cameras
Chen J
(2021)
Channel and spatial attention based deep object co-segmentation
in Knowledge-Based Systems
Description | Introduction of technologies for audio-visual scene understanding, understanding of human movement and behaviour, autonomous navigation through scene understanding, and audio-visual search. These technologies exploit advances in machine learning for scene understanding from huge databases of audio-visual data. Novel learning methodologies and architectures have been introduced to overcome the limitations of previous approaches and enable operation in complex real-world settings. Research in the first two years of the platform has delivered a step change in joint audio-visual machine learning for automatic understanding of dynamic scenes from video, resulting in world-leading technologies for both audio and visual scene recognition, demonstrated through awards in international competitions. |
Exploitation Route | Research advances and technologies are actively being exploited across a wide range of sectors through further collaboration. Google '8M YouTube video annotation challenge': Gold Medal, 7th out of 650 entries; Google 'Landmark retrieval challenge': 1st place out of >150 entries; DCASE 'Audio Recognition Challenge': 1st place. |
Sectors | Aerospace Defence and Marine Agriculture Food and Drink Communities and Social Services/Policy Creative Economy Digital/Communication/Information Technologies (including Software) Education Electronics Environment Financial Services and Management Consultancy Healthcare Leisure Activities including Sports Recreation and Tourism Government Democracy and Justice Manufacturing including Industrial Biotechnology Culture Heritage Museums and Collections Retail Security and Diplomacy |
URL | http://www.surrey.ac.uk/cvssp |
Description | The platform grant is supporting research with impact across a wide range of sectors including healthcare, autonomous vehicles, entertainment, security, manufacturing and communications. In the first two years of the platform there has been significant impact both on advances in methodologies for audio-visual machine learning demonstrating world-leading performance, and on the deployment of machine learning technologies for applications in healthcare, security, entertainment, communication, robotics and autonomous systems. |
First Year Of Impact | 2018 |
Sector | Aerospace, Defence and Marine,Agriculture, Food and Drink,Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Retail,Security and Diplomacy,Transport |
Impact Types | Cultural Societal Economic Policy & public services |
Description | Ofcom Object-based Media Working Group |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Influence on media communication and service regulation |
URL | https://www.ofcom.org.uk |
Description | (N00014-16-R-FO05) Semantic Information Pursuit for Multimodal Data Analysis |
Amount | £980,875 (GBP) |
Funding ID | EP/R018456/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2018 |
End | 01/2023 |
Description | AI for Sound |
Amount | £2,120,276 (GBP) |
Funding ID | EP/T019751/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 04/2025 |
Description | ARCHANGEL - Trusted Archives of Digital Public Records |
Amount | £487,428 (GBP) |
Funding ID | EP/P03151X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 05/2017 |
End | 06/2019 |
Description | BBC Prosperity Partnership: Future Personalised Object-based Media Experiences Delivered at Scale Anywhere |
Amount | £8,500,000 (GBP) |
Funding ID | EP/V038087/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2021 |
End | 06/2026 |
Description | EPSRC UK Acoustics Network Plus |
Amount | £1,418,894 (GBP) |
Funding ID | EP/V007866/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 10/2025 |
Description | ExTOL: End to End Translation of British Sign Language |
Amount | £971,921 (GBP) |
Funding ID | EP/R03298X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2018 |
End | 06/2022 |
Description | JADE: Joint Academic Data science Endeavour - 2 |
Amount | £6,739,771 (GBP) |
Funding ID | EP/T022205/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2020 |
End | 03/2025 |
Description | Multimodal Video Search by Examples (MVSE) |
Amount | £863,564 (GBP) |
Funding ID | EP/V002856/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 09/2024 |
Description | Next Stage Digital Economy Centre in the Decentralised Digital Economy (DECaDE) |
Amount | £3,816,713 (GBP) |
Funding ID | EP/T022485/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2020 |
End | 09/2025 |
Description | Polymersive: Immersive Video Production Tools for Studio and Live Events |
Amount | £726,251 (GBP) |
Funding ID | 105168 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 03/2019 |
End | 09/2020 |
Description | RetinaScan: AI-enabled automated image assessment system for diabetic retinopathy screening |
Amount | £791,559 (GBP) |
Funding ID | 104184 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 03/2018 |
End | 08/2020 |
Description | TAPESTRY: Trust, Authentication and Privacy over a DeCentralised Social Registry |
Amount | £854,808 (GBP) |
Funding ID | EP/N02799X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2017 |
End | 09/2020 |
Title | Audio-visual production dataset for weather forecaster personalisation use case |
Description | Audio and video recordings of presenters for production research |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Use in production demonstrators January 2023 |
URL | http://cvssp.org |
Title | Finite Aperture Stereo Datasets |
Description | This landing page contains the datasets presented in the paper "Finite Aperture Stereo". The datasets are intended for defocus-based 3D reconstruction and analysis. Each download link contains images of a static scene, captured from multiple viewpoints and with different focus settings. The captured objects exhibit a range of reflectance properties and are physically small in scale. Calibration images are also available. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | This dataset enables benchmarking of 3D reconstruction algorithms. |
URL | https://openresearch.surrey.ac.uk/esploro/outputs/dataset/99661466602346 |
Title | Multi-View Labelling (MVL) Dataset |
Description | To overcome the shortage of real-world multi-view multiple-people data, we introduce a new synthetic multi-view multiple-people labelling dataset named Multi-View 3D Humans (MV3DHumans). This large-scale synthetic image dataset was generated for multi-view multiple-people detection, labelling and segmentation tasks. The MV3DHumans dataset contains 1200 scenes, with 4, 6, 8 or 10 people in each scene. Each scene is captured by 16 cameras with overlapping fields of view. The dataset provides RGB images at a resolution of 640 × 480, together with ground-truth annotations including bounding boxes, instance masks, multi-view correspondences and camera calibrations. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | This enables training and benchmarking of algorithms for multi-view multiple-people detection, labelling and segmentation. |
URL | https://openresearch.surrey.ac.uk/esploro/outputs/dataset/99581923802346 |
Description | Anthropics Technology Ltd |
Organisation | Anthropics Technology |
Country | United Kingdom |
Sector | Private |
Start Year | 2005 |
Description | BBC Research and Development |
Organisation | British Broadcasting Corporation (BBC) |
Country | United Kingdom |
Sector | Public |
PI Contribution | Research in Computer Vision for broadcast production and Audio. Technologies for 3D production, free-view point video in sports, stereo production from monocular cameras, video annotation Member of the BBC Audio Research Partnership - developing the next generation of broadcast technology. |
Collaborator Contribution | In kind contribution (members of Steering/Advisory Boards) Use of the BBC lab and research/development facilities. Studentships (industrial case) funding and co-supervision of PhD students. |
Impact | Multi-disciplinary collaboration involves Computer Vision, Video Analysis, Psychoacoustics, Signal Processing and Spatial Audio |
Description | BT |
Organisation | BT Group |
Country | United Kingdom |
Sector | Private |
PI Contribution | research collaboration in object-based/personalised media delivery |
Collaborator Contribution | support of research activities |
Impact | ongoing collaboration |
Start Year | 2021 |
Description | Bang and Olufsen |
Organisation | Bang & Olufsen |
Country | Denmark |
Sector | Private |
PI Contribution | Spatial audio research (POSZ and S3A EPSRC funded projects) |
Collaborator Contribution | Scholarships (fees and bursaries) for EU/Home students. In-kind contribution by members of B&O Research department (Soren Bech, member of Steering /Advisory Boards and co-supervisor of funded students). Use of research facilities at their labs in Denmark |
Impact | Publications listed on http://iosr.uk/projects/POSZ/ Multi-disciplinary Collaboration: Signal Processing, Psychoacoustics and Spatial audio |
Description | British Broadcasting Corporation |
Organisation | British Broadcasting Corporation (BBC) |
Country | United Kingdom |
Sector | Public |
Start Year | 2004 |
Description | Figment Productions |
Organisation | Figment Productions Limited |
Country | United Kingdom |
Sector | Private |
PI Contribution | Advice/collaboration on personalisation in XR experiences |
Collaborator Contribution | research collaboration and attendance at industry meetings |
Impact | ongoing research |
Start Year | 2021 |
Description | Foundry |
Organisation | Foundry |
Country | United Kingdom |
Sector | Private |
PI Contribution | Collaboration on Computer Vision and AI tools for film and VFX production |
Collaborator Contribution | Novel computer vision methods for segmentation, reconstruction, tracking and representation of people from video. Used for actor performance capture and visual effects production. |
Impact | Novel Computer vision methods for video analysis of actor performance |
Start Year | 2008 |
Description | Imagineer Systems |
Organisation | Boris FX |
Country | United Kingdom |
Sector | Private |
PI Contribution | Personalised media production tools |
Collaborator Contribution | Advice/collaboration on production tools |
Impact | ongoing collaboration |
Start Year | 2021 |
Description | Intel |
Organisation | Intel Corporation |
Country | United States |
Sector | Private |
PI Contribution | Research collaboration in personalised media |
Collaborator Contribution | collaboration/PhD Sponsorship |
Impact | ongoing collaboration |
Start Year | 2021 |
Description | Mirriad |
Organisation | Mirriad |
Country | United Kingdom |
Sector | Private |
PI Contribution | Personalisation in advertising advice |
Collaborator Contribution | Industry partner briefing on requirements for personalisation |
Impact | Ongoing collaboration |
Start Year | 2021 |
Description | Numerion |
Organisation | Numerion Software Limited |
Country | United Kingdom |
Sector | Private |
PI Contribution | Validation of physics based cloth simulation against real cloth measurements |
Collaborator Contribution | Physics based cloth simulation expertise |
Impact | Physically validated cloth simulation tools |
Start Year | 2014 |
Description | Snell & Wilcox Ltd |
Organisation | Snell & Wilcox Ltd |
Country | United Kingdom |
Sector | Private |
Start Year | 2006 |
Description | Sony Broadcast and Professional Europe |
Organisation | SONY |
Department | Sony Broadcast and Professional Europe |
Country | United Kingdom |
Sector | Private |
Start Year | 2004 |
Description | BBC Sounds Amazing |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Industry/academic forum for research and production industry professionals in audio and sound hosted by the BBC |
Year(s) Of Engagement Activity | 2021,2022 |
URL | https://www.bbc.co.uk/academy/events/sounds-amazing-2022/ |
Description | Barbican AI more than Human Festival |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Barbican demonstration and talk as part of the AI: More than human festival. Live demonstration of AI technology for human movement capture. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.barbican.org.uk/whats-on/2019/event/ai-more-than-human |
Description | CVPR |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Primary international forum for computer vision and AI/machine learning research in audio-visual media. Research dissemination through papers, invited keynote talks and workshop organisation |
Year(s) Of Engagement Activity | 2021,2022,2023 |
URL | https://cvpr2023.thecvf.com |
Description | CVSSP 30th Anniversary Celebration |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Centre for Vision, Speech and Signal Processing (CVSSP) 30th Anniversary Celebration with over 500 participants from industry, government and alumni. The event, themed on 'Can Machines Think?', included a series of keynote talks from alumni who are international leaders in academia and industry, over 30 live demos of current research, and an open house at the centre for industry and guests. There was also a VIP dinner hosted by the Vice-Chancellor of the University. |
Year(s) Of Engagement Activity | 2019 |
URL | http://surrey.ac.uk/cvssp |
Description | European Conference on Visual Media Production |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation of research advances at industry-academic forum |
Year(s) Of Engagement Activity | 2021,2022 |
URL | https://www.cvmp-conference.org/ |
Description | Invited talk to the Association of Noise Consultants, Aug 2020 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk on "Artificial Intelligence for Sound" to the Association of Noise Consultants, 27 Aug 2020 (Video meeting). |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.linkedin.com/feed/update/urn:li:activity:6697179064635150337/ |
Description | Royal Society 'You and AI' |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Royal Society "You and AI" event hosted by Brian Cox held at the Barbican with an audience of 2000 and streamed via the Royal Society to National/International Audience. Live demonstration and discussion of AI technology for human motion capture from video. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.barbican.org.uk/whats-on/2018/event/you-and-ai-with-brian-cox |