Audio-Visual Media Research Platform
Lead Research Organisation:
University of Surrey
Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)
Abstract
The strategic objective of this Platform Grant is to underpin audio-visual media research within the Centre for Vision, Speech and Signal Processing (CVSSP), pursuing fundamental research that combines internationally leading expertise in the understanding of real-world audio and visual data, and transferring this capability to new application domains. Our goal is to pioneer new technologies that directly impact industry practice in healthcare, sports, retail, communication, entertainment and training. This builds on CVSSP's unique track record of world-leading research in both audio and visual machine perception, which has enabled ground-breaking technology exploited by UK industry. The strategic contribution and international standing of the centre's research in audio and visual media has been recognised by EPSRC through two previous Platform Grant awards (2003-14) and two Programme Grant awards in 2013 and 2015. Platform Grant funding is requested to reinforce the critical mass of expertise and knowledge of specialist facilities required to advance both fundamental understanding and the pioneering of new technology. In particular, this Platform Grant will catalyse advances in multi-sensory machine perception, building on the Centre's unique strengths in audio and vision. Key experienced post-doctoral researchers hold specialist knowledge and practical know-how, an important resource for training new researchers and for maintaining cutting-edge research using state-of-the-art facilities. Strategically, the Platform Grant will build on recent independent advances in audio and visual scene analysis to lead multi-sensory understanding and modelling of real-world scenes. These research advances will provide the foundation for UK industry to lead the development of technologies ranging from intelligent sensing for healthcare and assisted living to immersive entertainment production.
Platform Grant funding will also strengthen CVSSP's international collaboration with leading groups world-wide through extended research secondments to the US (Washington, USC), Asia (Tsinghua, Tianjin, Kyoto, Tokyo, KAUST) and Europe (INRIA, MPI, Fraunhofer, ETH, EPFL, KTH, CTU, UPF).
Planned Impact
Platform Grant support for audio-visual media processing will sustain a critical mass of joint research expertise in multi-sensory machine perception. This is critical to achieving machines that can hear and see in order to understand and interact with real-world dynamic scenes. Recent advances in both the audio and vision research communities, driven by the introduction of deep learning methodologies, have achieved a step change in the capability of machine understanding, enabling for the first time automatic interpretation of audio and visual data of real-world complexity. Our research seeks to capitalise on these advances, both developing the next generation of research leaders in multi-sensory machine perception and realising autonomous systems capable of combining sensing modalities to robustly understand and interpret audio-visual media. Research will address the open challenge of machine understanding of complex, dynamic real-world scenes, combining the complementary information available from audio and visual sensors to achieve robust interpretation. These research advances are of central interest to both the audio and vision research communities and will bring together advances in machine perception. Joint audio-visual processing is essential to overcome the inherent ambiguities of either sensing modality, such as occlusion, limited field of view and uniform appearance in visual sensing, which commonly occur and can cause visual understanding to fail. Audio cues can overcome these limitations, providing wide-area information and allowing continuous sensing of objects that are visually obscured. For example, both audio and visual cues are essential for non-contact monitoring of people in healthcare and assisted-living applications.
Research in audio-visual signal processing and machine perception enabling robust interpretation of complex real-world dynamic scenes will have wide-spread impact on application domains through collaborative research with UK industry. These include:
- Medical diagnosis: to easily measure and monitor moving people from audio and video sensing for biomechanical analysis and to understand the relationship of body shape to health problems such as obesity and sleep patterns
- Sports performance analysis: to allow non-invasive performance analysis of player movement and team interaction in both training and live matches
- Veterinary science: to monitor animal behaviour for improved welfare and early detection of abnormalities
- Film, games and VR production: to support creativity through interpretation and editing of real-world content and to create digital doubles that appear and behave like a real person based on 4D audio-visual capture of actor performance
- Security and surveillance: to monitor individual human behaviour using visual biometrics based on 4D audio-visual analysis of the face and body in motion
- Robotics: to enable robots to safely navigate domestic environments with people and pets
- Human-computer communication: to understand individual behaviour for natural interaction
- Manufacturing industry: new methods for contactless measurement and analysis of surfaces using joint audio-visual sensing.
- Retail industry: methods to capture representations of both the audio and visual properties of objects to allow customers to experience these remotely.
- Cultural heritage: techniques to digitise artefacts for scientific study and to make them available to a wider audience.
Advances in these areas will be of direct benefit to the general public in healthcare, assisted living, entertainment and training.
Platform Grant funding will support underpinning research and pilot studies to build collaborative research in these domains.
Organisations
- University of Surrey (Lead Research Organisation)
- Anthropics Technology (Collaboration)
- SONY (Collaboration)
- Numerion Software Limited (Collaboration)
- Intel (United States) (Collaboration)
- BT Group (Collaboration)
- Boris FX (Collaboration)
- Snell & Wilcox Ltd (Collaboration)
- Bang & Olufsen (Collaboration)
- FIGMENT PRODUCTIONS LIMITED (Collaboration)
- Foundry (Collaboration)
- British Broadcasting Corporation (BBC) (Collaboration)
- Mirriad (Collaboration)
- Fraunhofer Society (Project Partner)
- Google (United States) (Project Partner)
- DoubleMe (Project Partner)
- Vicon (United Kingdom) (Project Partner)
- Bang & Olufsen (Denmark) (Project Partner)
- Audio Analytic (United Kingdom) (Project Partner)
- British Library (Project Partner)
- Cedar Audio Ltd (Project Partner)
- Foundry (United Kingdom) (Project Partner)
- Boris FX (United Kingdom) (Project Partner)
- Imaginarium (Project Partner)
- Supermassive Games (Project Partner)
- British Broadcasting Corporation (United Kingdom) (Project Partner)
- Sony (United Kingdom) (Project Partner)
Publications
Addari G
(2023)
A Family of Approaches for Full 3D Reconstruction of Objects with Complex Surface Reflectance
in International Journal of Computer Vision
Akash Srivastava
(2017)
VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning
Bailey M
(2022)
Finite Aperture Stereo
in International Journal of Computer Vision
Bailey M
(2021)
Finite Aperture Stereo: 3D Reconstruction of Macro-Scale Scenes
Bridgeman L
(2021)
Dynamic Appearance Modelling from Minimal Cameras
Chen J
(2021)
Channel and spatial attention based deep object co-segmentation
in Knowledge-Based Systems
Description | Introduction of technologies for audio-visual scene understanding, understanding of human movement and behaviour, autonomous navigation through scene understanding, and audio-visual search. These technologies exploit advances in machine learning for scene understanding from huge databases of audio-visual data. Novel learning methodologies and architectures have been introduced to overcome the limitations of previous approaches and enable operation in complex real-world settings. Research in the first two years of the platform has delivered a step change in joint audio-visual machine learning for automatic understanding of dynamic scenes from video, resulting in world-leading technologies for both audio and visual scene recognition, demonstrated through awards in international competitions. |
Exploitation Route | Research advances and technologies are actively being exploited across a wide range of sectors through further collaboration. Google '8M YouTube video annotation challenge': Gold Medal, 7th out of 650 entries; Google 'Landmark retrieval challenge': 1st place out of >150 entries; DCASE 'Audio Recognition Challenge': 1st place. |
Sectors | Aerospace Defence and Marine Agriculture Food and Drink Communities and Social Services/Policy Creative Economy Digital/Communication/Information Technologies (including Software) Education Electronics Environment Financial Services and Management Consultancy Healthcare Leisure Activities including Sports Recreation and Tourism Government Democracy and Justice Manufacturing including Industrial Biotechnology Culture Heritage Museums and Collections Retail Security and Diplomacy |
URL | http://www.surrey.ac.uk/cvssp |
Description | The platform grant is supporting research with impact across a wide range of sectors including healthcare, autonomous vehicles, entertainment, security, manufacturing and communications. In the first two years of the platform there has been significant impact both on advances in methodologies for audio-visual machine learning demonstrating world-leading performance, and on the deployment of machine learning technologies for applications in healthcare, security, entertainment, communication, robotics and autonomous systems. |
First Year Of Impact | 2018 |
Sector | Aerospace, Defence and Marine,Agriculture, Food and Drink,Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Retail,Security and Diplomacy,Transport |
Impact Types | Cultural Societal Economic Policy & public services |
Description | Ofcom Object-based Media Working Group |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Influence on media communication and service regulation |
URL | https://www.ofcom.org.uk |
Description | (N00014-16-R-FO05) Semantic Information Pursuit for Multimodal Data Analysis |
Amount | £980,875 (GBP) |
Funding ID | EP/R018456/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2018 |
End | 01/2023 |
Description | AI for Sound |
Amount | £2,120,276 (GBP) |
Funding ID | EP/T019751/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2020 |
End | 04/2025 |
Description | ARCHANGEL - Trusted Archives of Digital Public Records |
Amount | £487,428 (GBP) |
Funding ID | EP/P03151X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 05/2017 |
End | 06/2019 |
Description | BBC Prosperity Partnership: Future Personalised Object-based Media Experiences Delivered at Scale Anywhere |
Amount | £8,500,000 (GBP) |
Funding ID | EP/V038087/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2021 |
End | 06/2026 |
Description | EPSRC UK Acoustics Network Plus |
Amount | £1,418,894 (GBP) |
Funding ID | EP/V007866/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 10/2025 |
Description | ExTOL: End to End Translation of British Sign Language |
Amount | £971,921 (GBP) |
Funding ID | EP/R03298X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2018 |
End | 06/2022 |
Description | JADE: Joint Academic Data science Endeavour - 2 |
Amount | £6,739,771 (GBP) |
Funding ID | EP/T022205/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2020 |
End | 03/2025 |
Description | Multimodal Video Search by Examples (MVSE) |
Amount | £863,564 (GBP) |
Funding ID | EP/V002856/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 09/2024 |
Description | Next Stage Digital Economy Centre in the Decentralised Digital Economy (DECaDE) |
Amount | £3,816,713 (GBP) |
Funding ID | EP/T022485/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2020 |
End | 09/2025 |
Description | Polymersive: Immersive Video Production Tools for Studio and Live Events |
Amount | £726,251 (GBP) |
Funding ID | 105168 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 03/2019 |
End | 09/2020 |
Description | RetinaScan: AI-enabled automated image assessment system for diabetic retinopathy screening |
Amount | £791,559 (GBP) |
Funding ID | 104184 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 03/2018 |
End | 08/2020 |
Description | TAPESTRY: Trust, Authentication and Privacy over a DeCentralised Social Registry |
Amount | £854,808 (GBP) |
Funding ID | EP/N02799X/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2017 |
End | 09/2020 |
Title | Audio-visual production dataset for weather forecaster personalisation use case |
Description | Audio and video recordings of presenters for production research |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Use in production demonstrators January 2023 |
URL | http://cvssp.org |
Title | Finite Aperture Stereo Datasets |
Description | This landing page contains the datasets presented in the paper "Finite Aperture Stereo". The datasets are intended for defocus-based 3D reconstruction and analysis. Each download link contains images of a static scene, captured from multiple viewpoints and with different focus settings. The captured objects exhibit a range of reflectance properties and are physically small in scale. Calibration images are also available. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | This dataset enables benchmarking of 3D reconstruction algorithms. |
URL | https://openresearch.surrey.ac.uk/esploro/outputs/dataset/99661466602346 |
Title | Multi-View Labelling (MVL) Dataset |
Description | To overcome the shortage of real-world multi-view multiple-people data, we introduce a new synthetic multi-view multiple-people labelling dataset named Multi-View 3D Humans (MV3DHumans). This large-scale synthetic image dataset was generated for multi-view multiple-people detection, labelling and segmentation tasks. The MV3DHumans dataset contains 1200 scenes, with 4, 6, 8 or 10 people in each scene. Each scene is captured by 16 cameras with overlapping fields of view. The dataset provides RGB images at a resolution of 640 × 480, together with ground-truth annotations including bounding boxes, instance masks, multi-view correspondences and camera calibrations. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | This enables training and benchmarking of algorithms for multi-view multiple-people detection, labelling and segmentation. |
URL | https://openresearch.surrey.ac.uk/esploro/outputs/dataset/99581923802346 |
Description | Anthropics Technology Ltd |
Organisation | Anthropics Technology |
Country | United Kingdom |
Sector | Private |
Start Year | 2005 |
Description | BBC Research and Development |
Organisation | British Broadcasting Corporation (BBC) |
Country | United Kingdom |
Sector | Public |
PI Contribution | Research in Computer Vision for broadcast production and Audio. Technologies for 3D production, free-view point video in sports, stereo production from monocular cameras, video annotation Member of the BBC Audio Research Partnership - developing the next generation of broadcast technology. |
Collaborator Contribution | In kind contribution (members of Steering/Advisory Boards) Use of the BBC lab and research/development facilities. Studentships (industrial case) funding and co-supervision of PhD students. |
Impact | Multi-disciplinary collaboration involves Computer Vision, Video Analysis, Psychoacoustics, Signal Processing and Spatial Audio |
Description | BT |
Organisation | BT Group |
Country | United Kingdom |
Sector | Private |
PI Contribution | research collaboration in object-based/personalised media delivery |
Collaborator Contribution | support of research activities |
Impact | ongoing collaboration |
Start Year | 2021 |
Description | Bang and Olufsen |
Organisation | Bang & Olufsen |
Country | Denmark |
Sector | Private |
PI Contribution | Spatial audio research (POSZ and S3A EPSRC funded projects) |
Collaborator Contribution | Scholarships (fees and bursaries) for EU/Home students. In-kind contribution by members of B&O Research department (Soren Bech, member of Steering /Advisory Boards and co-supervisor of funded students). Use of research facilities at their labs in Denmark |
Impact | Publications listed on http://iosr.uk/projects/POSZ/ Multi-disciplinary Collaboration: Signal Processing, Psychoacoustics and Spatial audio |
Description | British Broadcasting Corporation |
Organisation | British Broadcasting Corporation (BBC) |
Country | United Kingdom |
Sector | Public |
Start Year | 2004 |
Description | Figment Productions |
Organisation | Figment Productions Limited |
Country | United Kingdom |
Sector | Private |
PI Contribution | Advice/collaboration on personalisation in XR experiences |
Collaborator Contribution | research collaboration and attendance at industry meetings |
Impact | ongoing research |
Start Year | 2021 |
Description | Foundry |
Organisation | Foundry |
Country | United Kingdom |
Sector | Private |
PI Contribution | Collaboration on Computer Vision and AI tools for film and VFX production |
Collaborator Contribution | Novel computer vision methods for segmentation, reconstruction, tracking and representation of people from video. Used for actor performance capture and visual effects production. |
Impact | Novel Computer vision methods for video analysis of actor performance |
Start Year | 2008 |
Description | Imagineer Systems |
Organisation | Boris FX |
Country | United Kingdom |
Sector | Private |
PI Contribution | Personalised media production tools |
Collaborator Contribution | Advice/collaboration on production tools |
Impact | ongoing collaboration |
Start Year | 2021 |
Description | Intel |
Organisation | Intel Corporation |
Country | United States |
Sector | Private |
PI Contribution | Research collaboration in personalised media |
Collaborator Contribution | collaboration/PhD Sponsorship |
Impact | ongoing collaboration |
Start Year | 2021 |
Description | Mirriad |
Organisation | Mirriad |
Country | United Kingdom |
Sector | Private |
PI Contribution | Personalisation in advertising advice |
Collaborator Contribution | Industry partner briefing on requirements for personalisation |
Impact | Ongoing collaboration |
Start Year | 2021 |
Description | Numerion |
Organisation | Numerion Software Limited |
Country | United Kingdom |
Sector | Private |
PI Contribution | Validation of physics based cloth simulation against real cloth measurements |
Collaborator Contribution | Physics based cloth simulation expertise |
Impact | Physically validated cloth simulation tools |
Start Year | 2014 |
Description | Snell & Wilcox Ltd |
Organisation | Snell & Wilcox Ltd |
Country | United Kingdom |
Sector | Private |
Start Year | 2006 |
Description | Sony Broadcast and Professional Europe |
Organisation | SONY |
Department | Sony Broadcast and Professional Europe |
Country | United Kingdom |
Sector | Private |
Start Year | 2004 |
Description | BBC Sounds Amazing |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Industry/academic forum for research and production industry professionals in audio and sound hosted by the BBC |
Year(s) Of Engagement Activity | 2021,2022 |
URL | https://www.bbc.co.uk/academy/events/sounds-amazing-2022/ |
Description | Barbican AI more than Human Festival |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Barbican demonstration and talk as part of the AI: More than human festival. Live demonstration of AI technology for human movement capture. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.barbican.org.uk/whats-on/2019/event/ai-more-than-human |
Description | CVPR |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Primary international forum for computer vision and AI/machine learning research in audio-visual media. Research dissemination through papers, invited keynote talks and workshop organisation |
Year(s) Of Engagement Activity | 2021,2022,2023 |
URL | https://cvpr2023.thecvf.com |
Description | CVSSP 30th Anniversary Celebration |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Centre for Vision, Speech and Signal Processing (CVSSP) 30th Anniversary Celebration with over 500 participants from industry, government and alumni. The event, themed on 'Can Machines Think?', included a series of keynote talks from alumni who are international leaders in academia and industry, over 30 live demos of current research, and an open house at the centre for industry and guests. There was also a VIP dinner hosted by the Vice-Chancellor of the University. |
Year(s) Of Engagement Activity | 2019 |
URL | http://surrey.ac.uk/cvssp |
Description | European Conference on Visual Media Production |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation of research advances at industry-academic forum |
Year(s) Of Engagement Activity | 2021,2022 |
URL | https://www.cvmp-conference.org/ |
Description | Invited talk to the Association of Noise Consultants, Aug 2020 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk on "Artificial Intelligence for Sound" to the Association of Noise Consultants, 27 Aug 2020 (Video meeting). |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.linkedin.com/feed/update/urn:li:activity:6697179064635150337/ |
Description | Royal Society 'You and AI' |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Royal Society "You and AI" event hosted by Brian Cox held at the Barbican with an audience of 2000 and streamed via the Royal Society to National/International Audience. Live demonstration and discussion of AI technology for human motion capture from video. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.barbican.org.uk/whats-on/2018/event/you-and-ai-with-brian-cox |