Challenges in Immersive Audio Technology
Lead Research Organisation: King's College London
Department Name: Engineering
Abstract
Immersive technologies will transform not only how we communicate and experience entertainment, but also our experience of the physical world, from shops to museums, cars to classrooms. This transformation has been driven primarily by unprecedented progress in visual technologies, which can transport users to an alternate visual reality. In the domain of audio, however, long-standing fundamental challenges must be overcome before we can deliver striking immersive experiences in which a group of listeners can simply walk into a scene and feel transported to an alternate reality, enjoying a seamless shared experience without headphones, head-tracking, personalisation or calibration.
The first key challenge is the delivery of immersive audio experiences to multiple listeners. Recent advances in audio technology are beginning to succeed in generating high-quality immersive audio experiences. In practice, however, these are restricted to individual listeners, with the appropriate signals presented either via headphones or via systems based on a modest number of loudspeakers using cross-talk cancellation or beamforming. There remains a fundamental challenge in the technologically efficient delivery of "3D sound" to multiple listeners, whether in small numbers (2-5) in a home environment, in museums, galleries and other public spaces (5-20), or in cinema and theatre auditoria (20-100). In principle, shared auditory experiences can be generated using physics-based methods such as wave field synthesis or higher-order ambisonics, but a sweet spot of even modest size requires a prohibitive number of channels. CIAT aims to transform the state of the art by developing a principled, scalable and reconfigurable framework for capturing and reproducing only perceptually relevant information, leading to a step advance in the quality of immersive audio experiences achievable with practically viable systems.
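To give a rough sense of why the channel count becomes prohibitive (an illustrative back-of-the-envelope sketch, not part of the project's methods): for higher-order ambisonics, accurate reproduction up to frequency f over a sweet spot of radius R is commonly estimated to require an order N of roughly kR = 2πfR/c, and hence about (N+1)² channels. The hypothetical Python snippet below applies this rule of thumb.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def hoa_requirements(sweet_spot_radius_m: float, max_freq_hz: float) -> tuple[int, int]:
    """Rule-of-thumb ambisonics order and channel count for a given sweet spot.

    Uses the common N >= k * R criterion (k = 2 * pi * f / c) and the
    (N + 1)^2 channel count of a full-sphere ambisonics representation.
    """
    k = 2.0 * math.pi * max_freq_hz / SPEED_OF_SOUND  # wavenumber at max_freq_hz
    order = math.ceil(k * sweet_spot_radius_m)        # required ambisonics order
    channels = (order + 1) ** 2                       # spherical-harmonic channels
    return order, channels

# Example: a 1 m radius shared listening area, accurate up to 8 kHz.
order, channels = hoa_requirements(1.0, 8000.0)
print(f"order {order}, about {channels} channels")    # order 147, about 21904 channels
```

Even under these generous simplifications, a 1 m sweet spot across most of the audible band implies an ambisonics order in the hundreds and tens of thousands of channels, which is the impracticality referred to above.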
The second key challenge is the real-time computation of environment acoustics needed to transport listeners to an alternate reality, allowing them to interact with the environment and with the sound sources in it. This is pertinent to applications where immersive audio content is synthesised rather than recorded, and to object-based audio in general. The sound field of an acoustic event consists of a direct wavefront followed by early and higher-order reflections. A convincing experience of being transported to the environment where the event takes place requires the rendering of these reflections, which cannot all be computed in real time. In applications where the sense of realism is critical, e.g. extended reality (XR) and to some extent gaming, impulse responses of the environment are typically computed only at several locations, with preset limits on the number of reflections and directions of arrival, and then convolved with source sounds to achieve what is referred to as high-quality reverberation. Even so, the computation of impulse responses and the convolution may require GPU implementation and careful hands-on balancing between quality and complexity, and between CPU and GPU computation. CIAT aims to deliver a paradigm shift in environment modelling that will enable numerically efficient, seamless, high-quality environment simulation in real time.
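The convolution step described above can be sketched as follows (a minimal illustration assuming NumPy/SciPy and a synthetic impulse response; real-time engines instead use partitioned, block-wise FFT convolution, often offloaded to the GPU):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48_000  # sample rate in Hz

# Hypothetical stand-ins: a 1 s dry source and a 0.5 s synthetic impulse response
# (exponentially decaying noise). In practice the impulse response is computed by
# an acoustic model, or measured, at a specific listener position.
rng = np.random.default_rng(seed=0)
source = rng.standard_normal(fs)
t = np.arange(fs // 2) / fs
impulse_response = rng.standard_normal(fs // 2) * np.exp(-6.0 * t)

# FFT-based convolution: O(M log M) rather than the O(M * L) of direct convolution,
# which is why long reverberant tails are affordable offline but costly to recompute
# and apply per source and per listener position in real time.
wet = fftconvolve(source, impulse_response, mode="full")
wet /= np.max(np.abs(wet))  # normalise to avoid clipping on playback
```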
By addressing these challenges, CIAT will enable the creation and delivery of shared, interactive, immersive audio experiences for emerging XR applications, whilst making a step advance in the quality of immersive audio in traditional media. In particular, efficient real-time synthesis of high-quality environment acoustics is essential both for XR and for object-based audio in general, including streaming and broadcasting. Delivery of 3D soundscapes to multiple listeners is a major unresolved problem in traditional applications too, including broadcasting, cinema, music events, and audio-visual installations.
Organisations
- King's College London (Lead Research Organisation)
- National Gallery, London (Collaboration)
- University of Surrey (Collaboration)
- Stanford University (Collaboration, Project Partner)
- Middle East Technical University (Collaboration)
- International Broadcasting Convention (Project Partner)
- Sonos (Project Partner)
- Playlines (Project Partner)
- Real World Studios (Project Partner)
- MagicBeans (Project Partner)
- National Gallery (Project Partner)
- Kajima Technical Research Institute (Project Partner)
- British Broadcasting Corporation - BBC (Project Partner)
People

- Zoran Cvetkovic (Principal Investigator)

Artistic and Creative Products

| Title | 12 Hours at Rainy Days Festival |
| Description | A marathon immersive sound performance for voice and electronics. |
| Type Of Art | Performance (Music, Dance, Drama, etc) |
| Year Produced | 2024 |
| Impact | The main impact is on creative practice in music composition enabled by the technologies developed on the project. |
| URL | https://whatsnew.composersedition.com/12hours-at-rainy-days-festival-interview/ |

| Title | ReepsOne at DAFx24 |
| Description | Immersive interactive live performance. |
| Type Of Art | Performance (Music, Dance, Drama, etc) |
| Year Produced | 2024 |
| Impact | The most notable impact is in the domain of artistic creativity enabled by the technologies developed on the project. |
| URL | https://dafx24.surrey.ac.uk/social-events/ |

Further Funding

| Description | King's College London Impact Acceleration Award |
| Amount | £37,500 (GBP) |
| Organisation | King's College London |
| Sector | Academic/University |
| Country | United Kingdom |
| Start | 03/2025 |
| End | 10/2025 |

Collaborations

| Description | Institute of Sound Recording, University of Surrey |
| Organisation | University of Surrey |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Expertise, intellectual input. |
| Collaborator Contribution | Expertise, intellectual input. |
| Impact | Joint publications, grant proposal, and further development and deployment of the audio technology developed with the relevant EPSRC project in art projects and installations. |
| Start Year | 2016 |

| Description | METU |
| Organisation | Middle East Technical University |
| Department | Institute of Marine Sciences |
| Country | Turkey |
| Sector | Academic/University |
| PI Contribution | Expertise and intellectual input. |
| Collaborator Contribution | Expertise and intellectual input. |
| Impact | Joint publications. Development of the audio technology developed on the relevant EPSRC project and its deployment in art projects and installations. |
| Start Year | 2012 |

| Description | National Gallery X |
| Organisation | National Gallery, London |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | My team has provided immersive audio technology and technical support. |
| Collaborator Contribution | The National Gallery has provided equipment and funding residencies for sound artists to develop and test creative responses to a gallery interpretative challenge. |
| Impact | Rain, Steam and Speed audio/visual performance. It is a multi-disciplinary collaboration involving music, visual arts, and audio-visual technologies. |
| Start Year | 2019 |

| Description | Stanford |
| Organisation | Stanford University |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Collaborative research. |
| Collaborator Contribution | Collaborative research. |
| Impact | A joint tutorial on multichannel surround systems, to be presented at ICASSP 2015. A joint paper to be submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. |
| Start Year | 2013 |

Engagement Activities

| Description | ReepsOne performance at DAFx24 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Musician ReepsOne created an immersive audio performance using the technology being developed on the project. The performance took place at DAFx24, the 2024 International Conference on Digital Audio Effects. |
| Year(s) Of Engagement Activity | 2024 |