Challenges in Immersive Audio Technology

Lead Research Organisation: King's College London
Department Name: Engineering

Abstract

Immersive technologies will transform not only how we communicate and experience entertainment, but also our experience of the physical world, from shops to museums, cars to classrooms. This transformation has been driven primarily by unprecedented progress in visual technologies, which can transport users to an alternate visual reality. In audio, however, long-standing fundamental challenges must be overcome to enable striking immersive experiences in which a group of listeners can simply walk into a scene and feel transported to an alternate reality, enjoying a seamless shared experience without the need for headphones, head-tracking, personalisation or calibration.

The first key challenge is the delivery of immersive audio experiences to multiple listeners. Recent advances in audio technology are beginning to succeed in generating high-quality immersive audio experiences. However, these are restricted in practice to individual listeners, with appropriate signals presented either via headphones or via systems based on a modest number of loudspeakers using either cross-talk cancellation or beamforming. There remains a fundamental challenge in the technologically efficient delivery of "3D sound" to multiple listeners, whether in small numbers (2-5) in a home environment, in museums, galleries and other public spaces (5-20), or in cinema and theatre auditoria (20-100). In principle, shared auditory experiences can be generated using physics-based methods such as wavefield synthesis or higher-order ambisonics, but a sweet spot of even modest size requires a prohibitive number of channels. CIAT aims to transform the state of the art by developing a principled, scalable and reconfigurable framework for capturing and reproducing only perceptually relevant information, thus leading to a step advance in the quality of immersive audio experiences achievable by practically viable systems.
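To give a rough sense of why the channel count becomes prohibitive, the sketch below estimates the number of higher-order ambisonics channels needed for a given sweet-spot radius. It is an illustration only and not part of the CIAT framework: it assumes the common rule of thumb that the ambisonic order N must be at least k·r (wavenumber times sweet-spot radius) and uses the standard 3D channel count (N + 1)². The function name `hoa_channels` and the speed of sound of 343 m/s are assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def hoa_channels(freq_hz: float, radius_m: float) -> int:
    """Estimate 3D ambisonic channels for a sweet spot of the given radius.

    Assumes the rule of thumb N >= k * r, where k = 2 * pi * f / c is the
    wavenumber, and the 3D channel count (N + 1) ** 2 for order N.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    order = math.ceil(k * radius_m)
    return (order + 1) ** 2


if __name__ == "__main__":
    # Compare a head-sized sweet spot with a 1 m radius listening area at 1 kHz.
    for radius in (0.1, 1.0):
        print(radius, hoa_channels(1000.0, radius))
```

Under these assumptions, a sweet spot about the size of one head (0.1 m radius) at 1 kHz needs 9 channels, whereas a 1 m radius at the same frequency needs 400, and the requirement grows further with frequency.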

The second key challenge is the real-time computation of the environment acoustics needed to transport listeners to an alternate reality, allowing them to interact with the environment and the sound sources in it. This is pertinent to applications where immersive audio content is synthesised rather than recorded, and to object-based audio in general. The sound field of an acoustic event consists of the direct wavefront, followed by early and higher-order reflections. A convincing experience of being transported to the environment where the event takes place requires the rendering of these reflections, which cannot all be computed in real time. In applications where the sense of realism is critical, e.g. extended reality (XR) and to some extent gaming, impulse responses of the environment are typically computed only at several locations, with preset limits on the number of reflections and directions of arrival, and then convolved with source sounds to achieve what is referred to as high-quality reverberation. Still, the computation of impulse responses and convolution may require GPU implementation and careful hands-on balancing between quality and complexity, and between CPU and GPU computation. CIAT aims to deliver a paradigm shift in environment modelling that will enable numerically efficient, seamless, high-quality environment simulation in real time.
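The convolution step described above can be sketched as follows. This is a minimal offline illustration, not the environment model CIAT proposes: the impulse response is a toy construction (a direct path, two hand-picked early reflections and an exponentially decaying noise tail), and the names `synthetic_impulse_response` and `fft_convolve`, the sample rate and the reverberation time are all assumptions made for this example.

```python
import numpy as np


def synthetic_impulse_response(fs=16000, rt60=0.3, length_s=0.5, seed=0):
    """Toy room impulse response: direct path, early reflections, noise tail."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude envelope that falls by 60 dB (a factor of 1e-3) at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    ir = 0.05 * rng.standard_normal(n) * envelope  # late reverberation tail
    ir[0] = 1.0  # direct wavefront
    # Two hypothetical early reflections given as (delay in seconds, gain).
    for delay_s, gain in [(0.007, 0.6), (0.013, 0.4)]:
        ir[int(delay_s * fs)] += gain
    return ir


def fft_convolve(x, h):
    """Linear convolution of a dry source x with impulse response h via FFT."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    spectrum = np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]


if __name__ == "__main__":
    fs = 16000
    dry = np.random.default_rng(1).standard_normal(fs)  # 1 s of noise as a source
    wet = fft_convolve(dry, synthetic_impulse_response(fs=fs))
    print(wet.shape)
```

Even in this simplified form, each source-listener pair needs its own impulse response, and the response must be recomputed whenever either moves, which is part of why real-time rendering is costly.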

By addressing these challenges, CIAT will enable the creation and delivery of shared interactive immersive audio experiences for emerging XR applications, whilst making a step advance in the quality of immersive audio in traditional media. In particular, efficient real-time synthesis of high-quality environment acoustics is essential for both XR and object-based audio in general, including streaming and broadcasting. Delivery of 3D soundscapes to multiple listeners is also a major unresolved problem in traditional applications, including broadcasting, cinema, music events and audio-visual installations.
