Challenges in Immersive Audio Technology
Lead Research Organisation: King's College London
Department Name: Engineering
Abstract
Immersive technologies will transform not only how we communicate and experience entertainment, but also our experience of the physical world, from shops to museums, cars to classrooms. This transformation has been driven primarily by unprecedented progress in visual technologies, which can transport users to an alternate visual reality. In the domain of audio, however, long-standing fundamental challenges must be overcome before we can deliver striking immersive experiences in which a group of listeners can simply walk into a scene and feel transported to an alternate reality, enjoying a seamless shared experience without headphones, head-tracking, personalisation or calibration.
The first key challenge is the delivery of immersive audio experiences to multiple listeners. Recent advances in audio technology are beginning to succeed in generating high-quality immersive audio experiences. In practice, however, these are restricted to individual listeners, with the appropriate signals presented either via headphones or via systems based on a modest number of loudspeakers using cross-talk cancellation or beamforming. There remains a fundamental challenge in the technologically efficient delivery of "3D sound" to multiple listeners, whether in small numbers (2-5) in a home environment, in museums, galleries and other public spaces (5-20), or in cinema and theatre auditoria (20-100). In principle, shared auditory experiences can be generated using physics-based methods such as wave field synthesis or higher-order ambisonics, but a sweet spot of even modest size requires a prohibitive number of channels. CIAT aims to transform the state of the art by developing a principled, scalable and reconfigurable framework for capturing and reproducing only perceptually relevant information, leading to a step advance in the quality of immersive audio experiences achievable with practically viable systems.
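To give a rough sense of why the channel count becomes prohibitive (an illustrative back-of-the-envelope sketch, not part of the project's methods): for higher-order ambisonics, accurate reproduction up to frequency f over a sweet spot of radius R is commonly estimated to require an order N of roughly kR = 2πfR/c, and hence about (N+1)² channels. The hypothetical Python snippet below applies this rule of thumb.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def hoa_requirements(sweet_spot_radius_m: float, max_freq_hz: float) -> tuple[int, int]:
    """Rule-of-thumb ambisonics order and channel count for a given sweet spot.

    Uses the common N >= k * R criterion (k = 2 * pi * f / c) and the
    (N + 1)^2 channel count of a full-sphere ambisonics representation.
    """
    k = 2.0 * math.pi * max_freq_hz / SPEED_OF_SOUND  # wavenumber at max_freq_hz
    order = math.ceil(k * sweet_spot_radius_m)        # required ambisonics order
    channels = (order + 1) ** 2                       # spherical-harmonic channels
    return order, channels

# Example: a 1 m radius shared listening area, accurate up to 8 kHz.
order, channels = hoa_requirements(1.0, 8000.0)
print(f"order {order}, about {channels} channels")    # order 147, about 21904 channels
```

Even under these generous simplifications, a 1 m sweet spot across most of the audible band implies an ambisonics order in the hundreds and tens of thousands of channels, which is the impracticality referred to above.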
The second key challenge is the real-time computation of environment acoustics needed to transport listeners to an alternate reality, allowing them to interact with the environment and with the sound sources in it. This is pertinent to applications where immersive audio content is synthesised rather than recorded, and to object-based audio in general. The sound field of an acoustic event consists of a direct wavefront followed by early and higher-order reflections. A convincing experience of being transported to the environment where the event takes place requires the rendering of these reflections, which cannot all be computed in real time. In applications where the sense of realism is critical, e.g. extended reality (XR) and to some extent gaming, impulse responses of the environment are typically computed only at several locations, with preset limits on the number of reflections and directions of arrival, and then convolved with source sounds to achieve what is referred to as high-quality reverberation. Even so, the computation of impulse responses and the convolution may require GPU implementation and careful hands-on balancing between quality and complexity, and between CPU and GPU computation. CIAT aims to deliver a paradigm shift in environment modelling that will enable numerically efficient, seamless, high-quality environment simulation in real time.
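The convolution step described above can be sketched as follows (a minimal illustration assuming NumPy/SciPy and a synthetic impulse response; real-time engines instead use partitioned, block-wise FFT convolution, often offloaded to the GPU):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48_000  # sample rate in Hz

# Hypothetical stand-ins: a 1 s dry source and a 0.5 s synthetic impulse response
# (exponentially decaying noise). In practice the impulse response is computed by
# an acoustic model, or measured, at a specific listener position.
rng = np.random.default_rng(seed=0)
source = rng.standard_normal(fs)
t = np.arange(fs // 2) / fs
impulse_response = rng.standard_normal(fs // 2) * np.exp(-6.0 * t)

# FFT-based convolution: O(M log M) rather than the O(M * L) of direct convolution,
# which is why long reverberant tails are affordable offline but costly to recompute
# and apply per source and per listener position in real time.
wet = fftconvolve(source, impulse_response, mode="full")
wet /= np.max(np.abs(wet))  # normalise to avoid clipping on playback
```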
By addressing these challenges, CIAT will enable the creation and delivery of shared, interactive, immersive audio experiences for emerging XR applications, whilst making a step advance in the quality of immersive audio in traditional media. In particular, efficient real-time synthesis of high-quality environment acoustics is essential both for XR and for object-based audio in general, including streaming and broadcasting. Delivery of 3D soundscapes to multiple listeners is a major unresolved problem in traditional applications too, including broadcasting, cinema, music events, and audio-visual installations.
Organisations
- King's College London (Lead Research Organisation)
- National Gallery, London (Collaboration)
- University of Surrey (Collaboration)
- Stanford University (Collaboration, Project Partner)
- Middle East Technical University (Collaboration)
- International Broadcasting Convention (Project Partner)
- Sonos (Project Partner)
- Playlines (Project Partner)
- Real World Studios (Project Partner)
- MagicBeans (Project Partner)
- National Gallery (Project Partner)
- Kajima Technical Research Institute (Project Partner)
- British Broadcasting Corporation - BBC (Project Partner)
People

- Zoran Cvetkovic (Principal Investigator)

Artistic and Creative Products

| Title | 12 Hours at Rainy Days Festival |
| Description | A marathon immersive sound performance for voice and electronics. |
| Type Of Art | Performance (Music, Dance, Drama, etc) |
| Year Produced | 2024 |
| Impact | The main impact is on creative practice in music composition enabled by the technologies developed on the project. |
| URL | https://whatsnew.composersedition.com/12hours-at-rainy-days-festival-interview/ |

| Title | ReepsOne at DAFx24 |
| Description | Immersive interactive live performance. |
| Type Of Art | Performance (Music, Dance, Drama, etc) |
| Year Produced | 2024 |
| Impact | The most notable impact is in the domain of artistic creativity enabled by the technologies developed on the project. |
| URL | https://dafx24.surrey.ac.uk/social-events/ |

Further Funding

| Description | King's College London Impact Acceleration Award |
| Amount | £37,500 (GBP) |
| Organisation | King's College London |
| Sector | Academic/University |
| Country | United Kingdom |
| Start | 03/2025 |
| End | 10/2025 |

Collaborations

| Description | Institute of Sound Recording, University of Surrey |
| Organisation | University of Surrey |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Expertise, intellectual input. |
| Collaborator Contribution | Expertise, intellectual input. |
| Impact | Joint publications, grant proposal, and further development and deployment of the audio technology developed with the relevant EPSRC project in art projects and installations. |
| Start Year | 2016 |

| Description | METU |
| Organisation | Middle East Technical University |
| Department | Institute of Marine Sciences |
| Country | Turkey |
| Sector | Academic/University |
| PI Contribution | Expertise and intellectual input. |
| Collaborator Contribution | Expertise and intellectual input. |
| Impact | Joint publications. Development of the audio technology developed on the relevant EPSRC project and its deployment in art projects and installations. |
| Start Year | 2012 |

| Description | National Gallery X |
| Organisation | National Gallery, London |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | My team has provided immersive audio technology and technical support. |
| Collaborator Contribution | The National Gallery has provided equipment and funding residencies for sound artists to develop and test creative responses to a gallery interpretative challenge. |
| Impact | Rain, Steam and Speed audio/visual performance. It is a multi-disciplinary collaboration involving music, visual arts, and audio-visual technologies. |
| Start Year | 2019 |

| Description | Stanford |
| Organisation | Stanford University |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Collaborative research. |
| Collaborator Contribution | Collaborative research. |
| Impact | A joint tutorial on multichannel surround systems, to be presented at ICASSP 2015. A joint paper to be submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. |
| Start Year | 2013 |

Engagement Activities

| Description | ReepsOne performance at DAFx24 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Musician ReepsOne created an immersive audio performance using the technology being developed on the project. The performance took place at DAFx24, the 2024 International Conference on Digital Audio Effects. |
| Year(s) Of Engagement Activity | 2024 |