Scalable Room Acoustic Modelling (SCReAM)

Lead Research Organisation: University of Surrey

Department Name: Sound Recording

Abstract

We spend the majority of our lives indoors. Within enclosed spaces, sound is reflected numerous times, leading to reverberation. We are accustomed to perceiving reverberation: we unconsciously use it to navigate the space, and we notice when it is absent. Similarly, our electronic devices, such as laptops, TVs or smart home devices, are exposed to reverberation and need to take its presence into account. Being able to predict, synthesise, and control reverberation is therefore important. This is done using room acoustic models.

Existing room acoustic models suffer from two main limitations. First, they were originally developed from very different starting points and for very different purposes, which has led to a highly fragmented research field where advancements in one area do not translate to advancements in other areas, slowing down research. Second, each model has a specific accuracy and a specific computational complexity: some very accurate models take several days to run (physical models), while others run in real time but with low accuracy and only aim to create a pleasing reverberant sound (perceptual models). Thus, there is no single model that can scale continuously from one extreme to the other.

This project will overcome both limitations by defining a novel, unifying room acoustic model that combines appealing properties of all main types of models and that can scale on demand from a lightweight perceptual model to a full-scale physical model. Such a SCalable Room Acoustic Model (SCReAM) will bring benefits in many applications, ranging from consumer electronics and communications, to computer games, immersive media, and architectural acoustics. The model will be able to adapt in real time, enabling end-users to get the best possible auditory experience allowed by the available computing resources. Audio software developers will not need to update their development chains once more powerful machines become available, thus reducing costs. Electronic equipment, such as hands-free devices, smart loudspeakers, and sound reinforcement systems, will be able to build a more flexible internal representation of room acoustics, allowing them to reduce unwanted echoes, to remove acoustic feedback, and/or to improve the tonal balance of reproduced sound.

The main hypothesis of the project is that a connection exists between physical models and perceptual models based on so-called delay networks, and that this connection can be leveraged to develop the sought-after unifying and scalable model.

The research will be conducted at the University of Surrey with industrial support from Sonos (audio consumer electronics), Electronic Arts (computer games), Audio Software Development Limited (computer games audio consultancy), and Adrian James Acoustics (acoustics consultancy).

Planned Impact

In addition to its scientific impact, SCReAM will have a significant industrial, cultural, and societal impact.

Industrial impact:

The project will bring benefits to some of the most vibrant and successful industrial sectors of the UK economy, including the creative industries, the digital sector, and the cultural sector, which together contribute £165bn of gross value added (GVA) and represent 7.3% of the total UK workforce. More specifically, SCReAM will have an impact on the following industrial sectors (UK Standard Industrial Classification codes are given in parentheses):

(a) Architectural acoustics (M71), where room acoustic models are used to advise civil/sound engineers on how to improve the acoustics of performing spaces (e.g. theatres), increase speech intelligibility in learning spaces (e.g. classrooms) or public spaces (e.g. train stations, stadiums), and abate noise in the vicinity of motorways/railways.

(b) Consumer electronics (C26), including but not limited to (i) hands-free devices, such as smart TVs/loudspeakers, where room acoustic models are used to reduce disturbing echoes, thus improving the performance of automatic speech recognition algorithms, (ii) sound reinforcement systems, where they are used to control acoustic feedback, and (iii) self-calibrating loudspeakers, where they are used to equalise the room response and correct tonal imbalances.

(c) Telecommunications using hands-free devices (J61), where room acoustic models can be used to enhance the quality and intelligibility of a far speaker by reducing feedback and/or echoes.

(d) Immersive media (J62), including but not limited to (i) video games and virtual reality, where room acoustic models are used to render reverberation and make the listener feel present within the virtual space, and (ii) augmented reality, where they are used to make virtual sound sources feel consistent with the acoustics of the real space.

(e) TV and radio broadcasting; sound recording in film, video and music; live performance in theatre and music concerts; museums, galleries, and performing arts (J59, J60, R90), where room acoustic models are used to artistically enhance audio material.

Cultural impact:

SCReAM will contribute to the cultural sector by allowing artists, venues, and cultural institutions to deliver more accurate and believable audio experiences in virtual reality, augmented reality and surround audio installations, as well as in live performances.

Societal impact:

SCReAM will also have a wider societal impact. Perceptually convincing models for immersive communications will yield a more seamless teleworking experience, which, in turn, benefits the environment. Improved automatic speech recognition (ASR) performance through echo cancellation will contribute to making the digital infrastructure simple, accessible, and invisible. More flexible, accurate, and computationally efficient room acoustic models mean more reliable noise abatement studies, which improve sleeping patterns, and better control of intelligibility in public spaces and learning spaces, which improves safety and literacy skills, respectively.

General public:

The ultimate beneficiary of SCReAM is the general public. For instance, in computer games, VR/AR, broadcasting, performing arts, and music production, end-users will benefit from the best possible auditory experience allowed by the available computing resources. In the applications related to consumer electronics and telecommunications, they will benefit from better ASR performance, improved intelligibility of far talkers, more tonally balanced sound reproduction, and/or reduced acoustic feedback.

Funded Value:

£407,334

Funded Period:

Jul 21 - Jan 25

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/V002554/1

Principal Investigator:

Enzo De Sena

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Music & Acoustic Technology (100%)

Organisations

People	ORCID iD
Enzo De Sena (Principal Investigator)	http://orcid.org/0000-0002-8007-4370

Publications

Atalay T (2022) Scattering Delay Network Simulator of Coupled Volume Acoustics in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Burnett B (2023) User Expectation of Room Acoustic Parameters in Virtual Reality Environments

Das O (2023) The Complex Image Method for Simulating Wave Scattering in Room Acoustics

Das O (2023) Grouped Feedback Delay Networks With Frequency-Dependent Coupling in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Mannall J (2023) Efficient Diffraction Modeling Using Neural Networks and Infinite Impulse Response Filters in Journal of the Audio Engineering Society

Mannall J (2022) Perceptual evaluation of low-complexity diffraction models from a single edge in Proceedings of the AES International Conference

Potter T (2022) On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion in Frontiers in Signal Processing

Ali R (2023) Relating wave-based and geometric acoustics using a stationary phase approximation

Scerbo M (2024) Room Acoustic Rendering Networks With Control of Scattering and Early Reflections in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Key Findings

Description	Room acoustic models are at the core of a number of spatial audio applications, including audio rendering for computer gaming/VR/AR/metaverse, architectural acoustics, and remote communications. The overarching aim of SCReAM has been to find connections between different classes of room acoustic models and to develop new ones that can adapt to the varied requirements of all these applications. SCReAM has already made notable strides towards that core objective. Key scientific advancements include: (a) scalable artificial reverberators that can model environment geometries that could not previously be modelled and that achieve high accuracy for a given computational budget, (b) low-complexity diffraction models at the intersection of classical signal processing and deep learning, (c) explicit links between geometric and wave-based acoustic models, (d) quantification of user expectations and listeners' responses to room acoustic models, and (e) fundamental signal processing techniques underpinning artificial reverberators. Collectively, these efforts have advanced our understanding and capabilities in room acoustic modelling, leading to improvements that will eventually benefit users of all applications mentioned above. The project has already resulted in 12 international peer-reviewed publications and contributed to opening new collaborations with other academic institutions (Aalto, KU Leuven, Politecnico di Milano) and industry (including further funding from Reality Labs at Meta and L-Acoustics).
Exploitation Route	The outcomes of SCReAM will ultimately lead to more seamless remote communications, more immersive experiences in VR/AR/computer games/metaverse, and more accurate predictions for architectural acoustics, among others. The evidence collected on how important room acoustic rendering is for achieving immersive VR experiences will encourage further R&D investment in audio rendering. The progress made in bridging different classes of room acoustic models will foster further research and cross-fertilisation in the respective fields.
Sectors	Creative Economy; Digital/Communication/Information Technologies (including Software)

Further Funding

Description	Data-driven Room Acoustic Modeling for AR (DRAMA)
Amount	£159,000 (GBP)
Organisation	Facebook
Sector	Private
Country	United States
Start	08/2022
End	09/2026
