Scalable Room Acoustic Modelling (SCReAM)

Lead Research Organisation: University of Surrey
Department Name: Sound Recording

Abstract

We spend the majority of our lives indoors. Within enclosed spaces, sound is reflected numerous times, leading to reverberation. We are accustomed to perceiving reverberation: we unconsciously use it to navigate a space, and we notice when it is absent. Similarly, our electronic devices, such as laptops, TVs or smart home devices, are exposed to reverberation and need to take its presence into account. Being able to predict, synthesise, and control reverberation is therefore important. This is done using room acoustic models.

Existing room acoustic models suffer from two main limitations. First, they were originally developed from very different starting points and for very different purposes, which has led to a highly fragmented research field where advancements in one area do not translate to advancements in other areas, slowing down research. Second, each model has a specific accuracy and a specific computational complexity: some very accurate models take several days to run (physical models), while others run in real time but with low accuracy and aim only to create a pleasing reverberant sound (perceptual models). Thus, no single model can scale continuously from one extreme to the other.

This project will overcome both limitations by defining a novel, unifying room acoustic model that combines appealing properties of all main types of models and that can scale on demand from a lightweight perceptual model to a full-scale physical model. Such a SCalable Room Acoustic Model (SCReAM) will bring benefits in many applications, ranging from consumer electronics and communications, to computer games, immersive media, and architectural acoustics. The model will be able to adapt in real time, enabling end-users to get the best possible auditory experience allowed by the available computing resources. Audio software developers will not need to update their development chains once more powerful machines become available, thus reducing costs. Electronic equipment, such as hands-free devices, smart loudspeakers, and sound reinforcement systems, will be able to build a more flexible internal representation of room acoustics, allowing them to reduce unwanted echoes, to remove acoustic feedback, and/or to improve the tonal balance of reproduced sound.

The main hypothesis of the project is that a connection exists between physical models and perceptual models based on so-called delay networks, and that this connection can be leveraged to develop the sought-after unifying and scalable model.

The research will be conducted at the University of Surrey with industrial support by Sonos (audio consumer electronics), Electronic Arts (computer games), Audio Software Development Limited (computer games audio consultancy), and Adrian James Acoustics (acoustics consultancy).

Planned Impact

In addition to its scientific impact, SCReAM will have a significant industrial, cultural, and societal impact.


Industrial impact:

The project will bring benefits to some of the most vibrant and successful industrial sectors of the UK economy, including the creative industries, the digital sector, and the cultural sector, which together contribute £165bn of gross value added (GVA) and represent 7.3% of the total UK workforce. More specifically, SCReAM will have an impact on the following industrial sectors (UK Standard Industrial Classification codes are given in parentheses):

(a) Architectural acoustics (M71), where room acoustic models are used to advise civil/sound engineers on how to improve the acoustics of performing spaces (e.g. theatres), increase speech intelligibility in learning spaces (e.g. classrooms) or public spaces (e.g. train stations, stadiums), and abate noise in the vicinity of motorways/railways.

(b) Consumer electronics (C26) including but not limited to (i) hands-free devices, such as smart TVs/loudspeakers, where room acoustic models are used to reduce disturbing echoes, thus improving the performance of automatic speech recognition algorithms, (ii) sound reinforcement systems, where they are used to control acoustic feedback, (iii) self-calibrating loudspeakers, where they are used to equalise the room response and correct tonal imbalances.

(c) Telecommunications using hands-free devices (J61), where room acoustic models can be used to enhance the quality and intelligibility of a far speaker by reducing feedback and/or echoes.

(d) Immersive media (J62), including but not limited to (i) video games and virtual reality, where room acoustic models are used to render reverberation and make the listener feel present within the virtual space, (ii) augmented reality, where they are used to make virtual sound sources feel consistent with the acoustics of the real space.

(e) TV and radio broadcasting; sound recording in film, video and music; live performing in theatre and music concerts; museums, galleries, and performing arts (J59, J60, R90), where room acoustic models are used to artistically enhance audio material.


Cultural impact:

SCReAM will contribute to the cultural sector by allowing artists, venues, and cultural institutions to deliver more accurate and believable audio experiences in virtual reality, augmented reality and surround audio installations, as well as in live performances.


Societal impact:

SCReAM will also have a wider societal impact. Perceptually convincing models for immersive communications will yield a more seamless teleworking experience, which, in turn, benefits the environment. Improved automatic speech recognition (ASR) performance through echo cancellation will contribute to making the digital infrastructure simple, accessible, and invisible. More flexible, accurate, and computationally efficient room acoustic models mean more reliable noise abatement studies, which can improve sleeping patterns, and better control of intelligibility in public spaces and learning spaces, which can improve safety and literacy skills, respectively.


General public:

The ultimate beneficiary of SCReAM is the general public. For instance, in computer games, VR/AR, broadcasting, performing arts, and music production, end-users will benefit from the best possible auditory experience allowed by the available computing resources. In the applications related to consumer electronics and telecommunications, they will benefit from better ASR performance, improved intelligibility of far talkers, more tonally balanced sound reproduction, and/or reduced acoustic feedback.

Publications

Atalay T (2022) Scattering Delay Network Simulator of Coupled Volume Acoustics in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Das O (2023) Grouped Feedback Delay Networks With Frequency-Dependent Coupling in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Mannall J (2023) Efficient Diffraction Modeling Using Neural Networks and Infinite Impulse Response Filters in Journal of the Audio Engineering Society

Mannall J (2022) Perceptual evaluation of low-complexity diffraction models from a single edge in Proceedings of the AES International Conference

Scerbo M (2022) Higher-Order Scattering Delay Networks for Artificial Reverberation in Proceedings of the International Conference on Digital Audio Effects, DAFx

Description: Data-driven Room Acoustic Modeling for AR (DRAMA)
Amount: £159,000 (GBP)
Organisation: Facebook
Sector: Private
Country: United States
Start: 09/2022
End: 09/2026