Perceptual Sound Field Reconstruction and Coherent Emulation

Lead Research Organisation: King's College London
Department Name: Engineering

Abstract

The project is concerned with the development of a new 5--10 channel audio technology which would improve over existing ones in terms of (a) realism, (b) accuracy and stability of the auditory perspective, (c) size of the sweet spot, and (d) the envelopment experience. Since the new technology aims to create a 360-degree auditory perspective, reproduction will take place over loudspeakers positioned at the vertices of a regular polygon. Each speaker will consist of two components: one which radiates the direct sound field toward the listener, and another which reproduces the diffuse sound field by introducing additional scattering. The goal of the tasks listed below is to find optimal ways to capture sound field cues and render them over the proposed playback system so as to provide the most convincing illusion of the original or desired sound field.

(i) Optimal microphone arrays for the proposed playback system will be investigated. The arrays considered will consist of microphones placed in the horizontal plane at the vertices of a regular polygon, with the number of microphones equal to the number of speakers. For each array, different diameters, ranging from near-coincident up to somewhat beyond the optimal value, and different microphone directivity patterns will be considered. These studies will be repeated for several diameters of the speaker configuration to investigate whether the optimal array diameter depends on the size of the speaker layout and, if so, to characterise that dependence. Possible dependencies between the optimal microphone directivity patterns and array diameters will also be investigated and characterised. Arrays will be evaluated in critical listening tests according to criteria (a)--(d) stated above. Experiments will be guided by simulations providing an initial objective assessment of the ITD and ILD cues generated within the listening area.
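The objective assessment of ITD (interaural time difference) and ILD (interaural level difference) cues mentioned above can be sketched in a few lines. This is an illustrative simulation aid only, not part of the project's deliverables; the signal model (the right ear receiving a delayed, attenuated copy of the left-ear signal) and all parameter values are assumptions for the example.

```python
import numpy as np

def itd_ild(left, right, fs):
    """Estimate interaural time difference (ITD) and interaural level
    difference (ILD) from a binaural signal pair.

    ITD: lag (in seconds) of the peak of the interaural cross-correlation;
         positive when the right-ear signal lags the left.
    ILD: left-to-right energy ratio in dB.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation; lag 0 sits at index len(left) - 1.
    xcorr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(xcorr)) - (len(left) - 1)
    itd = lag / fs
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild

# Toy check: the right ear receives a 0.5 ms delayed, ~6 dB quieter copy.
fs = 8000
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)
delay = 4  # samples = 0.5 ms at 8 kHz
right = 0.5 * np.concatenate([np.zeros(delay), left[:-delay]])
itd, ild = itd_ild(left, right, fs)
```

In a simulation of the listening area, such estimates would be computed at a grid of listener positions to judge the stability of the auditory perspective across the sweet spot.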
In parallel, mathematical models of the sound fields generated by the proposed technology will be investigated, which could provide additional insight into optimal microphone array design.

(ii) The impact of playback with cross-talk cancellation will be systematically investigated. Existing cross-talk cancellation algorithms will be used first and, if necessary, new algorithms which are numerically efficient and effective in a range of listening environments will be developed. Optimal microphone arrays for playback with cross-talk cancellation will then be investigated, i.e. the work described under (i) will be repeated for reproduction with cross-talk cancellation. Finally, the optimal systems with and without cross-talk cancellation will be compared.

(iii) Algorithms for direct/diffuse sound field separation will be studied. When the number of instruments does not exceed the number of microphones, multichannel equalisation techniques can be used to recover dry source signals, which can then be convolved with the direct and reverberant parts of room impulse responses to obtain the direct and diffuse sound field components, respectively. Multichannel equalisation in audio is, however, particularly challenging owing to excessively long impulse responses, and we will develop numerically efficient algorithms for multichannel equalisation in audio applications. We will then study psychoacoustic approximations to direct/diffuse sound field decomposition with no restriction on the number of sources.

(iv) Combinations of near-coincident directional microphone arrays, for acquiring direct sound field cues, and widely spaced arrays based on omnidirectional or bi-directional microphones, for acquiring diffuse sound field cues, will be systematically investigated in critical listening tests according to criteria (a)--(d). This approach will be compared with the approach described in (i)--(iii), where the same array is used for both sound field components.
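The direct/diffuse decomposition of task (iii) can be illustrated with a minimal sketch: once a dry source signal is available, convolving it with the early and late parts of a room impulse response yields the direct and diffuse components. The split-at-the-direct-path-peak heuristic and the fixed 20 ms mixing time below are illustrative assumptions, not the project's method.

```python
import numpy as np

def split_rir(rir, fs, mixing_time_ms=20.0):
    """Split a room impulse response into direct and diffuse parts by
    cutting a fixed interval (an assumed mixing time) after the
    direct-path peak. The 20 ms default is illustrative, not prescriptive."""
    rir = np.asarray(rir, dtype=float)
    onset = int(np.argmax(np.abs(rir)))        # direct-path arrival
    cut = min(len(rir), onset + int(mixing_time_ms * 1e-3 * fs))
    direct = np.zeros_like(rir)
    diffuse = np.zeros_like(rir)
    direct[:cut] = rir[:cut]
    diffuse[cut:] = rir[cut:]
    return direct, diffuse

def direct_diffuse_components(dry, rir, fs):
    """Convolve a dry source signal with each part of the RIR to obtain
    the direct and diffuse sound field components."""
    direct, diffuse = split_rir(rir, fs)
    return np.convolve(dry, direct), np.convolve(dry, diffuse)
```

By linearity of convolution, the two components sum exactly to the fully reverberant signal, so no information is lost by the split; the research questions concern where to cut and how to obtain the dry signals in the first place.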

Publications

De Sena E (2015) Efficient Synthesis of Room Acoustics via Scattering Delay Networks in IEEE/ACM Transactions on Audio, Speech, and Language Processing

De Sena E (2012) On the Design and Implementation of Higher Order Differential Microphones in IEEE Transactions on Audio, Speech, and Language Processing

De Sena E (2013) Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction in IEEE Transactions on Audio, Speech, and Language Processing

De Sena E (2011) A generalized design method for directivity patterns of spherical microphone arrays in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011

De Sena E (2013) A computational model for the estimation of localisation uncertainty in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013

De Sena E (2010) Perceptual evaluation of a circularly symmetric microphone array for panoramic recording of audio in International Symposium on Ambisonics and Spherical Acoustics

De Sena E (2011) Scattering Delay Network: an interactive reverberator for computer games in AES 41st International Conference: Audio for Games

Hacıhabiboğlu H (2010) Allpass variable fractional delay filters by pole loci interpolation in Electronics Letters

 
Title Ouroboros 
Description An immersive 3D audio-visual installation 
Type Of Art Performance (Music, Dance, Drama, etc) 
Year Produced 2017 
Impact Demonstration and first public display of the audio technology developed under the projects funded by the associated awards. 
URL http://pantar.com/portfolio/ouroboros/
 
Description There are several key findings of the project:

1) The first scientific and systematic framework for the design of multichannel audio systems based on perceptual criteria.

2) A particular class of multichannel systems for recording and spatially convincing reproduction of acoustic performances.

3) A method for modelling the perception of locatedness of phantom sources created by multichannel systems.

4) A new, practical class of high-order differential microphones.

5) A method for super-real-time rendering of perceptually convincing reverberation of acoustic spaces.

6) It has been demonstrated that non-coincident microphone arrays are capable of capturing the sound field cues needed for spatially stable sound field reconstruction.
Exploitation Route Outcomes of this research are directly applicable to a wide range of multichannel audio technologies, from performance recording for record labels, sound production for film, and broadcasting, through gaming and virtual and augmented reality, to acoustics simulation in architectural design and microphone manufacture. This research lays down the scientific foundations for perceptual sound field reconstruction using low-count multichannel systems. It further provides a new class of microphones which opens up possibilities for new developments in sound recording technologies.
Sectors Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Healthcare,Culture, Heritage, Museums and Collections

URL https://www.kcl.ac.uk/Cultural/-/Projects/Soundscapes.aspx
 
Description The project was of a fundamental research nature, so it hasn't produced non-academic impact yet, but there is high potential for impact outside of academe. The project was concerned with multichannel systems for perceptual sound field synthesis and reproduction. The field of spatial sound has so far been mainly geared towards creating special effects and providing a pleasing listening experience, rather than rooted in solid engineering or science. We established a scientific framework for the analysis and design of multichannel systems based on concise modelling of the underlying psychoacoustic phenomena. That framework enabled the development of a new multichannel audio technology which improves over state-of-the-art systems in terms of accuracy and stability of the auditory perspective. We also developed a super-real-time software implementation for virtual reality applications, based on further psychoacoustic approximation, as well as a new class of underlying microphones. The initial motivation for this work was the recording of music performances in a way that allows for convincing spatial reproduction and broadcasting. It turns out, however, that a much larger market for our technology lies in virtual reality applications, including gaming, as well as augmented reality as pursued by Google. Contemporary composers, too, frequently attempt to place their sounds within a specific auditory landscape, and archaeologists, anthropologists, and art historians are trying to recreate the acoustics of important historical venues. We will therefore endeavour to engage in multidisciplinary collaborations involving ICT (i.e. spatial sound), music, and the humanities, as well as reaching out to the rich cultural and entertainment milieu of London and engaging with institutions like the Royal Opera House, Royal Festival Hall, and Tate Modern in joint projects involving sound recording, music, and multimedia performances and installations. 
While commercial impact hasn't materialised yet, several companies have shown interest in our intellectual property arising from this project.
First Year Of Impact 2017
Sector Creative Economy
Impact Types Cultural

 
Description Cultural Institute Award
Amount £25,800 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 06/2016 
End 06/2017
 
Description Cultural Institute Award
Amount £8,708 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 02/2018 
End 06/2018
 
Description Impact Acceleration Award
Amount £6,000 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 11/2015 
End 06/2016
 
Description Impact Acceleration Award
Amount £38,548 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 03/2018 
End 10/2018
 
Description Impact Acceleration Award Rapid
Amount £10,000 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 02/2018 
End 06/2018
 
Description Travel grant
Amount £21,059 (GBP)
Funding ID EP/K034626/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 03/2013 
End 03/2014
 
Title Sound Spatialisation Software 
Description Software for dynamic spatialisation of sound sources in a dynamically changing environment, e.g. rendering of VR audio content, compatible with multichannel and binaural rendering. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? No  
Impact No impact has been generated yet, but the tool will be useful in psychoacoustics and audiology research pertaining to spatial hearing, and is expected to evolve into commercial software for immersive audio content creation in applications such as AR/VR and professional music mixing. 
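As a rough illustration of the amplitude-panning principle that multichannel spatialisers of this kind build on, a constant-power stereo panner can be sketched as follows. This is a textbook sketch under assumed conventions (azimuth in [-45, +45] degrees, negative to the left), not the project's software.

```python
import numpy as np

def pan_gains(azimuth_deg):
    """Constant-power stereo panning gains for a source azimuth in
    [-45, +45] degrees (negative = left). Illustrative only."""
    az = np.clip(azimuth_deg, -45.0, 45.0)
    p = (az + 45.0) / 90.0 * (np.pi / 2.0)  # map azimuth to [0, pi/2]
    return np.cos(p), np.sin(p)             # gains satisfy gl^2 + gr^2 = 1

def pan_stereo(x, azimuth_deg):
    """Apply constant-power panning to a mono signal; returns (left, right)."""
    gl, gr = pan_gains(azimuth_deg)
    x = np.asarray(x, dtype=float)
    return gl * x, gr * x
```

The constant-power property keeps the perceived loudness of a source roughly constant as it moves between the speakers, which is why the sine/cosine law is preferred over linear gain interpolation.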
 
Description 59 Productions 
Organisation 59 Productions
Country United Kingdom 
Sector Private 
PI Contribution Soundscape design for a multimedia art installation centred around a performance of pianist Yuja Wang.
Collaborator Contribution Design and production of a multimedia art installation centred around a performance of pianist Yuja Wang.
Impact A multimedia art installation centred around a performance of pianist Yuja Wang.
Start Year 2016
 
Description Fidelio Arts 
Organisation Fidelio Arts Ltd
Country United Kingdom 
Sector Private 
PI Contribution Soundscape design for a multimedia art installation centred around a performance of pianist Yuja Wang.
Collaborator Contribution Organisation and management of the project, multimedia art installation centred around a performance of pianist Yuja Wang, including time of the pianist.
Impact A multimedia art installation centred around a performance of Yuja Wang, a pianist represented by Fidelio Arts and presently one of the leading classical pianists.
Start Year 2016
 
Description Institute of Sound Recording, University of Surrey 
Organisation University of Surrey
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise, intellectual input.
Collaborator Contribution Expertise, intellectual input.
Impact Joint publications, grant proposal, and further development and deployment of the audio technology developed with the relevant EPSRC project in art projects and installations.
Start Year 2016
 
Description METU 
Organisation Middle East Technical University
Department Institute of Marine Sciences
Country Turkey 
Sector Academic/University 
PI Contribution Expertise, intellectual input.
Collaborator Contribution Expertise, intellectual input.
Impact Joint publications. Further development of the audio technology created on the relevant EPSRC project and its deployment in art projects and installations.
Start Year 2012
 
Description Sony 
Organisation SONY
Country Japan 
Sector Private 
PI Contribution No contribution yet; a research collaboration is planned.
Collaborator Contribution No contribution yet; a research collaboration is planned.
Impact No outputs yet.
Start Year 2018
 
Description Stanford 
Organisation Stanford University
Country United States 
Sector Academic/University 
PI Contribution Collaborative research.
Collaborator Contribution Collaborative research.
Impact A joint tutorial on multichannel surround systems, to be presented at ICASSP 2015. A joint paper to be submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Start Year 2013
 
Title Audio Signal Processing Method and System 
Description The patent describes a method for emulating acoustic performance in a given venue using dry studio recordings. 
IP Reference WO2007/060443 
Protection Patent granted
Year Protection Granted 2012
Licensed No
Impact The invention provides a method for synthesising the sound field of desired enclosed spaces.
 
Title Electronic Device with Digital Reverberator and Method 
Description A method for super-real-time rendering of multichannel reverberation of acoustic spaces. 
IP Reference US2013202125 
Protection Patent granted
Year Protection Granted 2014
Licensed No
Impact No notable impacts yet.
 
Title Microphone Array 
Description The patent describes a class of microphone arrays designed so as to capture the cues needed for convincing perceptual sound field reconstruction. 
IP Reference US12/905,415 
Protection Patent granted
Year Protection Granted 2015
Licensed No
Impact No notable impacts yet.
 
Title SDN iPhone app 
Description The iPhone app aims to deliver the auditory illusion of being in the middle of a virtual rectangular room. This is achieved by means of the scattering delay network (SDN) technology together with a binaural reproduction technique. The app is capable of simulating the acoustics of the room in real time thanks to the extremely low computational complexity of the SDN method, while at the same time delivering important perceptual cues in an accurate manner. The app uses the iPhone gyroscope to track the movement of the listener's head and adjusts the simulation accordingly. 
Type Of Technology Webtool/Application 
Year Produced 2015 
Impact The app was sent to several companies to spur their interest in commercial exploitation of the intellectual property arising from the relevant EPSRC projects. Dolby has made several visits to King's College and is presently evaluating our technology. 
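For readers unfamiliar with delay-network reverberators, the principle behind SDN-type methods (recirculating delay lines coupled by a lossless scattering operation) can be sketched in a few lines. The toy below is a generic feedback-delay-network-style reverberator, not the patented SDN design; the delay lengths, the Householder scattering matrix, and the feedback gain are illustrative choices.

```python
import numpy as np

def delay_network_reverb(x, fs, delays_ms=(11.3, 17.7, 23.1, 31.9), g=0.7):
    """Toy mono reverberator: N recirculating delay lines whose outputs are
    mixed by a lossless Householder scattering matrix and fed back with
    gain g < 1 (which guarantees a decaying response)."""
    N = len(delays_ms)
    lengths = [max(1, int(d * 1e-3 * fs)) for d in delays_ms]
    # Householder matrix: orthogonal, hence energy-preserving mixing.
    A = np.eye(N) - (2.0 / N) * np.ones((N, N))
    lines = [np.zeros(L) for L in lengths]  # circular delay buffers
    ptrs = [0] * N
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        # Read the sample written `length` samples ago from each line.
        outs = np.array([lines[i][ptrs[i]] for i in range(N)])
        y[n] = outs.sum()
        feedback = g * (A @ outs)           # scatter and attenuate
        for i in range(N):
            lines[i][ptrs[i]] = xn + feedback[i]
            ptrs[i] = (ptrs[i] + 1) % lengths[i]
    return y
```

The low cost per sample (a handful of reads, writes, and a small matrix product) is what makes this family of methods attractive for real-time simulation on mobile hardware, as in the app described above.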
 
Title Sound Spatialisation Software 
Description Software for dynamic spatialisation of sound sources in a dynamically changing environment, e.g. rendering of VR audio content, compatible with multichannel and binaural rendering. 
Type Of Technology Software 
Year Produced 2017 
Impact No impact has been generated yet, but the software will provide the basis for two commercial product prototypes: 1) a VST plugin for professional sound mixing, and 2) a VR audio plugin for creating audio content in VR environments. 
 
Description 2015 Summer Science Exhibition of the Royal Society. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Our scattering delay network (SDN) technology was showcased during the 2015 Summer Science Exhibition, the flagship event of the Royal Society for science communication to the public. The event, lasting a week, had an attendance of about 15,000 people, in addition to two gala nights with the fellows of the Royal Society. The demonstration was part of the stand "Sound Scape Interaction in a 3D World" organised by a consortium of European universities led by Imperial College London. The demonstration consisted of a rotating platform called "Sound Hunter".
Visitors wore headphones while standing on the rotating platform, and their task was to rotate the platform until a sound source auralised through the headphones was perceived to be in front of them. The SDN was used in cases where users chose to localise the sound source in a reverberant room.
Year(s) Of Engagement Activity 2015
URL http://sse.royalsociety.org/2015