S3A: Future Spatial Audio for an Immersive Listener Experience at Home

Lead Research Organisation: University of Surrey
Department Name: Centre for Vision, Speech and Signal Processing (CVSSP)

Abstract

3D sound can offer listeners the experience of "being there" at a live event, such as the Proms or Olympic 100m, but
currently requires highly controlled listening spaces and loudspeaker setups. The goal of S3A is to realise practical
3D audio for the general public to enable immersive experiences at home or on the move.

Virtually the whole of the UK population consumes audio. S3A aims to unlock the creative potential of 3D sound and deliver to listeners a step change in immersive experiences. This requires a radically new, listener-centred approach to audio, enabling 3D sound production to adapt dynamically to the listener's environment. Achieving immersive audio experiences in uncontrolled living spaces presents a significant research challenge: it requires major advances in our understanding of the perception of spatial audio, together with new representations of audio and the signal processing that allows content creation and perceptually accurate reproduction.

Existing audio production formats (stereo, 5.1) and those proposed for future cinema spatial audio (24 or 128 channels) are channel-based, requiring specific controlled loudspeaker arrangements that are simply not practical for the majority of home listeners. S3A will pioneer a novel object-based methodology for audio signal processing that allows flexible production and reproduction in real spaces. The reproduction will be adaptive to loudspeaker configuration, room acoustics and listener locations.

The fields of audio and visual 3D scene understanding will be brought together to identify and model audio-visual objects in complex real scenes. Audio-visual objects are sound sources or events with known spatial properties of shape and location over time, e.g. a football being kicked, a musical instrument being played or a crowd chanting at a football match. Object-based representation will transform audio production from existing channel-based signal mixing (stereo, 5.1, 22.2) to spatial control of isolated sound sources and events. This will realise the creative potential of 3D sound, enabling intelligent user-centred content production, transmission and reproduction of 3D audio content in platform-independent formats. Object-based audio will allow flexible delivery (broadcast, IP and mobile) and adaptive reproduction of 3D sound on existing and new digital devices.
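To make the object-based idea concrete, the sketch below (a minimal illustration of ours, not the S3A renderer; all names are hypothetical) treats an audio object as mono samples plus position metadata and derives loudspeaker gains for whatever horizontal layout is actually present, using a simplified equal-power pairwise panning law:

```python
import numpy as np

def pan_gains(src_azimuth_deg: float, speaker_azimuths_deg) -> np.ndarray:
    """Equal-power pairwise panning of one audio object onto an arbitrary
    horizontal loudspeaker ring (a simplified, VBAP-style scheme)."""
    spk = np.asarray(speaker_azimuths_deg, dtype=float) % 360.0
    src = src_azimuth_deg % 360.0
    order = np.argsort(spk)
    gains = np.zeros(len(spk))
    for i in range(len(order)):
        a = spk[order[i]]
        b = spk[order[(i + 1) % len(order)]]
        span = (b - a) % 360.0      # angular width of this speaker pair
        offset = (src - a) % 360.0  # where the source sits within it
        if span > 0.0 and offset <= span:
            f = offset / span       # 0 at speaker a, 1 at speaker b
            gains[order[i]] = np.cos(f * np.pi / 2.0)
            gains[order[(i + 1) % len(order)]] = np.sin(f * np.pi / 2.0)
            break
    return gains

# The same object metadata (here, azimuth 10 degrees) renders to any layout:
print(pan_gains(10.0, [-30.0, 30.0]))                      # stereo pair
print(pan_gains(10.0, [-30.0, 0.0, 30.0, 110.0, -110.0]))  # 5-speaker layout
```

The design point is that position metadata travels with the object, so the same production renders sensibly on a stereo pair, a 5.1 layout or an ad-hoc array.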

Planned Impact

Virtually the whole UK population are consumers of audio content. S3A will deliver to listeners a step change in the quality
of perceived sound, and provide new opportunities for UK creative industries to generate wealth through the artistic
exploitation of new audio-visual technology.

S3A's scientific and engineering advances will ensure that UK research remains at the forefront of spatial audio and pioneers new integrated audio-visual signal processing methodologies. This research will enable UK creative industries (broadcast, film, games, interactive media) to develop and exploit the best future spatial audio production and delivery technologies that add value to listeners' experience. The UK is a world leader in audio-visual content production, a growth sector contributing 12% (£120B) to the economy with over 2.5M employees. Consequently the UK is extremely well placed to exploit S3A research through creative/technology SMEs (KEF, DTS, Orbitsound), TV (BBC), film (DNeg, Framestore, MPC) and games (EA, Sony, Codemasters). S3A will enable UK creative industries to lead future technologies and standards for spatial audio and object-based audio-visual production.

Pathways to impact include:
(1) collaboration with the BBC to realise S3A technology in the next generation of spatial audio for broadcast and IP networks;
(2) working with games/film/web companies to address their requirements for spatial audio production & reproduction;
(3) leading international open standards for spatial audio through the BBC who are actively engaged in ISO/MPEG standards for audio and visual content;
(4) licensing of S3A technology to UK SME's for integration in mobile and home platforms;
(5) engagement of representative bodies to ensure needs of the hearing impaired are addressed;
(6) collaboration with creatives on showcasing the potential of spatial audio to deliver new listener experience;
(7) engaging the public in S3A research through spatial audio test broadcasts and web-based interactive media with the BBC;
(8) public engagement with the science behind S3A spatial audio through pilots for feature documentaries on BBC TV/radio with involvement of co-investigator Prof. Trevor Cox, who is a regular science commentator;
(9) open source tools for amateurs & professionals to edit/stream 3D sound to support active engagement in creative use of spatial audio;
(10) workshops to foster an audio-visual research community bringing together audio and visual researchers from both academia and industry.

Publications

Dazzi E (2018) Scalable object instance recognition based on keygraph matching in Pattern Recognition Letters

Chen J (2021) Channel and spatial attention based deep object co-segmentation in Knowledge-Based Systems

Baykaner K (2015) The Relationship Between Target Quality and Interference in Sound Zone in Journal of the Audio Engineering Society

Menzies D (2019) Multichannel Compensated Amplitude Panning, An Adaptive Object-Based Reproduction Method in Journal of the Audio Engineering Society

Shirley B (2017) Personalized Object-Based Audio for Hearing Impaired TV Viewers in Journal of the Audio Engineering Society

Francombe J (2017) Evaluation of Spatial Audio Reproduction Methods (Part 2): Analysis of Listener Preference in Journal of the Audio Engineering Society

Galindo M (2020) Microphone Array Geometries for Horizontal Spatial Audio Object Capture With Beamforming in Journal of the Audio Engineering Society

Gálvez M (2019) Dynamic Audio Reproduction with Linear Loudspeaker Arrays in Journal of the Audio Engineering Society

 
Title Effect of Background Music Arrangement and Tempo on Foreground Speech Intelligibility: wav audio files - background music 
Description A zip folder with sub-folders containing .wav files of background music and speech-shaped noise (SSN) (a control masking noise), as used with Tang & Cooke's (2016) HEGP OIM (high energetic glimpse proportion objective intelligibility metric) and in a quantitative, subjective speech-in-noise test to investigate whether background music arrangement (timbre and instrumentation) and tempo have a significant effect on speech intelligibility. The investigation was conducted at the University of Salford in 2018 towards the PhD thesis by P. Demonte (2022).

The speech-in-noise test used the original dialogue recording of the Revised Speech Perception In Noise (RSPIN) test sentences (Kalikow, Stevens and Elliott, 1977; Bilger, 1984; Bilger et al., 1984) available on CD-R. Headphone playback of the dialogue was calibrated to an average level of 63 dB(A). The master background music audio files were generated in GarageBand using Apple Loops. The control background noise, speech-shaped noise, a purely energetic masker used for comparison against music, was produced using white noise and samples of the spoken dialogue.

The background music and speech-shaped noise audio files in this zip folder were set relative to the dialogue playback level to produce a glimpse proportion value for the dialogue of 10 (GP10), as per the output of Tang & Cooke's (2016) HEGP OIM within a Matlab script using an iterative 'for' loop. That is to say, all the background masking noises were set to different speech-to-noise ratios but produced the same energetic masking level, such that any significant differences in effect on speech intelligibility would be attributable to other factors. Playback of the dialogue and masking noise audio files was via an Adobe Audition digital audio workstation. For an overview of the speech-to-noise ratios and glimpse proportions of each speech-noise .wav file pairing, see the Excel spreadsheet: https://doi.org/10.17866/rd.salford.19753936 - Effect of Background Music Arrangement and Tempo on Foreground Speech Intelligibility: Listening experiment settings (SNRs, GP, HEGP) spreadsheets.

KEY. Music, created in GarageBand using Apple Loops: M1 (Apple Loop: Fireplace All): string quartet playing in a legato style; M2 (Apple Loop: Countdown Cello 01): solo cello playing a single note in a staccato, bowed style; M3 (Apple Loops: Countdown Cello 01; Laid Back Classic 01; African King Gyl 04; Big Maracas 03): cello, electric guitar, and lightly-percussive instrumentation; M4 (Apple Loops: Countdown Cello 01; Laid Back Classic 01; African King Gyl 04; Big Maracas 03; Lake Shift Bass; Barricade Arpeggio; High Octane Arpeggio; Altered State Beat 02): cello, electric guitar, and more heavily percussive instrumentation; M5_T0: speech-shaped noise (SSN), a purely energetic masking noise used as a control condition against which to compare any effects of the background music; no defined tempo. Tempo: T1: 60 beats per minute (bpm); T2: 100 bpm; T3: 140 bpm. GP10 refers to the arbitrary glimpse proportion (= 10) of the spoken sentences relative to the background music or speech-shaped noise level. The audio file names in this zip folder also reflect: the RSPIN list number; the RSPIN sentence number; the semantic level of the RSPIN sentence that corresponds to each masking noise file (HP = high predictability; LP = low predictability); and the target word of the RSPIN sentence that corresponds to each masking noise.
-------------------------------------------------------------------------- For further details, contact: email (1): p.demonte@edu.salford.ac.uk email (2): philippademonte@gmail.com 
Type Of Art Film/Video/Animation 
Year Produced 2022 
URL https://salford.figshare.com/articles/media/Effect_of_Background_Music_Arrangement_and_Tempo_on_Fore...
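For illustration, a minimal Python sketch of the iterative level-setting loop described above; `gp_metric` is a hypothetical stand-in for Tang & Cooke's (2016) HEGP OIM, which is not reimplemented here:

```python
import numpy as np

def set_masker_for_target_gp(speech, masker, gp_metric,
                             target_gp=10.0, step_db=0.5, tol=0.25):
    """Iteratively rescale a masker until a glimpse-proportion metric for
    the speech reaches target_gp (cf. the GP10 condition). gp_metric is a
    stand-in for Tang & Cooke's (2016) HEGP OIM, not reimplemented here."""
    gain = 1.0
    for _ in range(200):  # iterative loop, as in the original Matlab script
        gp = gp_metric(speech, masker * gain)
        if abs(gp - target_gp) < tol:
            break
        # More masker energy means fewer speech glimpses: raise the masker
        # level while GP is above target, lower it while GP is below.
        gain *= 10.0 ** ((step_db if gp > target_gp else -step_db) / 20.0)
    return masker * gain, gain
```

This yields different speech-to-noise ratios per masker but the same energetic masking level, which is the equalisation the dataset description relies on.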
 
Title Precedence Effect: Listening experiment images 
Description A zip file containing three .png image files relating to a subjective speech-in-noise listening experiment conducted in the listening room at the University of Salford in March 2020. This experiment, towards the PhD thesis by P. Demonte (2022), investigated whether the precedence effect could be utilised to significantly improve speech intelligibility for augmented loudspeaker arrays in the home, with future applications to media device orchestration with object-based audio. The experiment further explored binaural unmasking in terms of: i) binaural masking level difference (BMLD) and ii) the so-called better ear effect (BEE). The listening experiment involved three different loudspeaker array configurations: * L1 + R1 - a regular stereo configuration of two loudspeakers, with both simultaneously reproducing spoken dialogue and background noise; * L1 + R1 + C2 - a three-loudspeaker array, with an auxiliary loudspeaker (C2) in the true centre position (0 degrees azimuth) between the L1 + R1 stereo pair. C2 plays only spoken dialogue with a 10 ms delay to invoke the precedence effect and provide a boost to the speech signal. Equalisation is also applied to the C2 signal to negate differences in comb filtering effects between the two- and three-loudspeaker array configurations; * L1 + R1 + R2 - a three-loudspeaker array, with the auxiliary loudspeaker (R2) at +90 degrees azimuth in order to test the better ear effect. As with L1 + R1 + C2, R2 plays only spoken dialogue with a 10 ms delay, and equalisation is applied. The images in this zip file show: * MDO_Array_Listener.png - a photo showing the configuration of the four loudspeakers (for the three different loudspeaker array configurations) and the seated listener position in the listening room for the experiment; * MDO_configuration.png - a figure showing the loudspeaker array positions (distances and azimuths from the listener position); * MDO_schematic.png - showing the differences between the two- and three-loudspeaker arrays in terms of boosts, delays, and equalisation applied. ------------------------------------------------------------------- For further information, contact: email (1): p.demonte@edu.salford.ac.uk email (2): philippademonte@gmail.com 
Type Of Art Image 
Year Produced 2022 
URL https://salford.figshare.com/articles/figure/Precedence_Effect_Listening_experiment_images/19766881/...
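A minimal sketch (ours, with an assumed sample rate and hypothetical function name) of how the auxiliary loudspeaker feed described above could be derived, applying the 10 ms delay that invokes the precedence effect; the equalisation stage used in the experiment is omitted:

```python
import numpy as np

def aux_speech_feed(speech: np.ndarray, fs: int = 48000,
                    delay_ms: float = 10.0, boost_db: float = 0.0) -> np.ndarray:
    """Derive the auxiliary loudspeaker (C2/R2) signal: speech only,
    delayed so the precedence effect keeps the perceived source at the
    stereo pair while the added speech energy supports intelligibility."""
    delay_samples = int(round(delay_ms * fs / 1000.0))
    gain = 10.0 ** (boost_db / 20.0)
    # Prepend silence to delay the speech relative to the L1/R1 feeds.
    return np.concatenate([np.zeros(delay_samples), speech]) * gain
```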
 
Title The Turning Forest 
Description S3A research produced a cutting-edge object-based audio radio drama, which was then converted into the BBC's first immersive VR experience, The Turning Forest. This is a sound-based, real-time CGI VR fairytale for people young and old, inviting audiences into a magical space of imagination where the rustling leaves of an autumn forest are also the footsteps of something familiar, yet strange. 
Type Of Art Artefact (including digital) 
Year Produced 2017 
Impact The work premiered in April at the Tribeca Film Festival Storyscapes exhibition, which focuses on cutting-edge artworks that explore new uses of media, highlighting innovation. It went on to win the TVBEurope award for best sound and was a finalist for best Google Daydream VR experience in May 2017. http://www.tvbeurope.com/tvbawards-2016-winners-announced/ 
URL http://www.s3a-spatialaudio.org/wordpress/
 
Title The Vostok-K Incident 
Description The Vostok-K Incident, a 13-minute science-fiction story specially created by S3A, was designed to take advantage of the additional connected devices available to listeners. The S3A researchers used a technology called "object-based media" to flexibly reproduce audio regardless of which devices people connect or how these are arranged. In fact, the more devices that are connected, the more immersion the listener experiences, unlocking surround sound effects as well as extra hidden content. 
Type Of Art Artefact (including digital) 
Year Produced 2018 
Impact The Vostok-K Incident was launched at the British Science Festival in November 2018. The drama is available online via BBC Taster. 
URL http://www.s3a-spatialaudio.org/vostok-k
 
Description S3A is pioneering methods for creating immersive spatial audio experiences for the listener at home or on the move. Research is investigating all aspects of the production from recording and editing through to delivery and practical reproduction at home.
S3A has delivered advances in the following areas:
- understanding and modelling listener perception of spatial audio in real spaces
- perceptual metering of spatial audio
- end-to-end production of spatial audio from recording to reproduction
- object-based audio recording, editing and manipulation
- object-based spatial audio reproduction
- listener-centred reproduction
- room modelling and adaptation in spatial audio reproduction
- audio-visual localisation of sound sources
- source separation for multiple object sources
- perceptual modelling of intelligibility of sound sources
- methods to control and improve intelligibility of content
- audio-visual room modelling
- creation of new spatial audio experiences for listeners at home
- personalised listening experiences to improve accessibility of content based on narrative importance and listener perception of the content
- use of transaural loudspeaker arrays to create independent listening experiences for multiple listeners in the same environment

These technologies have been integrated to demonstrate enhanced and new listening experiences. Technologies developed in S3A are contributing to international standards and new consumer technologies.

S3A has exceeded the original objectives by introducing and spinning out new technologies for personalised immersive audio experiences including a commercial sound-bar technology which creates the experience of virtual headphones, a new generation of audio experience using media device orchestration, VR spatial audio experience using room adaptation, and the first open source tools for object-based spatial audio production.
Exploitation Route S3A has contributed new technologies for creation of immersive audio and audio-visual content which can be experienced by the listener at home. Further exploitation is expected through:

- commercial exploitation of novel methods for production of audio and audio-visual content in the creative industries (TV, film, games, internet)
- novel methods for listeners to experience spatial audio at home (consumer electronics, TV, film, games)
- novel devices for audio and visual content
- technologies for perceptual metering of audio in production and reproduction
- new creative tools and media experiences
- the first open-source tools for object based spatial audio production
- media device orchestration enabling immersive experiences without specialist spatial audio production technology, with public demonstrations, e.g. Vostok-K
- commercialisation of sound bar technology by spinout AudioScenic to create the experience of virtual headphones
- personalised immersive spatial audio experiences
- award-winning content creation of immersive spatial audio experiences, e.g. The Turning Forest, available on Google Play/Oculus VR and listed as a top-20 VR experience 2016-20
- award-winning broadcast of personalised TV content to improve accessibility by exploiting narrative importance, e.g. BBC Casualty
Sectors Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Culture, Heritage, Museums and Collections

URL http://www.s3a-spatialaudio.org
 
Description Listener-centred spatial audio reproduction has delivered immersive spatial audio experiences at home and improved content accessibility for the hearing impaired. IP protection, licensing and commercialisation of sound bar technology for personalised spatial audio reproduction provides 'virtual headphones' for multiple listeners with personalised content for each listener; this technology has been commercialised by spinout AudioScenic. The first open-source tools for object-based spatial audio production are enabling use in the creative industries. Content creation has demonstrated the potential of object-based production of personalised immersive experiences; award-winning experiences include The Turning Forest VR, Vostok-K, Casualty and other content released via BBC Taster and public platforms. Media device orchestration has demonstrated practical immersive experience production across an ad-hoc array of devices, as shown in the Vostok-K and other MDO experiences; this has resulted in follow-on commercial development to explore the creative potential of MDO. S3A established the foundations for the EPSRC/BBC Prosperity Partnership 'AI4ME - Personalised Object-based Media Experiences for All', a 5-year collaboration led by the BBC and the University of Surrey in collaboration with Lancaster University and 15 leading companies across the UK media industry. AI4ME builds directly on the pioneering object-based research delivered by S3A and will deliver a new generation of media experiences personalised to individual interest, accessibility requirements, device and location.
First Year Of Impact 2013
Sector Creative Economy,Digital/Communication/Information Technologies (including Software),Healthcare,Culture, Heritage, Museums and Collections
Impact Types Cultural,Societal,Economic

 
Description Ofcom Object-based Media Working Group
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Influence on media communication and service regulation
URL https://www.ofcom.org.uk
 
Description Audio-Visual Media Research Platform
Amount £1,577,223 (GBP) 
Funding ID EP/P022529/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 08/2017 
End 07/2022
 
Description BBC Prosperity Partnership: Future Personalised Object-based Media Experiences Delivered at Scale Anywhere
Amount £8,500,000 (GBP) 
Funding ID EP/V038087/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2021 
End 06/2026
 
Description EPSRC I-case studentship
Amount £107,560 (GBP) 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2015 
End 09/2020
 
Description EPSRC i-case studentship
Amount £108,580 (GBP) 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2016 
End 09/2021
 
Description ICURE - support for junior researcher Dr J. Francombe for 3 months, able to claim up to £35,000 of travel and expenditure, to carry out market validation of research-based business ideas and to receive intensive support in developing them.
Amount £35,000 (GBP) 
Organisation SETsquared Partnership 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2016 
End 07/2016
 
Description Polymersive: Immersive Video Production Tools for Studio and Live Events
Amount £726,251 (GBP) 
Funding ID 105168 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 03/2019 
End 09/2020
 
Title BST - Binaural Synthesis Toolkit 
Description The Binaural Synthesis Toolkit is a modular and open-source package for binaural synthesis, i.e., spatial audio reproduction over headphones or transaural loudspeaker systems. It supports different reproduction methods (dynamic HRIR synthesis, HOA-based rendering, and BRIR-based virtual loudspeaker rendering), dynamic head tracking, and current data formats such as SOFA. It is based on the VISR framework and implemented in Python, which means that the code is relatively accessible and open to adaptations and extensions, as well as being reusable in larger audio processing algorithms. The BST is provided to foster reproducible audio research. It is mainly targeted at researchers in sound reproduction and perception, but it could be used by enthusiasts as well. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact * AES convention paper: Andreas Franck, Giacomo Costantini, Chris Pike, and Filippo Maria Fazi, "An Open Realtime Binaural Synthesis Toolkit for Audio Research," in Proc. Audio Eng. Soc. 144th Conv., Milano, Italy, 2018, Engineering Brief. 
URL http://cvssp.org/data/s3a/public/
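The core operation underlying binaural synthesis can be sketched as follows (a simplified, static illustration of ours using SciPy; the BST itself adds dynamic head tracking, HRIR interpolation and the VISR realtime infrastructure on top of this idea):

```python
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(mono: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Convolve a mono source with the left/right head-related impulse
    responses for one fixed direction, yielding a 2 x N binaural signal
    for headphone playback."""
    return np.stack([fftconvolve(mono, hrir_left),
                     fftconvolve(mono, hrir_right)])
```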
 
Title Compensated Stereo Panning Perceptual Test Data 
Description Perceptual test data for 'A Low Frequency Panning Method with Compensation for Head Rotation', IEEE/ACM Transactions on Audio, Speech, and Language Processing. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Unprocessed results of listening tests for Compensated Amplitude Panning Reproduction of spatial audio. Details in 'A Low Frequency Panning Method with Compensation for Head Rotation' 
URL https://eprints.soton.ac.uk/415757/
 
Title Effect of Background Music Arrangement and Tempo on Foreground Speech Intelligibility: Listening experiment settings (SNRs, GP, HEGP) spreadsheets. 
Description Excel spreadsheet containing data collected and collated from objective and subjective testing of whether background music arrangement (timbre and instrumentation density) and tempo have any significant effect on foreground speech intelligibility. The values of the objective data - speech-to-noise ratios (dB SNR), glimpse proportions (GP), and high energy glimpse proportions (HEGP) - were generated and collected in a Matlab script that incorporated Tang & Cooke's (2016) HEGP OIM (high energy glimpse proportion objective intelligibility metric) together with an iterative 'for' loop. The subjective data were collected in a standard speech-in-noise test (SINT), in which participants listened via headphones to speech played simultaneously with either background music or a control masking noise, and were tasked with identifying the final word of each spoken sentence (target word). The listening experiment used the RSPIN speech corpus. Background music stimuli were generated by the researcher using Apple Loops in GarageBand. The 'Read Me' page provides: a brief overview of the listening experiment; citation and link for Tang and Cooke's (2016) HEGP OIM; a key to explain the shorthand of the independent variables and file names; and an overview of the other spreadsheets. 'Various_GP' is an overview of equivalent speech-to-noise ratios (dB SNR) determined for three different glimpse proportion (GP) values using the speech and music masker / masking noise pairs in the Matlab script. These objective values were generated to determine which target glimpse proportion to set all the masking noise files to for the subjective listening experiment. 'GP10_SNRs' shows two tables: one with the GP values that each masking noise file was set to and the corresponding SNRs; the other summarising this information across 300 speech-noise audio file pairs. 'Results' shows the raw subjective listening experiment data collected, collated, and sorted by participant ID number, RSPIN list and RSPIN sentence number. This table pulls in the relevant speech-to-noise ratio, glimpse proportion, and high energy glimpse proportion values from the previous page. 'Summaries' shows tables of the data collated in different ways for the purpose of generating box and whisker plots and conducting statistical analyses. Each table is a summary by participant ID (rows) and the speech-background music / masking noise combination of independent variables: total number of trials; summed correct word scores; mean correct word recognition percentages; mean speech-to-noise ratios (dB SNR); mean glimpse proportions (GP); and mean high energy glimpse proportions (HEGP). ------------------------------------------------------------------- For further details, see the PhD thesis by P. Demonte (2022), or contact: email (1): p.demonte@edu.salford.ac.uk email (2): philippademonte@gmail.com See also the Excel spreadsheet with the listening experiment data and statistical analyses: https://doi.org/10.17866/rd.salford.19745815 'Effect of Background Music Arrangement and Tempo on Foreground Speech Intelligibility: Listening experiment data'. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://salford.figshare.com/articles/dataset/Effect_of_Background_Music_Arrangement_and_Tempo_on_Fo...
 
Title Precedence Effect: Listening experiment data and statistical analyses spreadsheets 
Description Excel spreadsheet with the data collected from a subjective, quantitative speech-in-noise test (SINT) conducted in the Listening Room at the University of Salford in March 2020. The listening experiment tested how the psychoacoustic phenomenon of the precedence effect can be utilised with augmented loudspeaker arrays in an object-based audio paradigm to improve speech intelligibility in the home environment. A practical application of this research will be in the implementation of media device orchestration, i.e. the creation of low-cost, ad-hoc loudspeaker arrays using commonly found devices, such as mobile phones, laptop computers, tablets, smart speakers, and so on, to spatialise audio in the home. This speech-in-noise test was conducted under controlled conditions. With audio reproduced by one of three different loudspeaker arrays in a given trial, subjects listened to spoken sentences played simultaneously with noise. They were tasked with correctly identifying target words. Correct word scores, collated and converted to word recognition percentages, act as a quantifiable proxy for speech intelligibility. After confirming that the data fulfilled the criteria for use, they were statistically analysed using 2-way RMANOVA. The three loudspeaker array configurations were: * L1R1_base (a two-loudspeaker control condition): a stereo pair of front left and front right loudspeakers at -/+30 degrees azimuth and 2 m distance from the listener position; speech + noise reproduced by both loudspeakers. * L1R1C2 (three loudspeakers): L1R1_base + an additional (AUX) loudspeaker in the true front centre position (0 degrees azimuth and 1.7 m distance from the listener position) reproducing just speech. * L1R1R2 (three loudspeakers): L1R1_base + an AUX loudspeaker in the right-hand position (+90 degrees azimuth and 1.7 m distance from the listener position) reproducing just speech. For the three-loudspeaker configurations, the precedence effect was initiated by applying a 10 ms delay to the speech signal reproduced by the AUX loudspeaker, such that the sound source (first arrivals) would still be perceived as being from the phantom centre between the L1 and R1 loudspeakers, but with a boost to the speech signal. The relevant equalisation (EQ) was applied to the speech signal for the C2 and R2 AUX loudspeakers, though, to maintain the same perceived comb filtering effects for all three loudspeaker array configurations. Analysis of the results is provided in the PhD thesis by P. Demonte. ----------------------------------------------------------------------- Spreadsheet pages: * Read Me - provides a more in-depth explanation of the independent variables tested * Raw data - as collected in the speech-in-noise test. The columns denote: subject number; trial number; audio files playing from each loudspeaker in a trial; loudspeaker array configuration; masking noise type; Harvard speech corpus list and sentence number; spoken sentence played; the five target words in each sentence; the sentence as heard and noted by the subject; correct word score applied (out of a total of 5 per trial); correct word ratio. * CWR_all - correct word percentages collated for each subject for each combination of independent variables, and the corresponding studentized residuals as a quality check for outliers. 
* NormalDistTest - criteria for normal distribution (Shapiro-Wilk test) * 2-way RMANOVA_16subjects - Mauchly's test of sphericity, and Tests of Within-Subjects Effects (2-way RMANOVA) * SimpleMainEffects - analysis of the conditional effects * Participants_MainTest - anonymised data collated from the subjects via a short pre-screening questionnaire: age; gender; handedness (left or right); confirmation of subjects as native English speakers, and whether or not they are bi-/multilingual in case of outliers. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://salford.figshare.com/articles/dataset/Precedence_Effect_Listening_experiment_data_and_statis...
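A 2-way repeated-measures ANOVA of this kind can be reproduced in Python with statsmodels; in this sketch the column and file names are hypothetical, and sphericity checks are handled separately, as in the spreadsheet:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per subject x condition cell; column names are hypothetical:
#   subject - participant ID
#   array   - loudspeaker configuration (L1R1_base / L1R1C2 / L1R1R2)
#   noise   - masking noise type
#   wrp     - word recognition percentage for that cell
df = pd.read_csv("precedence_results.csv")  # placeholder file name

res = AnovaRM(df, depvar="wrp", subject="subject",
              within=["array", "noise"]).fit()
print(res)  # F-tests of within-subject effects; sphericity tested separately
```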
 
Title Room Impulse Responses (RIRs) and Visualisation 
Description RIR datasets captured as part of the S3A project and supplementary material. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Permission is granted to use the S3A Room Impulse Response dataset for academic purposes only, provided that it is suitably referenced in publications related to its use 
URL http://cvssp.org/data/s3a/
 
Title S3A radio drama scenes 
Description Data created for the S3A Radio Drama 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The radio drama was used for the creation of The Turning Forest VR, the BBC's award-winning first VR production 
URL http://cvssp.org/data/s3a/
 
Title S3A speaker tracking with Kinect2 
Description Person tracking using audio and depth cues; identity association using PHD filters in multiple head tracking with depth sensors 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Datasets - Open access 
URL https://www.s3a-spatialaudio.org/datasets
 
Title Speech-To-Screen: Listening Experiment Data and Statistical Analyses spreadsheets 
Description An Excel spreadsheet related to the Speech-To-Screen listening experiment conducted in the Listening Room at the University of Salford in 2017 as part of the EPSRC-funded S3A Future Spatial Audio at Home project. The aim of the experiment was to test the effect on speech intelligibility of different binaural auralisations of speech and noise related to headphone playback with small-screen devices. The experiment involved a speech-in-noise test, whereby subjects had to identify target words (in this case, letter-number pairs) in spoken sentences played simultaneously with either speech-shaped noise (SSN) or speech-modulated noise (SMN). Collated correct word scores converted to word recognition percentages then acted as a proxy for quantifying speech intelligibility for the different conditions tested. Spreadsheet includes: * Read Me page - including an overview of the independent variables; * Raw data collected: letter-number combinations entered by subjects into a graphical user interface for each trial; * Correct word scores for each trial; * Scores summed by subject and combination of conditions, then converted to ratios and percentages; * Criterion checks for use of 3-way RMANOVA; * Statistical analyses using 3-way RMANOVA and post-hoc pairwise comparisons; * Further analyses, including quantified intelligibility of the 16 different speakers' content from the GRID speech corpus used. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://salford.figshare.com/articles/dataset/Speech-To-Screen_Listening_Experiment_Data_and_Statist...
 
Title VISR - Versatile Interactive Scene Renderer 
Description The VISR framework is a general-purpose software framework for audio processing that is well suited to multichannel, spatial and object-based audio. It is an extensible, modular, and portable software framework that is released under an open-source licence. Target audiences are researchers in audio and related disciplines, e.g. audiology. VISR differs from existing software products in several ways. Firstly, it is well suited for integration into other software environments, e.g. digital audio workstations or graphical programming languages such as Max/MSP, in order to make functionality implemented in VISR available to a wider group of researchers, creatives, and enthusiasts. Secondly, thorough integration of the Python language enables easy prototyping and adaptation of audio processing systems and makes the framework accessible to a wider group of users. Thirdly, it enables algorithm design in traditional environments such as Matlab or Python and realtime implementation using the same code base, which has the potential to streamline the workflow of many audio research and development tasks. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact * Used for the rendering and for the subjective evaluation of most achievements within the S3A project (including the radio drama scenes and the Media Device Orchestration (MDO) technology). * Used as the DSP and development platform of the transaural sound bar technology (S3A and Soton Audio Labs). * Used in art installations (e.g., The Trembling Line), science fairs, and open days. * Forms the technical basis for collaboration between BBC R&D and S3A/University of Southampton 
URL http://cvssp.org/data/s3a/public/VISR
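The component-based processing style that VISR enables can be sketched generically as follows (an illustrative pattern of ours, not the actual VISR API):

```python
import numpy as np

class GainComponent:
    """Block-based audio processing component in the spirit of VISR's
    modular design (illustrative pattern only, not the VISR API)."""
    def __init__(self, block_size: int, channels: int, gain_db: float = 0.0):
        self.block_size = block_size
        self.channels = channels
        self.gain = 10.0 ** (gain_db / 20.0)

    def process(self, block: np.ndarray) -> np.ndarray:
        # One fixed-size block of multichannel audio in, one block out.
        assert block.shape == (self.channels, self.block_size)
        return block * self.gain

# Components sharing a uniform process() interface can be chained into
# larger signal flows, e.g. an object renderer feeding a room equaliser.
comp = GainComponent(block_size=64, channels=2, gain_db=-6.0)
out = comp.process(np.random.randn(2, 64))
```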
 
Description AudioScenic 
Organisation Audioscenic
Country United Kingdom 
Sector Private 
PI Contribution Advisor/collaboration on spatial audio research and development for personalised media
Collaborator Contribution Participation in industry events
Impact Research advice in spatial audio
Start Year 2021
 
Description BBC Research and Development 
Organisation British Broadcasting Corporation (BBC)
Country United Kingdom 
Sector Public 
PI Contribution Research in computer vision and audio for broadcast production: technologies for 3D production, free-viewpoint video in sports, stereo production from monocular cameras, and video annotation. Member of the BBC Audio Research Partnership, developing the next generation of broadcast technology.
Collaborator Contribution In kind contribution (members of Steering/Advisory Boards) Use of the BBC lab and research/development facilities. Studentships (industrial case) funding and co-supervision of PhD students.
Impact Multi-disciplinary collaboration involves Computer Vision, Video Analysis, Psychoacoustics, Signal Processing and Spatial Audio
 
Description Bang and Olufsen 
Organisation Bang & Olufsen
Country Denmark 
Sector Private 
PI Contribution Spatial audio research (POSZ and S3A EPSRC funded projects)
Collaborator Contribution Scholarships (fees and bursaries) for EU/Home students. In-kind contribution by members of the B&O Research department (Soren Bech, member of Steering/Advisory Boards and co-supervisor of funded students). Use of research facilities at their labs in Denmark
Impact Publications listed on http://iosr.uk/projects/POSZ/ Multi-disciplinary Collaboration: Signal Processing, Psychoacoustics and Spatial audio
 
Description Sony Broadcast and Professional Europe 
Organisation SONY
Department Sony Broadcast and Professional Europe
Country United Kingdom 
Sector Private 
Start Year 2004
 
Title BST - Binaural Synthesis Toolkit 
Description The Binaural Synthesis Toolkit is a modular and open-source package for binaural synthesis, i.e., spatial audio reproduction over headphones or transaural loudspeaker systems. It supports different reproduction methods (dynamic HRIR synthesis, HOA-based rendering, and BRIR-based virtual loudspeaker rendering), dynamic head tracking, and current data formats such as SOFA. It is based on the VISR framework and implemented in Python, which means that the code is relatively accessible and open to adaptations and extensions, as well as being reusable in larger audio processing algorithms. The BST is provided to foster reproducible audio research. It is mainly targeted at researchers in sound reproduction and perception, but it could be used by enthusiasts as well. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact * AES convention paper: Andreas Franck, Giacomo Costantini, Chris Pike, and Filippo Maria Fazi, "An Open Realtime Binaural Synthesis Toolkit for Audio Research," in Proc. Audio Eng. Soc. 144th Conv., Milano, Italy, 2018, Engineering Brief. 
URL http://cvssp.org/data/s3a/public/
 
Title VISR - Versatile Interactive Scene Renderer 
Description The VISR framework is a general-purpose software framework for audio processing that is well suited to multichannel, spatial and object-based audio. It is an extensible, modular, and portable software framework that is released under an open-source licence. Target audiences are researchers in audio and related disciplines, e.g. audiology. VISR differs from existing software products in several ways. Firstly, it is well suited for integration into other software environments, e.g. digital audio workstations or graphical programming languages such as Max/MSP, in order to make functionality implemented in VISR available to a wider group of researchers, creatives, and enthusiasts. Secondly, thorough integration of the Python language enables easy prototyping and adaptation of audio processing systems and makes the framework accessible to a wider group of users. Thirdly, it enables algorithm design in traditional environments such as Matlab or Python and realtime implementation using the same code base, which has the potential to streamline the workflow of many audio research and development tasks. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact * Used for the rendering and for the subjective evaluation of most achievements within the S3A project (including the radio drama scenes and the Media Device Orchestration (MDO) technology). * Used as the DSP and development platform of the transaural sound bar technology (S3A and Soton Audio Labs). * Used in art installations (e.g., The Trembling Line), science fairs, and open days. * Forms the technical basis for collaboration between BBC R&D and S3A/University of Southampton 
URL http://cvssp.org/data/s3a/public/VISR
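The sketch below illustrates the component-based processing style the description implies: atomic components expose a per-block process() method and are composed into larger signal flows, so the same code path can serve offline prototyping and realtime rendering. This is a hypothetical Python illustration of the pattern only, not the actual VISR API; the class and method names are invented for this example.

# Hypothetical sketch of a modular, block-based audio processing pattern
# (invented names; not the VISR API).
import numpy as np

class GainComponent:
    """Atomic component: applies a scalar gain to a block of audio."""
    def __init__(self, gain: float):
        self.gain = gain

    def process(self, block: np.ndarray) -> np.ndarray:
        return self.gain * block

class Chain:
    """Composite component: runs a sequence of components in order."""
    def __init__(self, *components):
        self.components = components

    def process(self, block: np.ndarray) -> np.ndarray:
        for c in self.components:
            block = c.process(block)
        return block

# A host (offline script, DAW plugin wrapper, or realtime audio callback)
# feeds successive audio blocks through the same process() call.
chain = Chain(GainComponent(0.5), GainComponent(2.0))
out = chain.process(np.ones((2, 512)))   # 2 channels, 512-sample block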
 
Company Name AUDIOSCENIC LIMITED 
Description Soundbar technology based on S3A object-based audio 
Year Established 2017 
Impact The company is classified under "information technology consultancy activities" (SIC code 62020), "business and domestic software development" (SIC code 62012) and "manufacture of consumer electronics" (SIC code 26400). There was a change of name on 2018-11-08; its previous name was Soton Audio Labs Limited.
Website https://futureworlds.com/audioscenic/
 
Description 3rd UK-Korea Focal Point Workshop in conjunction with ACM Multimedia 2018 (22-27 October 2018), British Embassy Seoul, UK Science & Innovation Network - Intelligent Virtual Reality: Deep Audio-Visual Representation Learning for Multimedia Perception and Reproduction 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop was a good opportunity to bring together leading experts in audio processing and computer vision, and to bridge the gap between the two research fields in multimedia content production and reproduction. It was not limited to the UK and Korean research groups but accommodated various groups from around the world through the international conference.

- The soundbar demo was given to the 4DPlex Innovation team, to the R&D team, and to part of their executive team. 4DPlex was impressed by the quality of the demo and suggested staying in touch for further conversations about the commercial development of the technology in a cinema application.

- Following the demonstration and meeting with ProGate, ProGate and Soton Audio Labs at Southampton are going to sign a contract under which ProGate will act as sales representative of Soton Audio Labs, liaising with Korean consumer electronics companies.

- Joint publication: Changjae Oh, Bumsub Ham, Hansung Kim, Adrian Hilton and Kwanghoon Sohn, "OCEAN: Object-Centric Arranging Network for Self-supervised Visual Representations Learning," Expert Systems With Applications, Submitted in June 2018

Potential Applications of Collaboration Results

- Loudspeaker arrays for a cinema surround sound system on the 4D cinema system by CJ 4DX

- Listener-adaptive laptop loudspeaker array system for the games industry

- VR system with immersive 3D visual content and spatial audio, with KIST
Year(s) Of Engagement Activity 2018
 
Description AES - Good vibrations bringing Radio Drama to life - Eloise Whitmore 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact AES meeting in Cambridge on Wednesday 16th December 2015. Talk by Eloise Whitmore about radio drama and S3A's cutting-edge production methods such as object-based audio and 3D sound design.
Year(s) Of Engagement Activity 2015
URL http://www.aes-uk.org/forthcoming-meetings/good-vibrations-bringing-radio-drama-to-life/
 
Description AES Convention Berlin 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Various papers presented, including "Acoustic Room Modelling Using a Spherical Camera for Reverberant Spatial Audio Objects". During the poster sessions there was also opportunity for networking and interaction with other practitioners to spread the word about the technologies being developed by S3A. Paper accessible at http://epubs.surrey.ac.uk/id/eprint/813849
Year(s) Of Engagement Activity 2017
URL http://www.aes.org/events/142/
 
Description Audio Mostly 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Audio Mostly is an interdisciplinary conference on the design and experience of interaction with sound, embracing applied theory and reflective practice. It brings together thinkers and doers from academia and industry who share an interest in sonic interaction and the use of audio for interface design. All papers are peer reviewed and published in the ACM Digital Library. S3A participated with the demo "Media device orchestration for immersive spatial audio" and was voted runner-up for the best demo award (http://audiomostly.com/conference-program/awards/).
Year(s) Of Engagement Activity 2017
URL http://audiomostly.com/conference-program/awards/
 
Description Aura Satz: The Trembling line exhibition 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The Trembling Line is an exhibition by Aura Satz exploring acoustics, vibration, sound visualisation and musical gesture, with an aim to wrest open the space between sound and image and see how far these can be stretched apart before they fold back into one another.
The centrepiece of the show is the film and sound installation The Trembling Line, which explores visual and acoustic echoes between decipherable musical gestures and abstract patterning, orchestral swells and extreme slow-motion close-ups of strings and percussion. It features a score by Leo Grant and an innovative multichannel audio system by the Institute of Sound and Vibration Research (ISVR), University of Southampton, as part of the S3A research project on immersive listening.
Year(s) Of Engagement Activity 2015,2016
URL http://www.hansardgallery.org.uk/event-detail/199-aura-satz-the-trembling-line/
 
Description BBC Sound now and next 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The S3A Programme Grant was represented at the Technology Fair with demos (as per the list below) and the radio drama production (James Woodcock), which was showcased in the BBC Demo Room.
- Towards perceptual metering for an object-based audio system [Dr Jon Francombe, University of Surrey and Yan Tang, University of Salford]
- 3D Head tracking for Spatial Audio and Audio-Visual Speaker tracking [Dr Teo de Campos, University of Surrey and Dr Marcos Simon Galvez, University of Southampton]
- Headphone simulation of 3D spatial audio systems in different listening environments [Dr Rick Hughes, University of Salford and Chris Pike, BBC]
The demos and radio drama generated significant attention from other attendees/external organisations and universities. The S3A Advisory Steering Board commended S3A for the rapid progress and the impact of the demos.
Year(s) Of Engagement Activity 2015
URL http://www.bbc.co.uk/rd/blog/2015-06-sound-now-next-watch-talks-online
 
Description BBC Sounds Amazing 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact An industry/academic forum hosted by the BBC for research and production professionals in audio and sound.
Year(s) Of Engagement Activity 2021,2022
URL https://www.bbc.co.uk/academy/events/sounds-amazing-2022/
 
Description BBC live streamed event - Opera passion day 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The S3A project set up and ran equipment for a "Can a soprano singer shatter glass?" experiment (breaking wine glass with opera singer's voice) for a live streamed BBC Tomorrow's World feature as part of #OperaPassion Day (https://www.bbc.co.uk/events/epdgfx/live/cvwbj5). This was held at Manchester's Museum of Science and Industry and was streamed live on the BBC events page and BBC Facebook, with video footage later featured on the front page of the main BBC website. The intention was to engage with the public on the physics of sound and the human voice, with total online views for the whole #OperaPassion Day event exceeding 0.5 million.
Year(s) Of Engagement Activity 2017
URL https://www.bbc.co.uk/events/epdgfx/live/cvwbj5
 
Description CVPR 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The primary international forum for computer vision and AI/machine learning research in audio-visual media. Research dissemination through papers, invited keynote talks and workshop organisation.
Year(s) Of Engagement Activity 2021,2022,2023
URL https://cvpr2023.thecvf.com
 
Description CVSSP 30th Anniversary Celebration 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Centre for Vision, Speech and Signal Processing (CVSSP) 30th Anniversary Celebration with over 500 participants from industry, government and alumni. The event, themed "Can Machines Think?", included a series of keynote talks from alumni who are international leaders in academia and industry, over 30 live demos of current research, and an open house at the centre for industry and guests. There was also a VIP dinner hosted by the Vice-Chancellor of the University.
Year(s) Of Engagement Activity 2019
URL http://surrey.ac.uk/cvssp
 
Description Camp Bestival 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Camp Bestival is a large family-oriented music festival taking place in Lulworth, Dorset each year over 4 days. The science tent hosts a variety of shows and demonstrations aimed at educating and inspiring children. We presented two audio demos: a binaural dummy head and a soundbar. Participants listened on headphones while sounds were made around the dummy head, and the soundbar beamed 3 different streams of music in different directions. Despite challenging listening conditions the demos were extremely well received, and may well have triggered the curiosity of some young minds.
Year(s) Of Engagement Activity 2017
URL https://www.festicket.com/festivals/camp-bestival/2017/
 
Description DAFx Conference Edinburgh International Conference on Digital Audio Effects 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact S3A's object-based reverberation was presented at the International Conference on Digital Audio Effects (DAFx) 2017, Edinburgh. S3A was invited to present demonstrations at 5 poster sessions throughout the conference, leading to discussion/networking with international industry (Apple, HTC, Dolby, Ircam, Magic Leap) and academic contacts (TU Köln, Aalto, York, Audio Labs Erlangen). This engagement has paved the way for future impact around the reverberation approach used in S3A.
Year(s) Of Engagement Activity 2017
URL http://www.dafx17.eca.ed.ac.uk/
 
Description EUSIPCO 2017 European Signal Processing Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An opportunity for S3A to be highlighted amongst colleagues in the audio signal processing and machine learning fields. The paper's findings generated particular interest, including the question of whether theoretical support, such as an exact mathematical model, exists for the proposed perceptual model in speech enhancement.
Year(s) Of Engagement Activity 2017
URL https://www.eusipco2017.org/
 
Description European Conference on Visual Media Production 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation of research advances at industry-academic forum
Year(s) Of Engagement Activity 2021,2022
URL https://www.cvmp-conference.org/
 
Description Exhibition at the Consumer Electronics Show, Vegas 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact S3A launched a laptop-friendly 3D stereo soundbar (alongside other Future Worlds ventures, including multipurpose disposable medical tests) at the global showcase in Las Vegas, where the Future Worlds Accelerator was the only UK university exhibitor for a fourth consecutive year.
Year(s) Of Engagement Activity 2019
URL https://www.s3a-spatialaudio.org/s3a-at-ces-2019
 
Description Interview on BBC Radio 4 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Interview with Eddie Mair to explain the concepts and technology being developed by S3A, especially regarding speech intelligibility.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/b09tc4q3
 
Description Invited guest Lecture, Limerick Institute of Technology Ireland 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Guest lecture at Limerick Institute of Technology, Ireland, to an audience of undergraduate and postgraduate students and departmental staff in the field of music production and broadcast engineering. The audience showed interest in the work, particularly narrative importance and MDO, and a number wished to be contacted about future studies and online surveys.
Year(s) Of Engagement Activity 2018
 
Description JUCE Audio Developer Conference 2018 - London, 19-21 Nov 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Head-tracked object-based binaural demonstration using the VISR Production Suite: our suite of DAW plugins for producing and reproducing object-based audio.
Attendees were mostly from the audio industry and audio software companies (e.g. Dolby, Steinberg, Vienna Symphonic Library), with some from universities (UWE, Bristol University, etc.). People were willing to try out the open-source software.
Year(s) Of Engagement Activity 2018
 
Description Presentation at Conference LVA ICA held at the University of Surrey, July 2018 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Audio Day workshop as part of the LVA/ICA 2018 conference, including presentations by S3A researchers, e.g. "Deep Learning for Speech Separation" (Qingju Liu).
Year(s) Of Engagement Activity 2018
URL https://www.surrey.ac.uk/events/20180702-lvaica-2018-14th-international-conference-latent-variable-a...
 
Description S3A visit to Parma University - Casa della Musica 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact As part of the collaboration with Parma University and Casa della Musica, S3A hosted a classical concert open to the general public. The aim of the event was to record the concert using different microphone arrays and S3A technology as well as 360 video. The concert was organised in collaboration with Parma University and the Conservatorio Arrigo Boito. The recording is being used for further research.
Year(s) Of Engagement Activity 2017
URL http://www.comune.parma.it/notizie/news/CULTURA/2017-01-12/Progetto-S3A-Audio-spaziale-il-meeting-in...
 
Description Soundbar Technology market research 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact As part of the ICURe programme, Dr Marcos Simon Galvez has had the opportunity to discuss his research on soundbars and further use of the technology with industry (nationally and internationally).
Year(s) Of Engagement Activity 2016,2017
 
Description Taunton STEM Festival 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact S3A presented an object-based sound system with S3A technology and interactive content at a major STEM festival in Taunton. The event targeted primary and secondary school pupils. The event was covered by local press.
Year(s) Of Engagement Activity 2016
 
Description UK- Korea Focal Point Workshop in Seoul, Korea / Visit research institutions in South Korea 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact 7 members of S3A visited South Korea for the 2nd UK-Korea Focal Point Workshop on Audio-Visual Representation in January 2018. The main workshop was held at Yonsei University and around 60 people attended. 4 members gave presentations on S3A research topics at the workshop. The group also visited CJ 4DX, KIST (Korea Institute of Science and Technology) and Korea University to establish new links for future research collaboration. Several immediate areas for collaboration were identified, along with future applications for collaborative research funding. Complementary research strengths in immersive media and AI are particularly strategic given the UK industrial strategy initiatives in this area. Plans for a further workshop in Seoul this autumn, in conjunction with the ACM Multimedia conference, have been built on the links established during this visit.
Year(s) Of Engagement Activity 2018
URL http://ee.yonsei.ac.kr/ee_en/community/academic_notice.do?mode=view&articleNo=22313&article.offset=0...
 
Description University of Surrey - Festival of Wonder 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact The S3A project's Sound Sphere was installed at the University of Surrey's 50th anniversary celebration, the "Festival of Wonder". A surround sound version of the S3A Autumn Forest radio drama was played, and the audience were able to interact with the content by moving the narrator position on an iPad.
Year(s) Of Engagement Activity 2017
URL https://www.youtube.com/watch?v=fhRuz7q4XX0
 
Description Vostok-K demonstration at Manchester Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Demonstration of MDO and The Vostok-K Incident in the aerospace hall at the Science and Industry Museum in Manchester as part of Manchester Science Festival. Visitors were introduced to the concept of MDO and given a 'live' demonstration of the Taster experience.
Year(s) Of Engagement Activity 2018
 
Description Winchester Cathedral Primary Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact An acoustic workshop lasting over 50 minutes was run with 6 groups of 16 primary school children aged 9-11, as part of a Science Festival at Winchester Cathedral (November 2016). The workshop was carried out by Steve Elliott and Marcos Simon Galvez and covered activities on how sound travels and its speed; length and pitch in musical instruments; and reverberation and localisation. The last activity involved live recordings from a dummy head relayed to multiple headphones that the students listened to, in order to demonstrate binaural sound localisation. The feedback received from teachers was that this event had helped increase the students' knowledge of acoustics and their perception of science and engineering.
Year(s) Of Engagement Activity 2016
 
Description Workshop on Intelligent Music Production, Huddersfield 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presentation and demonstration of new media device orchestration content (The Vostok-K Incident) to workshop attendees, including academics, postgraduates, and people from industry. The workshop was on the day after the official BBC Taster launch of the content.
Year(s) Of Engagement Activity 2018