Autonomous System for Sound Integration and GeneratioN (ASSIGN)

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

In sound design for immersive media and games, the designer typically starts by searching a large sound effects library for the desired sounds. The project team have delivered a highly successful InnovateUK feasibility project demonstrating that existing sound effects libraries can be replaced by sound synthesis techniques that use software algorithms to generate the desired sounds, enabling high quality sound effects across all forms of content creation. But the biggest challenge for professional sound design is the effort required to integrate sound effects into the timeline and story, and synchronise them with video or animation content.
The Autonomous System for Sound Integration and GeneratioN (ASSIGN) project will deliver and validate a prototype sensor-driven sound synthesis service, capable of autonomous decision-making, for use by anyone wanting to enhance or interact with sounds. It will allow synthesis of any sound, with integration into existing workflows.
The ASSIGN system generates sounds together with their context, derived from other sensor data. It uses animation storyboarding and visual object recognition information to drive sound generation, placement and perspective, automatically synthesizing sound effects in their correct context and enabling new forms of interaction.
By exploiting sensors to generate sounds and their context, we give intelligence to the sound effects generation. This fits nicely with computer graphics approaches, where much of the animation is driven by some high level information, e.g., if a man drops a glass, we see it falling in the virtual world of the game, film or augmented reality. The animation is a property of the object, and sound effects should follow this same paradigm, thus enabling synchronisation of sound effects with CGI in immersive, game, film and augmented reality design.
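To make that paradigm concrete, here is a minimal, purely illustrative sketch (in TypeScript with the Web Audio API) of how a physics event attached to a scene object could drive a procedural impact sound; the object model, parameter choices and mappings are hypothetical assumptions, not the ASSIGN implementation.

```typescript
// Purely illustrative sketch: a scene object carries both its animation state
// and the properties a procedural sound model needs, so one physics event can
// drive picture and audio together. All names and mappings are hypothetical.
interface SceneObject {
  material: "glass" | "wood" | "metal";
  sizeMetres: number;      // rough physical size of the object
  distanceMetres: number;  // distance from the listener / camera
}

// Very rough impact synthesis: a burst of decaying noise whose brightness,
// length and level are derived from the object's properties and its context.
function playImpact(ctx: AudioContext, obj: SceneObject): void {
  const durationSec = 0.05 + 0.1 * obj.sizeMetres;
  const buffer = ctx.createBuffer(1, Math.ceil(ctx.sampleRate * durationSec), ctx.sampleRate);
  const data = buffer.getChannelData(0);
  for (let i = 0; i < data.length; i++) {
    data[i] = (Math.random() * 2 - 1) * Math.exp(-8 * (i / data.length)); // decaying noise burst
  }

  const source = ctx.createBufferSource();
  source.buffer = buffer;

  const filter = ctx.createBiquadFilter();
  filter.type = "bandpass";
  // The material roughly sets the resonant frequency of the impact.
  filter.frequency.value = { glass: 4000, metal: 2500, wood: 600 }[obj.material];

  const gain = ctx.createGain();
  // Simple distance attenuation stands in for scene-driven placement and perspective.
  gain.gain.value = 1 / Math.max(1, obj.distanceMetres);

  source.connect(filter).connect(gain).connect(ctx.destination);
  source.start();
}

// Example: the physics engine reports that a glass has hit the floor 3 m away.
const impactCtx = new AudioContext();
playImpact(impactCtx, { material: "glass", sizeMetres: 0.1, distanceMetres: 3 });
```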
ASSIGN has the potential to revolutionize the sound design process for production of film, TV and animation content. By integrating these technologies with intuitive cloud media production tools developed by the lead partner, RPPtv, we can deliver a novel, sensor-driven SFX synthesis service. It democratises the industry, giving anyone the ability to become a sound designer while putting control and creativity in users' hands.
Initial work by the academic partner in this area has shown great promise. A wide range of sounds, including sound textures, soundscapes and impulse sounds covering the most popular sound effects, can be synthesized in real time with high quality and relevant user controls. But significant research questions remain unanswered:

- How rich and relevant are available metadata for control and placement of synthesized sound effects?
- How effective are state-of-the-art object and scene recognition and object tracking methods for extracting details of an acoustic environment?
- How versatile are the sound synthesis models?
- How can the parameters of sound synthesis models be mapped to intuitive controls? (One possible mapping is sketched after this list.)
- How should autonomous sound effect generation and integration methods be evaluated?
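As an illustration of the mapping question above, the following TypeScript sketch maps hypothetical intuitive controls for a wind-like sound onto low-level synthesis parameters; the control names, parameter names and ranges are invented for illustration and are not taken from the project's synthesis models.

```typescript
// Hypothetical sketch: mapping intuitive, perceptual controls (0..1 ranges)
// onto the low-level parameters of a simple wind-like noise synthesis model.
// The control names, parameter names and ranges are invented for illustration.
interface IntuitiveControls {
  strength: number;   // 0 = calm breeze, 1 = storm
  gustiness: number;  // 0 = steady, 1 = strongly gusting
}

interface SynthesisParameters {
  cutoffHz: number;         // low-pass cutoff applied to the noise source
  gain: number;             // overall output level
  modulationHz: number;     // how quickly the gusts vary
  modulationDepth: number;  // how strongly the gusts vary
}

// One possible mapping; the research question is how to choose and validate
// such mappings so that the controls feel natural to a sound designer.
function mapControls(c: IntuitiveControls): SynthesisParameters {
  return {
    cutoffHz: 200 + 1800 * c.strength,         // stronger wind sounds brighter
    gain: 0.1 + 0.6 * c.strength,              // and louder
    modulationHz: 0.1 + 0.9 * c.gustiness,     // gustier wind varies faster
    modulationDepth: 0.1 + 0.8 * c.gustiness,  // and more deeply
  };
}

console.log(mapControls({ strength: 0.7, gustiness: 0.3 }));
```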

The project is industry-led, with an experienced academic partner and validated with industry users. It exploits the outputs of a highly successful InnovateUK Feasibility Study and groundbreaking research from Queen Mary University of London's audio engineering team. It brings together knowledge, skills and technologies from immersive and games audio production, cloud service delivery and academic research excellence. A demonstrator of the service will be evaluated and validated in audio post-production with immersive, object-based and 3D audio specialists Gareth Llewelyn Sound. The business potential is compelling, since the project will demonstrate a disruptive cloud service for a globally used and purchased media resource. The outputs will include a prototype for CGI-linked SFX production, with full market analysis, business models and road map to launch a commercial service.

Planned Impact

Economic Impacts
This project will result in a new digital autonomous system for creating and delivering sound effects to multiple creative sectors, saving time and money in the content production process. Long term, it enables a full suite of production tools using generative techniques, for commercial deployment in a wide variety of media. A wide range of players from a diversity of markets spanning audio post-production and immersive content creation, game audio, security and health monitoring will be able to offer high quality and perceptually relevant sound generation as part of their products and applications. These impacts will come through both existing and newly developed commercial relationships between such companies and customers and the project partners. In particular, the academic partner has frequent contact with audio production companies through his leading role in the Audio Engineering Society. Additional impact routes will come through exhibition at trade shows such as the IBC, and a focused workshop at Pinewood Studios, showcasing the system to film production professionals.
The research provides a stimulus to the UK visual and sound effects industries by lowering barriers to entry and providing access to scalable resources. It enables new jobs in an industry sector which traditionally recruits young people and strengthens the UK's role in the worldwide entertainment content creation, production workflow and sound synthesis service sectors.
The research will also impact the wider public in the long term by addressing the needs of the market for amateur and homemade media productions. This market is vast: over 300 hours of new content are uploaded to YouTube every minute. The overwhelming majority of this is amateur and would benefit from intelligent and intuitive sound design technologies.

Social Impacts
The project will place greater emphasis on creativity in the sound effects value chain, which will stimulate the need for and value of a highly skilled workforce. It will result in generation of high quality, more interesting and less repetitive content. It enables faster production, less manual labour, and less rote technical work, thus empowering the amateur producer or content creator.
The autonomous system can be used for environment-aware and context-appropriate sound generation, e.g., warning and alert sounds for security and medical applications, engine sounds for electric car safety, as well as soothing ambient sounds for various settings. It will also reduce energy consumption and CO2 emissions through deployment of services within centralized low-power data centres.

Partner Impacts
Lead partner RPPtv will enter new markets and enhance existing services. They benefit from the transfer of state-of-the-art academic research on visual object recognition, sound synthesis, AI and intelligent storyboard systems into their product range, thus reinforcing their position as a leading provider of multimedia technology. Mixed Immersion Ltd., representing user communities, will benefit from faster and higher quality sound effect production, and will utilize this in their commercial projects. QM will promote their research excellence to a wide community and engage with a wide range of beneficiaries. QM will benefit from commercial impact arising from their activities, and the QM researcher will gain skills in knowledge transfer and application of research technology in a commercial setting.

Additional Impacts
Academic research in the domain of sound synthesis will benefit from access to a real-life platform and real-life data to evaluate novel approaches. This project will set a precedent for best practice in sound synthesis evaluation, optimization and control, and the transfer of technology between academia and industry in this field.

Publications

Moffat D (2018) Perceptual Evaluation of Synthesized Sound Effects in ACM Transactions on Applied Perception

 
Description We launched a spin-out company, FXive Ltd, based on the research performed in this grant. FXive was closed down in 2020, and we then launched a new spin-out, Nemisindo Ltd, based on QMUL's research contribution.
First Year Of Impact 2018
Sector Creative Economy
Impact Types Economic

 
Description InnovateUK submission - ECOBAP 
Organisation University of Salford
Country United Kingdom 
Sector Academic/University 
PI Contribution Building on the ASSIGN project, QMUL partnered with RPPtv and the University of Salford to submit an InnovateUK proposal. This represented a new collaboration between the QMUL and Salford teams.
Collaborator Contribution Meetings, grant writing, consulting...
Impact An unsuccessful InnovateUK proposal, though it might be revised and resubmitted.
Start Year 2017
 
Title Sound effect synthesis 
Description The application describes: a sound synthesis system for generating a user defined synthesised sound effect, the system comprising: a receiver of user defined inputs for defining a sound effect; a generator of control parameters in dependence on the received user defined inputs; a plurality of sound effect objects, wherein each sound effect object is arranged to generate a different class of sound and each sound effect object comprises a sound synthesis model arranged to generate a sound in dependence on one or more of the control parameters; a plurality of audio effect objects, wherein each audio effect object is arranged to receive a sound from one or more sound effect objects and/or one or more other audio effect objects, process the received sound in dependence on one or more of the control parameters and output the processed sound; and a scene creation function arranged to receive sound output from one or more sound effect objects and/or audio effect objects and to generate a synthesised sound effect in dependence on the received sound. (A schematic code sketch of this architecture is given after this record.)
IP Reference GB1719854.0 
Protection Patent application published
Year Protection Granted 2017
Licensed No
Impact We filed a patent on the core technology. This is a key milestone towards commercialisation.
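To help read the claim language in the description above, here is a schematic, hypothetical TypeScript sketch of the claimed architecture (receiver of user inputs, control parameter generator, sound effect objects, audio effect objects, scene creation function); all names and signatures are illustrative assumptions rather than the patented implementation.

```typescript
// Schematic, hypothetical sketch of the architecture described in the patent
// application above; all names and signatures are illustrative, not taken
// from the actual filing or any shipped implementation.
type ControlParameters = Record<string, number>;

// A sound effect object wraps a synthesis model for one class of sound.
interface SoundEffectObject {
  generate(params: ControlParameters, ctx: AudioContext): AudioNode;
}

// An audio effect object processes sound received from other objects.
interface AudioEffectObject {
  process(input: AudioNode, params: ControlParameters, ctx: AudioContext): AudioNode;
}

// The generator derives low-level control parameters from user defined inputs.
function generateControlParameters(userInputs: Record<string, number>): ControlParameters {
  return {
    brightness: userInputs.intensity ?? 0.5,
    level: userInputs.loudness ?? 0.5,
  };
}

// The scene creation function combines the processed outputs of the sound
// effect objects into a single synthesised sound effect.
function createScene(
  ctx: AudioContext,
  sounds: SoundEffectObject[],
  effects: AudioEffectObject[],
  params: ControlParameters
): AudioNode {
  const mix = ctx.createGain();
  for (const sound of sounds) {
    let node = sound.generate(params, ctx);
    for (const effect of effects) {
      node = effect.process(node, params, ctx);
    }
    node.connect(mix);
  }
  return mix; // connect this to ctx.destination to hear the result
}
```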
 
Title RTSFX : Real-time sound effects synthesis 
Description RTSFX is a real-time sound effect synthesis framework in the browser developed at Queen Mary University of London. The system comprises a multitude of synthesis models, with selected post-processing tools (audio effects, temporal and spatial placement, etc.), allowing users to create scenes from scratch. Each of these models can generate sound in real time, allowing the user to manipulate multiple parameters and shape the sound in different ways. (A minimal illustrative sketch follows this record.)
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact This is the main technology developed in the project 
URL http://c4dm.eecs.qmul.ac.uk/audioengineering/RTSFX/
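For readers unfamiliar with browser audio, the following minimal sketch shows the kind of real-time, parameter-driven synthesis RTSFX performs, written directly against the standard Web Audio API; it is an assumption-laden illustration and not the RTSFX codebase or API.

```typescript
// Hypothetical, minimal real-time synthesis sketch in the spirit of RTSFX,
// written against the standard Web Audio API; this is not the RTSFX API.
const audioCtx = new AudioContext();

// A continuously looping noise source with a filter the user can adjust live.
const noiseBuffer = audioCtx.createBuffer(1, audioCtx.sampleRate * 2, audioCtx.sampleRate);
const samples = noiseBuffer.getChannelData(0);
for (let i = 0; i < samples.length; i++) {
  samples[i] = Math.random() * 2 - 1;
}

const noise = audioCtx.createBufferSource();
noise.buffer = noiseBuffer;
noise.loop = true;

const tone = audioCtx.createBiquadFilter();
tone.type = "lowpass";
tone.frequency.value = 800;

const level = audioCtx.createGain();
level.gain.value = 0.3;

noise.connect(tone).connect(level).connect(audioCtx.destination);
noise.start();

// Real-time control: smoothly retarget a parameter while the sound plays,
// e.g. in response to a slider's input event on the page.
function setBrightness(hz: number): void {
  tone.frequency.linearRampToValueAtTime(hz, audioCtx.currentTime + 0.05);
}
setBrightness(2000);
```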
 
Company Name NEMISINDO LIMITED 
Description Nemisindo offers innovative sound design services based on our procedural audio technology 
Year Established 2020 
Impact The company was recently awarded a $125k contract from Epic Games to provide procedural audio for the Unreal game engine.
Website https://nemisindo.com
 
Description Presentation and pitch to investors at the Digital Investment Showcase
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact We pitched the technology at the Digital Investment Showcase. This resulted in several serious inquiries from potential investors, an important step towards commercialisation.
Year(s) Of Engagement Activity 2017