AI for Sound

Lead Research Organisation: University of Surrey
Department Name: Vision, Speech and Signal Processing (CVSSP)


Imagine you are standing on a street corner in a city. Close your eyes: what do you hear? Perhaps some cars and buses driving on the road, footsteps of people on the pavement, beeps from a pedestrian crossing, rustling and clonks from shopping bags and boxes, and the hubbub of talking shoppers. You can do the same in a kitchen as someone is making breakfast, or as you are working in a busy office. Now, following the successful application of AI and machine learning technologies to the recognition of speech and images, we are beginning to tackle the challenging task of "machine listening": building computer systems to automatically analyse and recognize everyday real-world sound scenes and events.

This new technology has major potential applications in security, health & wellbeing, environmental sensing, urban living, and the creative sector. Analysis of sounds in the home offers the potential to improve comfort, security, and healthcare services for inhabitants. In environmental sound sensing, analysis of urban sounds offers the potential to monitor and improve the soundscapes experienced by people in towns and cities. In the creative sector, analysis of sounds also offers the potential to make better use of archives in museums and libraries, and to improve production processes for broadcasters, programme makers, or games designers. The international market for sound recognition technology has been forecast to be worth around £1bn by 2021, so there is significant potential for new tools in "AI for sound" to have a major benefit for the economy and society.

Nevertheless, realising the potential of computational analysis of sounds presents particular challenges for machine learning technologies. For example, current research use cases are often unrealistic; modern AI methods, such as deep learning, can produce promising results, but are still poorly understood; and current datasets may have unreliable or missing labels.

To tackle these and other key issues, this Fellowship will use a set of application sector use cases, spanning sound sensing in the home, in the workplace and in the outdoor environment, to drive advances in core machine learning research.

Specifically, the Fellowship will focus on four main application use cases: (i) monitoring of sounds of human activity in the home for assisted living; (ii) measuring of sounds in non-domestic buildings to improve the office and workplace environment; (iii) measuring sounds in smart cities to improve the urban environment; and (iv) developing tools to use sounds to help producers and consumers of broadcast creative content.

Through this Fellowship, we aim to deliver a step-change in research in this area, bringing "AI for Sound" technology out of the lab, helping to realize its potential to benefit society and the economy.

Planned Impact

The proposed research has the potential to benefit the UK and international economy and society through machine recognition of sounds as a key enabling technology. The market for sound recognition technology has been forecast to be worth around £1bn internationally by 2021 (Jeronimo. Driving New Revenue Streams from Intelligent Devices through Sound Recognition, IDC, Dec 2017), and the recent DCASE workshops in 2017 and 2018 have attracted around 40% industry representation. The UK acoustics industry has a turnover of £4.6bn across 750 companies (UK Acoustics Network. UK Acoustics: Sound Economics. March 2019). Acoustics is relevant to many industry sectors, from aerospace and automotive to consumer goods and non-destructive testing, with significant potential for impact by new tools in "AI for Sound".

Example potential impacts from different sectors are given below.

Commercial private sector:
* Providers of remote health and social care, through new methods to use sound sensing to assist people to live independently for longer;
* Internet-of-things companies who supply smart buildings and smart cities with networked sensor systems, through access to novel acoustic algorithms for more sophisticated mapping;
* Acoustic consultants, through access to new ways of mapping and understanding soundscapes, which will help drive new design possibilities for the built environment;
* Commercial companies requiring sound sensing, through access to new audio research;
* Television and radio companies, through the ability to use sound data exploration technologies in the creation, editing and re-use of audio and audiovisual programmes;
* Computer games companies, through new ways to reuse sound datasets creatively for new game sounds;
* Audio archiving companies, through access to the latest algorithms and methods for annotating and exploring sound archives;
* Musicians, composers and sound artists, through ways to find and explore new sounds as part of their creative output.

Policy-makers and others in government and government agencies:
* Smart cities, through better ways to make sense of acoustic data and improve urban soundscapes;
* Urban planning authorities, through new insights into the impact of sounds and how to visualize and understand these impacts;
* Environmental monitoring agencies, through new measurements of sound impact offering the potential to develop new noise policies and so improve the wellbeing of citizens.

Public sector, third sector and others:
* Museums and other organizations with sound archives, through new software methods to allow people to explore and use their archives;
* Science promotion organizations, in particular through outputs from the projects on how people perceive and navigate sounds;
* Environmental organizations, through new ways to monitor biodiversity.

Wider public:
* People living with dementia and others in need of assisted living to continue to live at home, through new and simpler monitoring methods enabled by sound sensing;
* People working in office workplaces, through new tools to measure the impact of sound, leading to new designs of workplace soundscapes;
* People living in urban environments, through improved city sound and noise policies and better designed soundscapes, making the urban environment more pleasant;
* Audiences of creative output involving audio and music, through availability of new creative outputs facilitated by creative access to new sounds;
* People interested in exploring audio recordings at home, school, college or university, either for educational or general interest purposes;
* Teachers in schools, colleges or universities who want to use sound examples for teaching audio or music.

Researchers employed on the project:
* Improved skills in research methodologies, which may be transferred into the commercial sector on completion of the project.

For specific plans for the realisation of impact, see "Pathways to Impact".

Description: Invited talk to the Association of Noise Consultants, Aug 2020
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: National
Primary Audience: Professional Practitioners
Results and Impact: Invited talk on "Artificial Intelligence for Sound" to the Association of Noise Consultants, 27 Aug 2020 (video meeting).
Year(s) Of Engagement Activity: 2020

Description: Spotlight talk at The Turing Presents: AI UK
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: National
Primary Audience: Industry/Business
Results and Impact: Presented a Spotlight Talk on "AI for Sound" at the online event The Turing Presents: AI UK, 23-24 March 2021, a showcase featuring UK academic work in AI and machine learning.
Year(s) Of Engagement Activity: 2021