Making Sense of Sounds

Lead Research Organisation: University of Surrey
Department Name: Vision, Speech and Signal Processing (CVSSP)

Abstract

In this project we will investigate how to make sense of sound data, focusing on how to convert recordings into understandable and actionable information: specifically, how to allow people to search, browse and interact with sounds.

Increasing quantities of sound data are now being gathered in sound and audiovisual archives, through sound sensors such as city soundscape monitoring, and as soundtracks on user-generated content. For example, the British Library (BL) Sound Archive has over a million discs and thousands of tapes; the BBC has some 1 million hours of digitized content; smart cities such as Santander (Spain) and Assen (Netherlands) are beginning to wire themselves up with large numbers of distributed sensors; and 100 hours of video (with sound) are uploaded to YouTube every minute.

However, the ability to understand and interact with all this sound data is hampered by a lack of tools allowing people to "make sense of sounds" based on the audio content. For example, in a sound map, users may be able to search for sound clips by geographical location, but not by "similar sounds". In broadcast archives, users must typically know which programme to look for, and listen through to find the section they need. Manually-entered textual metadata may allow text-based searching, but such metadata typically refer only to the entire clip or programme, are often ambiguous, and are hard to scale to large datasets. In addition, browsing sound data collections is a time-consuming process: without the help of e.g. key frame images available from video clips, each sound clip has to be "auditioned" (listened to) to find what is needed, and where the point of interest can be found. Radio programme producers currently have to train themselves to listen to audio clips at up to double speed to save time in the production process. Clearly better tools are needed.

To address this, we will investigate and develop new signal processing methods to analyse sound and audiovisual files, new interaction methods to search and browse through sets of sound files, and new methods to explore and understand the criteria searchers use when searching, selecting and interacting with sounds. The perceptual strand will also investigate people's emotional response to sounds and soundscapes, assisting sound designers or producers to find audio samples with the effect they want to create, and informing the development of public policy on urban soundscapes and their impact on people.

There is a wide range of potential beneficiaries for the research and tools that will be produced in this project, including both professional users and the general public. Archivists who are digitizing content into sound and audiovisual archives will benefit from new ways to visualize and tag archive material. Radio or television programme makers will benefit from new ways to search through recorded programme material and databases of sound effects to reuse, and new tools to visualize and repurpose archive material once identified. Sound artists and musicians will benefit from new ways to find interesting sound objects, or collections of sounds, for them to use as part of compositions or installations. Educators will benefit from new ways to find material on particular topics (machines, wildlife) based on their sound properties rather than metadata. Urban planners and policy makers will benefit from new tools to understand the urban sound environment, and people living in those urban environments will benefit through improved city sound policies and better designed soundscapes, making the urban environment more pleasant. Among the general public, many people are now building their own archives of recordings in the form of videos with soundtracks, which may in future include photographs with associated sounds (audiophotographs). This research will help people make sense of the sounds that surround us, and the associations and memories that they bring.

Planned Impact

Potential beneficiaries of this project outside of the academic research community include anyone who could benefit from new ways to explore sound and audiovisual data, or could benefit from access to the sounds that would be enabled by the research. Examples from different sectors are given below.

Commercial private sector:
* Commercial companies designing audio equipment, through easier access to new audio research;
* Musicians, composers and sound artists, through ways to find and explore new sounds as part of their creative output;
* Computer games companies, through new ways to reuse sound datasets creatively for new game sounds;
* Audio archiving companies, through access to the latest algorithms and methods for annotating and exploring sound archives;
* Television and radio companies, through ability to use sound data exploration technologies in the creation, editing and re-use of audio and audiovisual programmes;
* Acoustic consultants, through access to new ways of mapping and understanding soundscapes, which will help drive new design possibilities for the built environment;
* Internet-of-things companies who supply smart cities with networked sensor systems, through access to novel acoustic algorithms for more sophisticated mapping.

Policy-makers and others in government and government agencies:
* Urban planning authorities, through new insights into the impact of sounds and how to visualize and understand these impacts;
* Research funders, through establishment of a network of researchers in sound data research, opening up new opportunities for valuable research, and new demonstrators showing the value of research.

Public sector, third sector and others:
* Museums and other organizations with sound archives, through new software methods to allow people to explore and use their archives;
* Smart cities, through better ways to make sense of acoustic data from urban microphone arrays;
* Science promotion organizations, in particular through outputs from the projects on how people perceive and navigate sounds.

Wider public:
* People interested in exploring audio recordings at home, school, college or university, either for educational or general interest purposes;
* People recording sounds on mobiles and other portable devices, including those capturing audio as soundtracks to videos;
* Teachers in schools, colleges or universities who want to use sound examples for teaching audio or music;
* People living in urban environments, through improved city sound policies and better designed soundscapes, making the urban environment more pleasant;
* Audiences of creative output involving audio and music, through availability of new creative outputs facilitated by creative access to new sounds.

Researchers employed on the project:
* Improved skills in research methodologies, which may be transferred into e.g. the commercial private sector on completion of the project.

Publications

Annamaria Mesaros (2017) Dcase2016 Challenge Submissions Package in Zenodo

Benetos E (2017) Polyphonic Sound Event Tracking Using Linear Dynamical Systems in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Bones O (2016) Toward an evidence-based taxonomy of everyday sounds in Journal of the Acoustical Society of America

Bones O (2017) Clang, chitter, crunch: Perceptual organisation of onomatopoeia in The Journal of the Acoustical Society of America

Cano E (2019) Musical Source Separation: An Introduction in IEEE Signal Processing Magazine

 
Description New methods to recognize sound scenes and events, based on methods such as deep learning and neuroevolution.
New insights into human categorization of everyday sounds.
New "attention"-like methods for deep learning using weak labels.
New insights into the relative performance and computational cost of methods to recognize sound events.
Exploitation Route Sound recognition for a range of applications in areas including security, environmental monitoring, assisted living, autonomous vehicles and urban living.
Sectors Agriculture, Food and Drink; Creative Economy; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Transport

URL http://cvssp.org/projects/making_sense_of_sounds/site/
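The "attention"-like weak-label idea mentioned in the key findings can be illustrated with a minimal sketch (our own illustration, not the project's code): per-frame class scores are combined using learned attention weights over time, so that a model can be trained when only a clip-level ("weak") label is available. The array shapes and function name here are assumptions for illustration only.

```python
import numpy as np

def attention_pool(frame_logits, attn_logits):
    """Pool frame-level class scores into one clip-level score per class.

    frame_logits, attn_logits: arrays of shape (T, C) for T time frames
    and C sound classes. In a trained network both would be produced by
    learned layers; here they are just inputs.
    """
    # Softmax over the time axis: attention weights for each class sum to 1.
    attn = np.exp(attn_logits - attn_logits.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)
    # Sigmoid gives a per-frame, per-class probability.
    probs = 1.0 / (1.0 + np.exp(-frame_logits))
    # Attention-weighted average over frames: shape (C,), values in (0, 1).
    return (attn * probs).sum(axis=0)
```

With uniform (all-zero) attention logits this reduces to plain mean pooling over frames; non-uniform attention lets the model emphasise the frames where a sound event actually occurs.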
 
Description Presentations to industry, practitioner groups, and wider public such as Association of Noise Consultants, Sound Sensing in Smart Cities workshop (London), Connected Places Catapult "Third Thursday" (London), Audio Day 2018, DCASE 2018 workshop (Woking, Surrey), CVSSP 30th Anniversary (Surrey). Presentations and discussions with general public at the Cheltenham Science festival, Goosfest art & music festival (Cheshire), and New Scientist Live (London).
First Year Of Impact 2017
Sector Creative Economy; Environment; Culture, Heritage, Museums and Collections; Transport
Impact Types Cultural, Societal

 
Description AI for Sound
Amount £2,120,276 (GBP)
Funding ID EP/T019751/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 05/2020 
End 04/2025
 
Description Audio-Visual Media Research Platform
Amount £1,577,223 (GBP)
Funding ID EP/P022529/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 08/2017 
End 07/2022
 
Description EPSRC UK Acoustics Network Plus
Amount £1,418,894 (GBP)
Funding ID EP/V007866/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2020 
End 10/2024
 
Description Multimodal Video Search by Examples (MVSE)
Amount £863,564 (GBP)
Funding ID EP/V002856/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2020 
End 09/2023
 
Description UK Acoustics Network
Amount £561,807 (GBP)
Funding ID EP/R005001/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2017 
End 11/2020
 
Title Challenge 2018 dataset 
Description This is a collection of 2000 audio files which have been robustly classified by human assessors into top-level and then event-level categories, e.g. Human -- Laughing, Effects -- Smash. The audio files were taken from the Freesound database, the ESC-50 dataset and the Cambridge-MT Multitrack Download Library. All files have an identical format: single-channel 44.1 kHz, 16-bit .wav files. All files are exactly 5 seconds long, but may feature periods of silence, i.e. a given extract may feature a sound that is shorter than the duration of the audio file. The dataset was split into two parts and made available as development and evaluation file groups for a machine learning classification challenge. The development dataset consists of 1500 audio files divided into the five categories, each containing 300 files. The evaluation dataset consists of 500 audio files, 100 files per category. Usage details are provided with the files at: https://doi.org/10.17866/rd.salford.6901475.v4
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact DCASE challenge; DCASE paper; ICASSP paper
URL https://cvssp.org/projects/making_sense_of_sounds/site/challenge/
 
Title MSoS EEG dataset 
Description EEG data collected from 20 participants during soundscape perception 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? No  
Impact Ongoing analysis. Aims to investigate the neural basis of soundscape perception 
URL https://my.syncplicity.com/Files/#home/1/4476650/Making%20sense%20of%20sound/EEG%20data
 
Description Article for University of Salford Open Research Network newsletter 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Wrote an article about the audio dataset and data challenge for the online newsletter of the Open Research Network, organised by the University of Salford Data Management team in the library. This was delivered to 81 recipients who signed up to the network because they are interested in open data practices. The article included links to the challenge web page and also the dataset which was hosted on figshare.
Year(s) Of Engagement Activity 2018
URL https://doi.org/10.17866/rd.salford.7751546.v1
 
Description Audio Day 6 July 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This Audio Day brought together researchers and collaborators engaged in audio-related research projects linked to the University of Surrey.
Year(s) Of Engagement Activity 2018
URL https://cvssp.org/events/audio_day_2018/
 
Description Centre for Vision, Speech and Signal Processing (CVSSP) 30th Anniversary, University of Surrey, Guildford, UK 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The University of Surrey's centre for vision, speech and signal processing (CVSSP) showcased the research excellence during their 'Can machines think?' event in April 2019 to celebrate the 30th anniversary.
This event included perspectives on the future of AI and Machine Perception from international leaders, with invited talks by CVSSP founder Professor Josef Kittler and high-profile CVSSP alumni. It is followed by a panel discussion and live demonstrations of the centre's latest research activities including demos developed during MSoS project.
The event concluded by highlighting future plans and opportunities to collaborate with CVSSP.
Year(s) Of Engagement Activity 2019
URL https://www.surrey.ac.uk/centre-vision-speech-signal-processing/about/30th-anniversary
 
Description Cheltenham Science Festival June 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Mark Plumbley was joint by security researcher Justin Nurse (Oxford) for a panel session at the Cheltenham Science Festival examining whether intelligent virtual assistants (like Siri, Alexa or Contana) leave us vulnerable to oversharing private information. The panel took place on 8th June 2017 and was chaired by BBC broadcaster Rory Cellan-Jones.
Elements of the discussion were picked up widely in the media and in blogs. Columns appeared in The Times ('Be careful what you tell Alexa, says cyber expert', 9th June 2017) ) and The Sun ('Alexa, I need the police, I've been hacked!', 8th June 2017). There was also an interview segment on BBC Tech Tent with associated item on BBC News technology section website ('Tech Tent: fake news, algorithms and listening gadgets')
Year(s) Of Engagement Activity 2017
URL https://cvssp.org/projects/making_sense_of_sounds/site/posts/Cheltenham_science_festival
 
Description DCASE 2016 Workshop on Detection and Classification of Acoustic Scenes and Events 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact One-day workshop for researchers working on computational analysis of sound events and scene analysis to present and discuss their results. Resulted in increased interest in the topic area, and a new workshop is planned for 2017
Year(s) Of Engagement Activity 2016
URL http://www.cs.tut.fi/sgn/arg/dcase2016/
 
Description Exhibition 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Exhibit at Manchester's Museum of Science and Industry and on-line mass participation experiment
Year(s) Of Engagement Activity 2006
 
Description International Conference: Human-Technology Relations: Postphenomenology and Philosophy of Technology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The conference presentation introduced the project to a major strand of current philosophy, the more empirically oriented post-phenomenology school of thought, specifically in the philosophy of science. In the presentation 'Sonic Mnemonic' and the following question session, epistemological issues around remembering sounds were discussed and it was attempted to fathom the impact of the technology developed in the project and of similar machine learning advances.
Year(s) Of Engagement Activity 2018
URL https://www.utwente.nl/.uc/f38ba11390102056d1700c7424201106d685c0ddedb1a00/%23%20PHTR%202018%20PROGR...
 
Description Invited talk at Connected Places Catapult, London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Invited talk on "AI for Sound: A Future Technology for Smarter Living and Travelling" at Connected Places Catapult, London, 20 June 2019 [Third Thursday]
Year(s) Of Engagement Activity 2019
URL https://cp.catapult.org.uk/event/third-thursday-ai-for-sound-a-future-technology-for-smarter-living-...
 
Description Invited talk at Sound Sensing in Smart Cities 2, London, Oct 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Invited talk on "AI for Sound: A Future Technology for Smart Cities" at Sound Sensing in Smart Cities 2, 16 Oct 2019, Connected Places Catapult, London
Year(s) Of Engagement Activity 2019
URL https://www.ioa.org.uk/civicrm/event/info%3Fid%3D438%26reset%3D1
 
Description Invited talk to the Association of Noise Consultants, Aug 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited talk on "Artificial Intelligence for Sound" to the Association of Noise Consultants, 27 Aug 2020 (Video meeting).
Year(s) Of Engagement Activity 2020
URL https://www.linkedin.com/feed/update/urn:li:activity:6697179064635150337/
 
Description New Scientist Live 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Presentation at New Scientist Live to audience of 200-300
Year(s) Of Engagement Activity 2017
 
Description Portsmouth Physical Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Christmas lecture for Portsmouth Physical Society
Year(s) Of Engagement Activity 2017
 
Description Presentation at Goosfest 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Presentation at Goosfest
Year(s) Of Engagement Activity 2017
 
Description Spotlight talk at The Turing Presents: AI UK 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presenting a Spotlight Talk on "AI for Sound" at the online The Turing Presents: AI UK, 23-24 March 2021, A showcase featuring UK academic work in AI and machine learning.
Year(s) Of Engagement Activity 2021
URL https://web.archive.org/web/20210601153918/https://www.turing.ac.uk/ai-uk
 
Description The 'Making Sense of Sounds' machine learning challenge 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact We organised an international machine learning challenge in the field of sound event classification from 08/08/2018 to 05/11/2018. The task consisted of replicating human categorisation of sounds into five high-level semantic categories with machine classification systems. Twenty-two systems from 11 teams were submitted, originating both from academia and industry and from a variety of countries (e.g., USA, India, France, Greece). The winning system achieved an average accuracy of 93%. The results were published in detail on the project's website and in summarised form in a short talk and a poster at the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2018 Workshop). The data set was downloaded 270 times by now.
Year(s) Of Engagement Activity 2018
URL https://cvssp.org/projects/making_sense_of_sounds/site/challenge/
 
Description • DCASE 2018 Workshop on Detection and Classification of Audio Scenes and Events, Woking, Surrey, UK 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop aimed to provide a venue for researchers working on computational analysis of sound events and scene analysis to present and discuss their results.
DCASE 2018 Workshop is the third workshop on Detection and Classification of Acoustic Scenes and Events, being organized for the third time in conjunction with the DCASE challenge. We aim to bring together researchers from many different universities and companies with interest in the topic, and provide the opportunity for scientific exchange of ideas and opinions.
Year(s) Of Engagement Activity 2018
URL http://dcase.community/workshop2018/index