Making Sense of Sounds

Lead Research Organisation: University of Surrey
Department Name: Vision Speech and Signal Proc CVSSP

Abstract

In this project we will investigate how to make sense of sound data, focussing on how to convert recordings into understandable and actionable information: specifically how to allow people to search, browse and interact with sounds.

Increasing quantities of sound data are now being gathered in sound and audiovisual archives, through sound sensors such as those used for city soundscape monitoring, and as soundtracks on user-generated content. For example, the British Library (BL) Sound Archive has over a million discs and thousands of tapes; the BBC has some 1 million hours of digitized content; smart cities such as Santander (Spain) and Assen (Netherlands) are beginning to wire themselves up with a large number of distributed sensors; and 100 hours of video (with sound) are uploaded to YouTube every minute.

However, the ability to understand and interact with all this sound data is hampered by a lack of tools allowing people to "make sense of sounds" based on the audio content. For example, in a sound map, users may be able to search for sound clips by geographical location, but not by "similar sounds". In broadcast archives, users must typically know which programme to look for, and listen through to find the section they need. Manually-entered textual metadata may allow text-based searching, but these typically only refer to the entire clip or programme, can often be ambiguous, and are hard to scale to large datasets. In addition, browsing sound data collections is a time-consuming process: without the help of e.g. key frame images available from video clips, each sound clip has to be "auditioned" (listened to) to find what is needed, and where the point of interest can be found. Radio programme producers currently have to train themselves to listen to audio clips at up to double speed to save time in the production process. Clearly better tools are needed.

To do this, we will investigate and develop new signal processing methods to analyse sound and audiovisual files, new interaction methods to search and browse through sets of sound files, and new methods to explore and understand the criteria searchers use when searching, selecting and interacting with sounds. The perceptual aspect will also investigate people's emotional response to sounds and soundscapes, assisting sound designers or producers to find audio samples with the effect they want to create, and informing the development of public policy on urban soundscapes and their impact on people.

There is a wide range of potential beneficiaries for the research and tools that will be produced in this project, including both professional users and the general public. Archivists who are digitizing content into sound and audiovisual archives will benefit from new ways to visualize and tag archive material. Radio or television programme makers will benefit from new ways to search through recorded programme material and databases of sound effects to reuse, and new tools to visualize and repurpose archive material once identified. Sound artists and musicians will benefit from new ways to find interesting sound objects, or collections of sounds, for them to use as part of compositions or installations. Educators will benefit from new ways to find material on particular topics (machines, wildlife) based on their sound properties rather than metadata. Urban planners and policy makers will benefit from new tools to understand the urban sound environment, and people living in those urban environments will benefit through improved city sound policies and better designed soundscapes, making the urban environment more pleasant. Among the general public, many people are now building their own archives of recordings in the form of videos with soundtracks, which may in future include photographs with associated sounds (audiophotographs). This research will help people make sense of the sounds that surround us, and the associations and memories that they bring.

Planned Impact

Potential beneficiaries of this project outside of the academic research community include anyone who could benefit from new ways to explore sound and audiovisual data, or could benefit from access to the sounds that would be enabled by the research. Examples from different sectors are given below.

Commercial private sector:
* Commercial companies designing audio equipment, through easier access to new audio research;
* Musicians, composers and sound artists, through ways to find and explore new sounds as part of their creative output;
* Computer games companies, through new ways to reuse sound datasets creatively for new game sounds;
* Audio archiving companies, through access to the latest algorithms and methods for annotating and exploring sound archives;
* Television and radio companies, through the ability to use sound data exploration technologies in the creation, editing and re-use of audio and audiovisual programmes;
* Acoustic consultants, through access to new ways of mapping and understanding soundscapes, which will help drive new design possibilities for the built environment;
* Internet-of-things companies who supply smart cities with networked sensor systems, through access to novel acoustic algorithms for more sophisticated mapping.

Policy-makers and others in government and government agencies:
* Urban planning authorities, through new insights into the impact of sounds and how to visualize and understand these impacts;
* Research funders, through establishment of a network of researchers in sound data research, opening up new opportunities for valuable research, and new demonstrators showing the value of research.

Public sector, third sector and others:
* Museums and other organizations with sound archives, through new software methods to allow people to explore and use their archives;
* Smart cities, through better ways to make sense of acoustic data from urban microphone arrays;
* Science promotion organizations, in particular through outputs from the project on how people perceive and navigate sounds.

Wider public:
* People interested in exploring audio recordings at home, school, college or university, either for educational or general interest purposes;
* People recording sounds on mobiles and other portable devices, including those capturing audio as soundtracks to videos;
* Teachers in schools, colleges or universities who want to use sound examples for teaching audio or music;
* People living in urban environments, through improved city sound policies and better designed soundscapes, making the urban environment more pleasant;
* Audiences of creative output involving audio and music, through availability of new creative outputs facilitated by creative access to new sounds.

Researchers employed on the project:
* Improved skills in research methodologies, which may be transferred into e.g. the commercial private sector on completion of the project.

Publications

Annamaria Mesaros (2017) Dcase2016 Challenge Submissions Package in Zenodo

Benetos E (2017) Polyphonic Sound Event Tracking Using Linear Dynamical Systems in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Bones O (2017) Clang, chitter, crunch: Perceptual organisation of onomatopoeia in The Journal of the Acoustical Society of America

Bones O (2016) Toward an evidence-based taxonomy of everyday sounds in The Journal of the Acoustical Society of America

 
Description New methods to recognize sound scenes and events, based on methods such as deep learning and neuroevolution.
New insights into human categorization of everyday sounds.
New "attention"-like methods for deep learning using weak labels.
New insights into the relative performance and computational cost of methods to recognize sound events.
Exploitation Route Sound recognition for a range of applications in areas including: security, environmental monitoring, assisted living, autonomous vehicles, urban living, etc.
Sectors Agriculture, Food and Drink; Creative Economy; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Transport

URL http://cvssp.org/projects/making_sense_of_sounds/site/
 
Description Audio-Visual Media Research Platform
Amount £1,577,223 (GBP)
Funding ID EP/P022529/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 08/2017 
End 07/2022
 
Description UK Acoustics Network
Amount £561,807 (GBP)
Funding ID EP/R005001/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 11/2017 
End 11/2020
 
Title Challenge 2018 dataset 
Description This is a collection of 2000 audio files which have been robustly classified by human assessors into top-level and then event-level categories, e.g. Human -- Laughing, Effects -- Smash. The audio files were taken from the Freesound database, the ESC-50 dataset and the Cambridge-MT Multitrack Download Library. All files have an identical format: single-channel 44.1 kHz, 16-bit .wav files. All files are exactly 5 seconds long, but may feature periods of silence, i.e. a given extract may feature a sound that is shorter than the duration of the audio file. The dataset was split into two parts and made available as development and evaluation file groups for a machine learning classification challenge. The development dataset consists of 1500 audio files divided into the five categories, each containing 300 files. The evaluation dataset consists of 500 audio files, 100 files per category. Usage details are provided with the files at: https://doi.org/10.17866/rd.salford.6901475.v4
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact DCASE challenge; DCASE paper; ICASSP paper
URL https://cvssp.org/projects/making_sense_of_sounds/site/challenge/
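
The file format stated in the description above (single-channel, 44.1 kHz, 16-bit, 5 seconds) can be checked programmatically before training on the files. The following is a minimal sketch using only Python's standard `wave` module. The helper names `check_clip` and `make_silent_clip` are hypothetical and are not part of the dataset or its documentation; the silent clip is generated in memory purely for demonstration.

```python
import io
import wave

# Format values taken from the dataset description.
EXPECTED_CHANNELS = 1       # single-channel
EXPECTED_RATE_HZ = 44100    # 44.1 kHz
EXPECTED_SAMPLE_BYTES = 2   # 16-bit samples
EXPECTED_SECONDS = 5.0      # every clip is exactly 5 seconds long


def check_clip(source):
    """Return a list of format problems for one .wav file (empty if none).

    `source` may be a file path or a binary file-like object.
    Hypothetical helper, not part of the dataset's own tooling.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"rate={wav.getframerate()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_BYTES:
            problems.append(f"sample_bytes={wav.getsampwidth()}")
        duration = wav.getnframes() / wav.getframerate()
        if abs(duration - EXPECTED_SECONDS) > 1e-3:
            problems.append(f"duration={duration:.3f}")
    return problems


def make_silent_clip(seconds=5.0, rate=44100):
    """Build an in-memory mono 16-bit silent clip (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_clip(make_silent_clip()))             # conforming clip
    print(check_clip(make_silent_clip(seconds=4.0)))  # too short
```

In practice `check_clip` would be called once per downloaded file path, and any non-empty result would flag a file that does not match the stated format.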
 
Description Article for University of Salford Open Research Network newsletter 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Wrote an article about the audio dataset and data challenge for the online newsletter of the Open Research Network, organised by the University of Salford Data Management team in the library. This was delivered to 81 recipients who signed up to the network because they are interested in open data practices. The article included links to the challenge web page and also the dataset which was hosted on figshare.
Year(s) Of Engagement Activity 2018
 
Description Cheltenham Science Festival June 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Mark Plumbley was joined by security researcher Justin Nurse (Oxford) for a panel session at the Cheltenham Science Festival examining whether intelligent virtual assistants (like Siri, Alexa or Cortana) leave us vulnerable to oversharing private information. The panel took place on 8th June 2017 and was chaired by BBC broadcaster Rory Cellan-Jones.
Elements of the discussion were picked up widely in the media and in blogs. Columns appeared in The Times ('Be careful what you tell Alexa, says cyber expert', 9th June 2017) and The Sun ('Alexa, I need the police, I've been hacked!', 8th June 2017). There was also an interview segment on BBC Tech Tent, with an associated item in the technology section of the BBC News website ('Tech Tent: fake news, algorithms and listening gadgets').
Year(s) Of Engagement Activity 2017
URL https://cvssp.org/projects/making_sense_of_sounds/site/posts/Cheltenham_science_festival
 
Description DCASE 2016 Workshop on Detection and Classification of Acoustic Scenes and Events 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact One-day workshop for researchers working on computational analysis of sound events and scene analysis to present and discuss their results. Resulted in increased interest in the topic area, and a new workshop is planned for 2017.
Year(s) Of Engagement Activity 2016
URL http://www.cs.tut.fi/sgn/arg/dcase2016/
 
Description Exhibition 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Exhibit at Manchester's Museum of Science and Industry and on-line mass participation experiment
Year(s) Of Engagement Activity 2006
 
Description International Conference: Human-Technology Relations: Postphenomenology and Philosophy of Technology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The conference presentation introduced the project to a major strand of current philosophy, the more empirically oriented post-phenomenology school of thought, specifically in the philosophy of science. In the presentation 'Sonic Mnemonic' and the question session that followed, epistemological issues around remembering sounds were discussed, and an attempt was made to assess the impact of the technology developed in the project and of similar machine learning advances.
Year(s) Of Engagement Activity 2018
URL https://www.utwente.nl/.uc/f38ba11390102056d1700c7424201106d685c0ddedb1a00/%23%20PHTR%202018%20PROGR...
 
Description New Scientist Live 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Presentation at New Scientist Live to audience of 200-300
Year(s) Of Engagement Activity 2017
 
Description Portsmouth Physical Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Christmas lecture for Portsmouth Physical Society
Year(s) Of Engagement Activity 2017
 
Description Presentation at Goosfest 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Presentation at Goosfest
Year(s) Of Engagement Activity 2017
 
Description The 'Making Sense of Sounds' machine learning challenge 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact We organised an international machine learning challenge in the field of sound event classification from 08/08/2018 to 05/11/2018. The task consisted of replicating human categorisation of sounds into five high-level semantic categories with machine classification systems. Twenty-two systems from 11 teams were submitted, originating from both academia and industry and from a variety of countries (e.g., USA, India, France, Greece). The winning system achieved an average accuracy of 93%. The results were published in detail on the project's website and in summarised form in a short talk and a poster at the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2018 Workshop). The dataset has been downloaded 270 times to date.
Year(s) Of Engagement Activity 2018
URL https://cvssp.org/projects/making_sense_of_sounds/site/challenge/
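
The entry above reports an average accuracy without stating how it was computed. One common reading, assumed here and not confirmed by this record, is the mean of the per-category accuracies. The sketch below illustrates that calculation; the function name `class_averaged_accuracy` and the toy labels are hypothetical, and only the category names "Human" and "Effects" come from the dataset description.

```python
from collections import defaultdict


def class_averaged_accuracy(true_labels, predicted_labels):
    """Return (mean of per-category accuracies, per-category accuracies).

    Assumption: "average accuracy" means the unweighted mean over
    categories. Hypothetical helper, not the challenge's scoring code.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled clip is required")

    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    per_category = {name: correct[name] / total[name] for name in total}
    mean_accuracy = sum(per_category.values()) / len(per_category)
    return mean_accuracy, per_category


if __name__ == "__main__":
    # Toy labels for illustration only; not challenge data.
    truth = ["Human", "Human", "Effects", "Effects"]
    guess = ["Human", "Effects", "Effects", "Effects"]
    print(class_averaged_accuracy(truth, guess))
```

When every category contains the same number of files, as in the evaluation set described in the dataset entry (100 files per category), this class-averaged figure equals the overall fraction of correctly classified files.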