Machine Listening using Sparse Representations

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

My aim for this Fellowship is to undertake a concerted programme of research in machine listening: the automatic analysis and understanding of sounds from the world around us. Through this research, and in collaboration with other international researchers, I aim to establish machine listening as a key enabling technology to improve our ability to interact with the world, leading to advances in many areas such as health, security and the creative industries.

Human listeners have many capabilities a machine listening system should ideally have: to recognize a wide range of sounds; to segregate one sound source from a mixture of many sound sources; and to judge complex attributes of sound such as rhythm and timbre (sound quality). Most human listeners take these abilities for granted, yet it has proved extremely difficult for conventional audio signal processing methods to tackle many of these tasks. Even currently successful tasks, such as automatic speech recognition, have typically led to very specialized techniques which cannot easily be applied to other domains. I propose to introduce new methods for machine listening of general audio scenes.

As part of this work, I will also develop new interdisciplinary collaborations with both the machine vision and biological sensory research communities to investigate and develop general organizational principles for machine listening. One such principle that currently looks very promising is that of sparse representations. New theoretical advances and practical applications mean that sparse representations have recently emerged as a powerful analysis method, based on the principle that observations should be represented by only a few items chosen from a large number of possible items. This approach now has great potential for analysis and measurement of audio as well as other sensory signals. I also plan to use sparse representations to explore new biologically-inspired machine listening methods, and in turn to improve our understanding of biological hearing systems.

Success in this research will open the way for new devices and systems able to process, identify and respond to a wide range of sounds, with diverse applications including: audio searching for the music and video industries; advances in hearing aids and cochlear implants; and incident detection for improved public safety at stations, roads and airports.
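To make the sparse representation principle concrete, the following is a minimal, hypothetical sketch (not code from this project): a short signal frame is approximated by a weighted sum of only a few "atoms" selected greedily from a large overcomplete dictionary. The cosine dictionary, frame length, dictionary size, sparsity level and function names below are all illustrative assumptions.

import numpy as np

def make_dictionary(frame_len=64, n_atoms=256):
    """Overcomplete dictionary of unit-norm cosine atoms (4x overcomplete here)."""
    t = np.arange(frame_len)
    freqs = np.linspace(0, 0.5, n_atoms)          # normalised frequencies
    D = np.cos(2 * np.pi * np.outer(t, freqs))    # frame_len x n_atoms
    return D / np.linalg.norm(D, axis=0)

def matching_pursuit(x, D, n_nonzero=5):
    """Pick a few atoms whose weighted sum approximates x; return sparse coefficients."""
    residual = x.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        correlations = D.T @ residual
        k = np.argmax(np.abs(correlations))       # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * D[:, k]
    return coeffs, residual

# Toy "observation": a mixture of two dictionary atoms plus a little noise.
rng = np.random.default_rng(0)
D = make_dictionary()
x = 0.8 * D[:, 40] + 0.5 * D[:, 120] + 0.05 * rng.standard_normal(64)
coeffs, residual = matching_pursuit(x, D, n_nonzero=5)
print("non-zero atoms:", np.flatnonzero(coeffs))
print("relative residual energy:", np.linalg.norm(residual) / np.linalg.norm(x))

The key point the sketch illustrates is that although the dictionary contains 256 candidate atoms, the observation is described by only a handful of non-zero coefficients; it is this sparsity that the research exploits for analysis and measurement of audio signals.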
 
Description I have introduced many new algorithms and theoretical advances in sparse representations. New algorithms include: a new "stagewise" Polytope Faces Pursuit algorithm; new algorithms for finding "block-sparse" representations, where groups of atoms turn on and off together; and new algorithms for finding representations with "analysis sparsity", a new approach which produces many zeros after the transform. New dictionary learning methods include: a "double-sparsity" algorithm exploiting the structure of speech-like sounds; new algorithms exploiting harmonic structure; and new algorithms exploiting nonnegativity. New theoretical analysis includes: new convergence results for dictionary learning algorithms; a new approach to dictionary learning based on subspace identification; and new preconditioning and structure-aware methods for audio signals that do not satisfy the "restricted isometry property" (RIP) assumption typically made in sparse representations.

New methods for analysis of sounds include: audio source separation, including convolutive (echoic) and underdetermined cases (with more sound sources than microphones); direction of arrival (DOA) estimation using compressed sensing; sound timbre classification applied to non-speech vocals; pitch tracking and multipitch analysis for audio object modelling; and a new "audio inpainting" approach to audio restoration. New sound sequence analysis methods, particularly applicable to music, include onset detection and beat and rhythm tracking, with prediction methods based on sequence matching, autoregressive modelling, and information-theoretic analysis. Work on heart sound separation was featured widely in the medical media and on BBC Arabic TV, and work on birdsong analysis has led to a recent NERC "citizen science" project proposal.

In promoting UK sparse representations research: I organized international workshops at Queen Mary in Jan 2011 and June 2012 (forthcoming); on the international SPARS committee, I worked with Davies & Tanner (Edinburgh) to bring SPARS'11 to the UK; and I co-organized special sessions at ICASSP, the main international signal processing conference, in 2011 and 2012 (forthcoming).
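As an illustration of the general "audio inpainting" idea mentioned above, the following toy sketch (hypothetical code, not the project's published algorithm) fills a dropout in a signal frame by fitting a sparse cosine-dictionary model to the surviving samples only, then synthesising the missing region from the selected atoms. The dictionary, frame length, sparsity level and dropout location are arbitrary assumptions.

import numpy as np

frame_len, n_atoms, n_nonzero = 64, 256, 4
t = np.arange(frame_len)
D = np.cos(2 * np.pi * np.outer(t, np.linspace(0, 0.5, n_atoms)))
D /= np.linalg.norm(D, axis=0)                     # unit-norm overcomplete dictionary

clean = 0.9 * D[:, 30] + 0.6 * D[:, 90]            # toy "audio" frame
observed = np.ones(frame_len, dtype=bool)
observed[20:32] = False                            # a dropout of 12 missing samples

# Greedy sparse coding restricted to the observed samples only (OMP-style refit).
residual = clean[observed].copy()
support = []
for _ in range(n_nonzero):
    corr = D[observed].T @ residual
    support.append(int(np.argmax(np.abs(corr))))
    A = D[np.ix_(observed, support)]               # selected atoms, observed rows only
    coeffs, *_ = np.linalg.lstsq(A, clean[observed], rcond=None)
    residual = clean[observed] - A @ coeffs

# Synthesise the full frame, including the missing region, from the sparse model.
restored = D[:, support] @ coeffs
err = np.linalg.norm(restored[~observed] - clean[~observed])
print("selected atoms:", sorted(support), " error on missing samples:", round(float(err), 4))

The design point is that the sparse model is estimated only from what was actually observed, and the learned atoms then extrapolate naturally across the gap; real audio inpainting methods work frame by frame with learned or structured dictionaries rather than this toy cosine set.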
Towards a UK machine listening community: I organized a UK workshop (2010); gave a well-publicized inaugural lecture (2011) on "Making Sense of Sounds and Music" (over 140 attendees); and contributed to UK workshops on "Computational Audition" (London, 2010) and "Making Sense of Sounds" (Plymouth, 2012). The exhibit at the EPSRC "IMPACT" Exhibition (2010) was featured in the Telegraph and on the Guardian Science podcast. I am building a community of UK audio and music researchers around SoundSoftware.ac.uk, promoting the development and re-use of research software, including training events, talks, tutorials, and lab visits. Forthcoming work will address visualization of sounds and produce a real-time demonstrator. Changes: Audiovisual machine listening proved very vision-heavy and is left to other researchers in this area (e.g. Chambers at Loughborough). Parallels with biological processing are de-emphasized in favour of theory and algorithms with more potential.
Exploitation Route Potential beneficiaries of new theory and algorithms for sparse representations will be other researchers in signal processing, looking to apply sparse representations to their own problems, such as image or video analysis.
Potential beneficiaries of successful machine listening research include those in industry looking to apply machine listening to a wide range of applications in many areas of human interaction with the world. Examples include:
* Music and video industries, through music classification, recommendation and searching systems;
* Artists, through e.g. installations which respond to sounds;
* Hearing aid researchers and manufacturers and end users with hearing problems, through improved hearing aids and cochlear implants;
* Police and security agencies, emergency response providers and planners, social services and health agencies, through sound-based identification and analysis of incidents at stations, roads and airports;
* Air, manufacturing or automotive industries, through machine condition monitoring in e.g. aircraft, plant or cars;
* Computer games industry, through analysis and representation of realistic environmental sounds.
Sectors Communities and Social Services/Policy, Creative Economy, Digital/Communication/Information Technologies (including Software), Environment, Healthcare

URL http://www.eecs.qmul.ac.uk/~markp/
 
Description The machine listening work has been further developed for applications in home security, and follow-on work by one of the researchers has led to the release of a smartphone app for birdsong recognition.
First Year Of Impact 2014
Sector Environment, Security and Diplomacy
Impact Types Cultural, Economic

 
Description FP7 Marie Curie Initial Training Network
Amount € 2,800,000 (EUR)
Funding ID 607290 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 10/2014 
End 09/2018
 
Description H2020 Marie Sklodowska-Curie Action (MSCA) Innovative Training Network
Amount € 3,800,000 (EUR)
Funding ID 642685 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2015 
End 12/2018
 
Description H2020-ICT-2015 Audio Commons
Amount € 2,980,000 (EUR)
Funding ID 688382 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2016 
End 01/2019
 
Description Making Sense of Sounds
Amount £1,275,401 (GBP)
Funding ID EP/N014111/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2016 
End 12/2018
 
Description TSB Data Exploration
Amount £217,000 (GBP)
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 09/2014 
End 05/2016
 
Description TSB Enabling the Internet of Sensors
Amount £99,000 (GBP)
Funding ID 40818-289315 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 07/2014 
End 02/2015
 
Description Listening in the Wild: Animal and machine hearing in multisource environments 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact This workshop brought together researchers in engineering disciplines (machine listening, signal processing, computer science) and biological disciplines (bioacoustics, ecology, perception and cognition), to discuss complementary perspectives on audition.
Year(s) Of Engagement Activity 2013
URL http://c4dm.eecs.qmul.ac.uk/events/litw2013/
 
Description Machine Listening Workshop 2010 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact A one-day workshop to bring together researchers across the spectrum of machine listening towards the development of a coherent research community able to exploit our common interest in the analysis of audio.
Year(s) Of Engagement Activity 2010
URL http://c4dm.eecs.qmul.ac.uk/mlw2010/