Machine Listening using Sparse Representations
Lead Research Organisation:
Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science
Abstract
My aim for this Fellowship is to undertake a concerted programme of research in machine listening: the automatic analysis and understanding of sounds from the world around us. Through this research, and in collaboration with other international researchers, I aim to establish machine listening as a key enabling technology to improve our ability to interact with the world, leading to advances in many areas such as health, security and the creative industries.

Human listeners have many capabilities that a machine listening system should ideally have: to recognize a wide range of sounds; to segregate one sound source from a mixture of many; and to judge complex attributes of sound such as rhythm and timbre (sound quality). Most human listeners take these abilities for granted, yet many of these tasks have proved extremely difficult for conventional audio signal processing methods. Even currently successful tasks, such as automatic speech recognition, have typically led to very specialized techniques which cannot easily be applied to other domains. I propose to introduce new methods for machine listening of general audio scenes.

As part of this work, I will also develop new interdisciplinary collaborations with both the machine vision and biological sensory research communities, to investigate and develop general organizational principles for machine listening. One such principle that currently looks very promising is that of sparse representations. Thanks to new theoretical advances and practical applications, sparse representation has recently emerged as a powerful analysis method, based on the principle that observations should be represented by only a few items chosen from a large number of possible items. This approach now has great potential for the analysis and measurement of audio as well as other sensory signals.
I also plan to use sparse representations to explore new biologically-inspired machine listening methods, and in turn to improve our understanding of biological hearing systems.

Success in this research will open the way for new devices and systems able to process, identify and respond to a wide range of sounds, with diverse applications including: audio searching for the music and video industries; advances in hearing aids and cochlear implants; and incident detection for improved public safety at stations, roads and airports.
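To illustrate the sparse-representation principle described above (not any specific algorithm from this Fellowship), the following minimal matching-pursuit sketch in pure Python shows how a signal can be approximated by just a few atoms chosen greedily from an overcomplete dictionary. The dictionary and signal here are invented for illustration:

```python
# Minimal matching-pursuit sketch (illustrative only): approximate a
# signal using a few atoms from an overcomplete, unit-norm dictionary.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, n_atoms=2):
    """Greedily pick the unit-norm atoms best matching the residual."""
    residual = list(signal)
    coeffs = {}
    for _ in range(n_atoms):
        # choose the atom most correlated with the current residual
        k = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[k])
        coeffs[k] = coeffs.get(k, 0.0) + c
        # subtract the chosen atom's contribution from the residual
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

# Overcomplete dictionary: three unit-norm atoms in 2-D (hypothetical)
atoms = [(1.0, 0.0), (0.0, 1.0), (0.7071, 0.7071)]
coeffs, residual = matching_pursuit((3.0, 0.0), atoms, n_atoms=1)
# the signal (3, 0) is represented exactly by one atom: index 0, weight 3
```

Here a two-dimensional signal is captured by a single atom, leaving a zero residual; real audio applications use far larger dictionaries (for example, Gabor or learned atoms) over high-dimensional signal frames.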
People |
Mark Plumbley (Principal Investigator) |
Publications
Ewert S
(2014)
Score-Informed Source Separation for Musical Audio Recordings: An overview
in IEEE Signal Processing Magazine
Stowell D.
(2013)
Segregating event streams and noise with a Markov renewal process model
in Journal of Machine Learning Research
Plumbley M.D.
(2014)
Separating musical audio signals
in Acoustics Bulletin
Hedayioglu F
(2011)
Separating sources from sequentially acquired mixtures of heart signals
Ophir B.
(2011)
Sequential minimal eigenvalues - An approach to analysis dictionary learning
in European Signal Processing Conference
Figueira L.A.
(2013)
Software techniques for good practice in audio and music research
in 134th Audio Engineering Society Convention 2013
Description | I have introduced many new algorithms and theoretical advances in sparse representations.

New algorithms include: a new "stagewise" Polytope Faces Pursuit algorithm; new algorithms for finding "block-sparse" representations, where groups of atoms turn on and off together; and new algorithms for finding representations with "analysis sparsity", a new approach which produces many zeros after the transform.

New dictionary learning methods include: a "double-sparsity" algorithm exploiting the structure of speech-like sounds; new algorithms exploiting harmonic structure; and new algorithms exploiting nonnegativity.

New theoretical analysis includes: new convergence results for dictionary learning algorithms; a new approach to dictionary learning based on subspace identification; and new preconditioning and structure-aware methods for audio signals that do not satisfy the restricted isometry property (RIP) assumption typical in sparse representations.

New methods for analysis of sounds include: audio source separation, including convolutive (echoic) and underdetermined cases (with more sound sources than microphones); direction of arrival (DOA) estimation using compressed sensing; sound timbre classification applied to non-speech vocals; pitch tracking and multipitch analysis for audio object modelling; and a new "audio inpainting" approach to audio restoration.

New sound sequence analysis methods, particularly applicable to music, include onset detection and beat and rhythm tracking, with prediction methods based on sequence matching, autoregressive modelling, and information-theoretic analysis.

Work on heart sound separation was featured widely in medical media and on BBC Arabic TV, and work on birdsong analysis has led to a recent NERC "citizen science" project proposal.
In promoting UK sparse representations research: I organized international workshops at Queen Mary in January 2011 and June 2012 (forthcoming); as a member of the international SPARS committee, I worked with Davies & Tanner (Edinburgh) to bring SPARS'11 to the UK; and I co-organized special sessions at ICASSP, the main international signal processing conference, in 2011 and 2012 (forthcoming).

Towards a UK machine listening community: I organized a UK workshop (2010), gave a well-publicized inaugural lecture (2011) on "Making Sense of Sounds and Music" (over 140 attendees), and contributed to UK workshops on "Computational Audition" (London, 2010) and "Making Sense of Sounds" (Plymouth, 2012). The exhibit at the EPSRC "IMPACT" Exhibition (2010) was featured in the Telegraph and on the Guardian Science podcast. I am building a community of UK audio and music researchers around SoundSoftware.ac.uk, promoting the development and re-use of research software, including training events, talks, tutorials, and lab visits.

Forthcoming work will address visualization of sounds and produce a real-time demonstrator.

Changes: Audiovisual machine listening proved very vision-heavy and is left to other researchers in this area (e.g. Chambers at Loughborough). Parallels with biological processing are de-emphasized in favour of theory and algorithms with more potential. |
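As a small illustration of the "analysis sparsity" idea mentioned in the Description — a signal need not be built from a few atoms, but instead becomes sparse after an analysis operator is applied — the pure-Python sketch below uses a first-difference operator on an invented piecewise-constant signal (the operator choice and data are assumptions for illustration, not taken from the project's publications):

```python
# Analysis-sparsity sketch (illustrative): rather than synthesizing a
# signal from a few dictionary atoms, apply an "analysis" operator and
# observe that the transformed signal has many zeros.

def first_difference(x):
    """A simple analysis operator: (Omega x)[i] = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

# An invented piecewise-constant signal: constant except for one jump
signal = [2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0]
z = first_difference(signal)
nonzeros = sum(1 for v in z if v != 0.0)
# z has a single nonzero (at the jump), so the signal is analysis-sparse
```

The transformed vector contains zeros everywhere except at the jump, which is what makes analysis-sparse models useful for signals that are smooth or constant in pieces.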
Exploitation Route | Potential beneficiaries of new theory and algorithms for sparse representations will be other researchers in signal processing, looking to apply sparse representations to their own problems, such as image or video analysis. Potential beneficiaries of successful machine listening research include those in industry looking to apply machine listening to a wide range of applications in many areas of human interaction with the world. Examples include: * Music and video industries, through music classification, recommendation and searching systems; * Artists, through e.g. installations which respond to sounds; * Hearing aid researchers and manufacturers and end users with hearing problems, through improved hearing aids and cochlear implants; * Police and security agencies, emergency response providers and planners, social services and health agencies, through sound-based identification and analysis of incidents at stations, roads and airports; * Air, manufacturing or automotive industries, through machine condition monitoring in e.g. aircraft, plant or cars; * Computer games industry, through analysis and representation of realistic environmental sounds. |
Sectors | Communities and Social Services/Policy, Creative Economy, Digital/Communication/Information Technologies (including Software), Environment, Healthcare |
URL | http://www.eecs.qmul.ac.uk/~markp/ |
Description | The work in machine listening has been further developed for applications in home security, and further work by one of the researchers has led to release of a smartphone app for birdsong recognition. |
First Year Of Impact | 2014 |
Sector | Environment, Security and Diplomacy |
Impact Types | Cultural, Economic |
Description | FP7 Marie Curie Initial Training Network |
Amount | € 2,800,000 (EUR) |
Funding ID | 607290 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 09/2014 |
End | 09/2018 |
Description | H2020 Marie Sklodowska-Curie Action (MSCA) Innovative Training Network |
Amount | € 3,800,000 (EUR) |
Funding ID | 642685 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2015 |
End | 12/2018 |
Description | H2020-ICT-2015 Audio Commons |
Amount | € 2,980,000 (EUR) |
Funding ID | 688382 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 02/2016 |
End | 01/2019 |
Description | Making Sense of Sounds |
Amount | £1,275,401 (GBP) |
Funding ID | EP/N014111/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2016 |
End | 12/2018 |
Description | TSB Data Exploration |
Amount | £217,000 (GBP) |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 08/2014 |
End | 05/2016 |
Description | TSB Enabling the Internet of Sensors |
Amount | £99,000 (GBP) |
Funding ID | 40818-289315 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 06/2014 |
End | 02/2015 |
Description | Listening in the Wild: Animal and machine hearing in multisource environments |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | This workshop brought together researchers in engineering disciplines (machine listening, signal processing, computer science) and biological disciplines (bioacoustics, ecology, perception and cognition), to discuss complementary perspectives on audition. |
Year(s) Of Engagement Activity | 2013 |
URL | http://c4dm.eecs.qmul.ac.uk/events/litw2013/ |
Description | Machine Listening Workshop 2010 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | A one-day workshop to bring together researchers across the spectrum of machine listening towards the development of a coherent research community able to exploit our common interest in the analysis of audio. |
Year(s) Of Engagement Activity | 2010 |
URL | http://c4dm.eecs.qmul.ac.uk/mlw2010/ |