Machine Listening using Sparse Representations
Lead Research Organisation:
Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science
Abstract
My aim for this Fellowship is to undertake a concerted programme of research in machine listening: the automatic analysis and understanding of sounds from the world around us. Through this research, and in collaboration with other international researchers, I aim to establish machine listening as a key enabling technology to improve our ability to interact with the world, leading to advances in many areas such as health, security and the creative industries.

Human listeners have many capabilities that a machine listening system should ideally have: to recognize a wide range of sounds; to segregate one sound source from a mixture of many; and to judge complex attributes of sound such as rhythm and timbre (sound quality). Most human listeners take these abilities for granted, yet many of these tasks have proved extremely difficult for conventional audio signal processing methods. Even currently successful tasks, such as automatic speech recognition, have typically led to very specialized techniques which cannot easily be applied to other domains. I propose to introduce new methods for machine listening of general audio scenes.

As part of this work, I will also develop new interdisciplinary collaborations with both the machine vision and biological sensory research communities, to investigate and develop general organizational principles for machine listening. One such principle that currently looks very promising is that of sparse representations. Thanks to new theoretical advances and practical applications, sparse representation has recently emerged as a powerful analysis method, based on the principle that observations should be represented by only a few items chosen from a large number of possible items. This approach now has great potential for the analysis and measurement of audio as well as other sensory signals.
I also plan to use sparse representations to explore new biologically-inspired machine listening methods, and in turn to improve our understanding of biological hearing systems.

Success in this research will open the way for new devices and systems able to process, identify and respond to a wide range of sounds, with diverse applications including: audio searching for the music and video industries; advances in hearing aids and cochlear implants; and incident detection for improved public safety at stations, roads and airports.
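To illustrate the sparse-representation principle described above (not any specific algorithm from this Fellowship), the following minimal matching-pursuit sketch in pure Python shows how a signal can be approximated by just a few atoms chosen greedily from an overcomplete dictionary. The dictionary and signal here are invented for illustration:

```python
# Minimal matching-pursuit sketch (illustrative only): approximate a
# signal using a few atoms from an overcomplete, unit-norm dictionary.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, n_atoms=2):
    """Greedily pick the unit-norm atoms best matching the residual."""
    residual = list(signal)
    coeffs = {}
    for _ in range(n_atoms):
        # choose the atom most correlated with the current residual
        k = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[k])
        coeffs[k] = coeffs.get(k, 0.0) + c
        # subtract the chosen atom's contribution from the residual
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

# Overcomplete dictionary: three unit-norm atoms in 2-D (hypothetical)
atoms = [(1.0, 0.0), (0.0, 1.0), (0.7071, 0.7071)]
coeffs, residual = matching_pursuit((3.0, 0.0), atoms, n_atoms=1)
# the signal (3, 0) is represented exactly by one atom: index 0, weight 3
```

Here a two-dimensional signal is captured by a single atom, leaving a zero residual; real audio applications use far larger dictionaries (for example, Gabor or learned atoms) over high-dimensional signal frames.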
People |
Mark Plumbley (Principal Investigator) |
Publications
Ewert S
(2014)
Score-Informed Source Separation for Musical Audio Recordings: An overview
in IEEE Signal Processing Magazine
Stowell D.
(2013)
Segregating event streams and noise with a Markov renewal process model
in Journal of Machine Learning Research
Plumbley M.D.
(2014)
Separating musical audio signals
in Acoustics Bulletin
Hedayioglu F
(2011)
Separating sources from sequentially acquired mixtures of heart signals
Ophir B.
(2011)
Sequential minimal eigenvalues - An approach to analysis dictionary learning
in European Signal Processing Conference
Figueira L.A.
(2013)
Software techniques for good practice in audio and music research
in 134th Audio Engineering Society Convention 2013
Description | I have introduced many new algorithms and theoretical advances in sparse representations.

New algorithms include: a new "stagewise" Polytope Faces Pursuit algorithm; new algorithms for finding "block-sparse" representations, where groups of atoms turn on and off together; and new algorithms for finding representations with "analysis sparsity", a new approach which produces many zeros after the transform.

New dictionary learning methods include: a "double-sparsity" algorithm exploiting the structure of speech-like sounds; new algorithms exploiting harmonic structure; and new algorithms exploiting nonnegativity.

New theoretical analysis includes: new convergence results for dictionary learning algorithms; a new approach to dictionary learning based on subspace identification; and new preconditioning and structure-aware methods for audio signals that do not satisfy the restricted isometry property (RIP) assumption typical in sparse representations.

New methods for analysis of sounds include: audio source separation, including convolutive (echoic) and underdetermined cases (with more sound sources than microphones); direction of arrival (DOA) estimation using compressed sensing; sound timbre classification applied to non-speech vocals; pitch tracking and multipitch analysis for audio object modelling; and a new "audio inpainting" approach to audio restoration.

New sound sequence analysis methods, particularly applicable to music, include onset detection and beat and rhythm tracking, with prediction methods based on sequence matching, autoregressive modelling, and information-theoretic analysis.

Work on heart sound separation was featured widely in medical media and on BBC Arabic TV, and work on birdsong analysis has led to a recent NERC "citizen science" project proposal.
In promoting UK sparse representations research: I organized international workshops at Queen Mary in January 2011 and June 2012 (forthcoming); as a member of the international SPARS committee, I worked with Davies & Tanner (Edinburgh) to bring SPARS'11 to the UK; and I co-organized special sessions at ICASSP, the main international signal processing conference, in 2011 and 2012 (forthcoming).

Towards a UK machine listening community: I organized a UK workshop (2010), gave a well-publicized inaugural lecture (2011) on "Making Sense of Sounds and Music" (over 140 attendees), and contributed to UK workshops on "Computational Audition" (London, 2010) and "Making Sense of Sounds" (Plymouth, 2012). The exhibit at the EPSRC "IMPACT" Exhibition (2010) was featured in the Telegraph and on the Guardian Science podcast. I am building a community of UK audio and music researchers around SoundSoftware.ac.uk, promoting the development and re-use of research software, including training events, talks, tutorials, and lab visits.

Forthcoming work will address visualization of sounds and produce a real-time demonstrator.

Changes: Audiovisual machine listening proved very vision-heavy and is left to other researchers in this area (e.g. Chambers at Loughborough). Parallels with biological processing are de-emphasized in favour of theory and algorithms with more potential. |
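As a small illustration of the "analysis sparsity" idea mentioned in the Description — a signal need not be built from a few atoms, but instead becomes sparse after an analysis operator is applied — the pure-Python sketch below uses a first-difference operator on an invented piecewise-constant signal (the operator choice and data are assumptions for illustration, not taken from the project's publications):

```python
# Analysis-sparsity sketch (illustrative): rather than synthesizing a
# signal from a few dictionary atoms, apply an "analysis" operator and
# observe that the transformed signal has many zeros.

def first_difference(x):
    """A simple analysis operator: (Omega x)[i] = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

# An invented piecewise-constant signal: constant except for one jump
signal = [2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 5.0]
z = first_difference(signal)
nonzeros = sum(1 for v in z if v != 0.0)
# z has a single nonzero (at the jump), so the signal is analysis-sparse
```

The transformed vector contains zeros everywhere except at the jump, which is what makes analysis-sparse models useful for signals that are smooth or constant in pieces.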
Exploitation Route | Potential beneficiaries of new theory and algorithms for sparse representations will be other researchers in signal processing, looking to apply sparse representations to their own problems, such as image or video analysis. Potential beneficiaries of successful machine listening research include those in industry looking to apply machine listening to a wide range of applications in many areas of human interaction with the world. Examples include: * Music and video industries, through music classification, recommendation and searching systems; * Artists, through e.g. installations which respond to sounds; * Hearing aid researchers and manufacturers and end users with hearing problems, through improved hearing aids and cochlear implants; * Police and security agencies, emergency response providers and planners, social services and health agencies, through sound-based identification and analysis of incidents at stations, roads and airports; * Air, manufacturing or automotive industries, through machine condition monitoring in e.g. aircraft, plant or cars; * Computer games industry, through analysis and representation of realistic environmental sounds. |
Sectors | Communities and Social Services/Policy, Creative Economy, Digital/Communication/Information Technologies (including Software), Environment, Healthcare |
URL | http://www.eecs.qmul.ac.uk/~markp/ |
Description | The work in machine listening has been further developed for applications in home security, and further work by one of the researchers has led to release of a smartphone app for birdsong recognition. |
First Year Of Impact | 2014 |
Sector | Environment, Security and Diplomacy |
Impact Types | Cultural, Economic |
Description | FP7 Marie Curie Initial Training Network |
Amount | € 2,800,000 (EUR) |
Funding ID | 607290 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 09/2014 |
End | 09/2018 |
Description | H2020 Marie Sklodowska-Curie Action (MSCA) Innovative Training Network |
Amount | € 3,800,000 (EUR) |
Funding ID | 642685 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2015 |
End | 12/2018 |
Description | H2020-ICT-2015 Audio Commons |
Amount | € 2,980,000 (EUR) |
Funding ID | 688382 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 02/2016 |
End | 01/2019 |
Description | Making Sense of Sounds |
Amount | £1,275,401 (GBP) |
Funding ID | EP/N014111/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2016 |
End | 12/2018 |
Description | TSB Data Exploration |
Amount | £217,000 (GBP) |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 08/2014 |
End | 05/2016 |
Description | TSB Enabling the Internet of Sensors |
Amount | £99,000 (GBP) |
Funding ID | 40818-289315 |
Organisation | Innovate UK |
Sector | Public |
Country | United Kingdom |
Start | 06/2014 |
End | 02/2015 |
Description | Listening in the Wild: Animal and machine hearing in multisource environments |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | This workshop brought together researchers in engineering disciplines (machine listening, signal processing, computer science) and biological disciplines (bioacoustics, ecology, perception and cognition), to discuss complementary perspectives on audition. |
Year(s) Of Engagement Activity | 2013 |
URL | http://c4dm.eecs.qmul.ac.uk/events/litw2013/ |
Description | Machine Listening Workshop 2010 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | A one-day workshop to bring together researchers across the spectrum of machine listening towards the development of a coherent research community able to exploit our common interest in the analysis of audio. |
Year(s) Of Engagement Activity | 2010 |
URL | http://c4dm.eecs.qmul.ac.uk/mlw2010/ |