Acoustic Signal Processing and Scene Analysis for Socially Assistive Robots

Lead Research Organisation: Imperial College London
Department Name: Electrical and Electronic Engineering

Abstract

The interaction between users and a robot often takes place in busy environments in the presence of competing speakers and background noise sources such as televisions. The signals received at the microphones of the robot are hence a mixture of the signals from multiple sound sources, ambient noise, and reverberation due to reflections of sound waves. Thus, in order to focus on stimuli of interest, the robot has to learn and adapt to the acoustic environment.

The aim of this research is to provide robots and machines with the ability to understand and adapt to the surrounding acoustic environment. Acoustic scene analysis combines salient features from the observed audio signals in order to create situational awareness of the environment: sound sources are detected, localised and identified, whilst acoustic properties of the room itself can be characterised. Using the information acquired by analysing the acoustic scene, a three-dimensional map of the environment is created, which can be used to identify sounds or recognise the intent of speech signals. Moreover, by moving within the environment, the robot can explore and learn about the acoustic properties of its surroundings.

However, many of the tasks required for analysis of the acoustic scene are jointly dependent. For example, localising the sources of sounds buried in noise and reverberation is a challenging problem. Sound source localisation can be improved by enhancing the signals of desired sources, such as human speakers, whilst suppressing interfering sources, such as a television. However, for source enhancement, desired and interfering sources must be spatially distinguished, hence requiring knowledge of the source directions.
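To make this dependency concrete, the minimal frequency-domain delay-and-sum beamformer sketched below in Python can only steer towards, and hence enhance, a talker once a direction of arrival is available, which is precisely the information that localisation provides. The array geometry, sampling rate, and steering angle are illustrative assumptions, not parameters from this project.

    import numpy as np

    def delay_and_sum(signals, mic_positions, doa, fs, c=343.0):
        # Far-field plane-wave delay of the source at each microphone.
        unit = np.array([np.cos(doa), np.sin(doa)])
        delays = mic_positions @ unit / c            # seconds, shape (M,)
        n = signals.shape[1]
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        spectra = np.fft.rfft(signals, axis=1)
        # Phase-align every channel to the steering direction, then average.
        aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
        return np.fft.irfft(aligned.mean(axis=0), n=n)

    # Hypothetical 4-microphone linear array with 5 cm spacing:
    fs = 16000
    mics = np.stack([np.arange(4) * 0.05, np.zeros(4)], axis=1)
    channels = np.random.randn(4, fs)                # stand-in for recordings
    enhanced = delay_and_sum(channels, mics, doa=np.deg2rad(60), fs=fs)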

The novel objective of this research is therefore to identify and constructively exploit the joint dependencies between the tasks required for acoustic scene analysis. To achieve this objective, the project will take advantage of the motion of the robot in order to look at uncertain events from different perspectives. Techniques will be developed that constructively exploit the motion of the robot's arms by fusing the signals of microphones attached to the robot's limbs with those of microphone arrays installed in the robot's head. Furthermore, approaches will be investigated that allow multiple robots to share their experience and knowledge about the acoustic environment.

The research will be conducted at Imperial College London, within the Department of Electrical and Electronic Engineering with academic advice from national, European, and international project partners at the University of Edinburgh, UK; International Audio Laboratories Erlangen, Germany; and Bar-Ilan University, Israel.

Planned Impact

Equipping machines with an understanding of the acoustic environment allows a robot to engage in verbal interactions with humans. The robot can also adapt to the environment, e.g., by stepping closer or increasing the volume. Moreover, abnormal sounds, such as the noise due to a household accident, can be detected and appropriate actions can be taken, e.g., calling an ambulance.

The potential impact of this project therefore spans the sectors of healthcare, industry, academia, and government.

From the perspective of healthcare, socially assistive robots that are capable of providing physical aid and non-physical interaction could facilitate low-cost assistance for the 1.6 million people in the UK who provide over 50 hours a week of unpaid care to family and friends, as well as for patients who cannot rely on the support of relatives.

For industry, the research results have the potential to impact a wide range of applications that rely on the processing of sounds in realistic environments, including search-and-rescue technology, hearing aids, home entertainment systems, and automatic speech recognition systems.

For academia, the trans-disciplinary nature of the project has the potential to help bridge the gap between the fields of robotics, digital signal processing, and acoustics.

Finally, from a societal perspective, research targeting intuitive human-robot interaction promotes public acceptance of robots in everyday life, thereby supporting the government's initiative to become one of the world-leading nations in the field of Robotics and Autonomous Systems, a market with an estimated global economic impact of USD 1.7-4.5 trillion annually by 2025.
 
Description The work funded through this award focuses on acoustic scene mapping for robot audition, and is highly transdisciplinary, spanning acoustic signal processing, machine learning and robotics. The following contributions were achieved:
1) We organised an IEEE-SPS data challenge and published an open-access data corpus and software framework for the evaluation of acoustic scene mapping algorithms in order to foster reproducible and comparable research in the field. (Relevant DOIs: 10.5281/zenodo.3630471, 10.1109/IWAENC.2018.8521288, 10.1109/SAM.2018.8448644)
2) We pioneered Acoustic Simultaneous Localization and Mapping (SLAM) for robot audition. The proposed approach equips robots and autonomous machines with the spatial awareness required to understand and interact with sound sources in everyday environments. (Relevant DOIs: 10.1109/TASLP.2018.2828321, 10.1109/TSP.2017.2775590)
3) We developed a novel approach for acoustic scene mapping in smart environments that allows the robot to collaborate and share information with other devices in acoustic sensor networks, such as mobile phones, home assistants, or hearing aids. (Relevant DOI: 10.1109/LSP.2018.2849579)
4) We proposed new algorithms for the separation and enhancement of audio signals that are adversely affected by interference, noise and reverberation. The proposed algorithms exploit the correlations in the harmonic structure of speech signals. By tracking the voice pitch of multiple talkers, the onsets and endpoints of multiple, overlapping speech signals can be detected in order to answer the question "Who speaks when?" (see the sketch after this list). Furthermore, we showed that the lack of correlation between desired source signals and any interfering noise signals (such as background babble or fan noise) can be exploited constructively for the enhancement of the speech signals. (Relevant DOIs: 10.1109/WASPAA.2019.8937235, 10.1109/ICASSP.2019.8682924, 10.1109/WASPAA.2019.8937185)
5) We proposed a novel approach to acoustic environment classification based on deep neural networks. (Relevant DOIs: 10.23919/EUSIPCO.2017.8081293, 10.1109/ICASSP.2017.7952257)
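As a toy illustration of contribution 4 above (not the published algorithms), the Python sketch below scores candidate pitches by harmonic summation in each frame; frames with a strong harmonic peak indicate voiced speech activity, the raw ingredient for answering "Who speaks when?". The frame sizes, pitch grid, and activity threshold are illustrative assumptions.

    import numpy as np

    def harmonic_salience(frame, fs, f0_grid, n_harmonics=5):
        # Sum spectral magnitude at multiples of each candidate pitch.
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        sal = np.zeros(len(f0_grid))
        for i, f0 in enumerate(f0_grid):
            harmonics = f0 * np.arange(1, n_harmonics + 1)
            bins = np.searchsorted(freqs, harmonics)
            sal[i] = spec[bins[bins < len(spec)]].sum()
        return sal

    def track_pitch(x, fs, frame_len=1024, hop=512):
        # Frame-wise pitch estimate and salience; high salience suggests
        # voiced speech activity in that frame.
        f0_grid = np.arange(80.0, 400.0, 2.0)   # typical speech pitch range
        pitches, saliences = [], []
        for start in range(0, len(x) - frame_len, hop):
            sal = harmonic_salience(x[start:start + frame_len], fs, f0_grid)
            pitches.append(f0_grid[np.argmax(sal)])
            saliences.append(sal.max())
        return np.array(pitches), np.array(saliences)

    fs = 16000
    x = np.random.randn(fs)                     # stand-in for a recording
    p, s = track_pitch(x, fs)
    active = s > 2.0 * np.median(s)             # crude activity threshold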
Exploitation Route The research funded by this award provided the underpinning theoretical framework for acoustic scene mapping. Current and future research builds upon this framework to ensure long-term, scalable machine autonomy.

Academic research will build upon the framework developed through this funding to create novel, data-driven approaches to machine listening. Non-academic routes will investigate applications for commercial and societal impact, including autonomous vehicles (transport); unmanned vehicles (aerospace & defence); immersive environments (retail); assistive robotics and hearing aids (healthcare); as well as pipeline fault detection and underwater robotics (environment).
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Retail; Transport

 
Description Spoken language is one of the primary channels of human interaction and is therefore a fundamental prerequisite for intuitive human-machine interaction. Robot audition aims to match, and ultimately surpass, the human auditory system in order to make sense of and interact with acoustic events in the environment. Acoustic awareness is paramount for autonomous agents to focus attention on salient events outside the field of view of the cameras. Applications range from disaster scenarios and crowded public spaces, where sources of interest may be visually occluded, to home environments, where people may call for attention from different rooms.

In contrast to machine vision, robot audition has only recently begun to attract attention in the research community. The state of the art is focused on controlled scenarios involving static microphone arrays as well as static and continuously active sound sources. In practice, however, robots and human talkers are highly dynamic. As a consequence, the microphone signals are affected by severe changes in the acoustic channel, interference from multiple sources, and periods of speech inactivity. The hypothesis underpinning this project is that the spatial, spectral, and temporal diversity of sounds exhibited in the microphone signals can be exploited constructively within a Bayesian framework to probabilistically identify and track the positions, voices, and activity periods of multiple, interfering sound sources.

This was a highly interdisciplinary project, spanning the fields of statistics, acoustics, machine learning, and robotics. Our work has led to the following major contributions:

1) We pioneered Acoustic Simultaneous Localization And Mapping (SLAM), which provides robots with situational awareness by exploiting the spatial diversity affecting moving microphone arrays. Acoustic SLAM enables robots to self-localize their position and orientation whilst simultaneously mapping the positional trajectories of nearby sound sources.

2) We developed a probabilistic framework for Acoustic Scene Mapping in Smart Environments that exploits the spectral diversity of speech and the spatial diversity of multiple, networked microphone arrays for distributed sensor fusion. In practice, the source-sensor range is difficult to obtain using the compact arrays typically used in robotics. However, the range is required to distinguish between nearby, reliable sensors and distant microphone arrays. We proposed to estimate a measure of reliability that weights the energy due to coherent sources against the energy due to incoherent, diffuse noise in each frequency band of the audible spectrum. This measure decreases with increasing range, and therefore implicitly quantifies the distance between each source and sensor (a sketch follows this summary). The work resulted from a collaboration with Bar-Ilan University, Israel, and AudioLabs Erlangen, Germany.

3) We led the IEEE Challenge on Acoustic Source Localization And Tracking (LOCATA), which provides an open-access data corpus and software framework for the evaluation of acoustic scene mapping algorithms. As the annotation of audio data in real-world acoustic environments is extremely time-consuming, the performance of algorithms in the acoustic signal processing community is typically evaluated using simulated data.
To foster reproducible and comparable research, the LOCATA challenge provides a dataset of audio recordings of a range of static and dynamic scenarios, completely annotated with ground-truth positions, orientations, and voice-activity labels for all sources and sensors. The open-source software framework provides comprehensive measures for performance evaluation. The challenge was organised in collaboration with Friedrich-Alexander University Erlangen-Nuremberg and Humboldt University of Berlin, Germany.

Engagement with the National Oceanography Centre (NOC) highlighted that the findings of this project have potential transformative impact on ecological monitoring and oceanography. A recent collaboration with NOC is therefore taking forward the outputs of this fellowship with the aim of developing novel machine learning approaches for the analysis of ambient marine noise. The aim of our ongoing research is to radically improve ocean observation and visualisation capabilities, both for oceanographic research and for various marine sector applications of national and strategic importance. Identifying and distinguishing among events in the marine environment is an essential task in developing a better understanding of climate change, and of animal and human behaviour, across 71% of the planet. Our ongoing research collaboration with NOC aims to combine expertise in densely distributed big-data acquisition with machine learning and AI techniques to characterise and automatically identify patterns in these data, to aid human understanding of complex marine environments.

Furthermore, engagement with academic and industry partners within the UKRI Trustworthy Autonomous Systems (TAS) Hub highlighted that our novel contributions enabling robot audition directly benefit the agriculture and food sector as well as health technologies. A particular challenge that impacts on the user adoption of robotic and autonomous systems relates to trust towards robots and, more broadly, within human-robot teams. Ongoing research investigates the moment-to-moment evolution of trust and the various factors that impact on it. This ongoing collaboration between the University of Southampton, the University of Nottingham, and King's College London is a highly interdisciplinary project addressing different aspects of trust within and towards human-robot teams in surgery and cleaning.
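The reliability measure described in contribution 2 can be pictured with the toy Python sketch below. It assumes per-band estimates of coherent and diffuse power are already available, and it replaces the probabilistic triangulation of the published work with a simple weighted circular mean of the nodes' DoA estimates; all numbers are illustrative.

    import numpy as np

    def reliability_from_cdr(coherent_power, diffuse_power, eps=1e-12):
        # Map per-band coherent-to-diffuse power ratios to one weight in
        # [0, 1]; bands dominated by diffuse noise (distant sources)
        # pull the weight towards zero.
        cdr = coherent_power / (diffuse_power + eps)
        return np.mean(cdr / (1.0 + cdr))

    # Hypothetical per-band power estimates at two networked arrays:
    bands = 32
    near_node = reliability_from_cdr(np.full(bands, 4.0), np.ones(bands))
    far_node = reliability_from_cdr(np.full(bands, 0.2), np.ones(bands))

    # Weighted circular mean of the nodes' DoA estimates (an illustrative
    # stand-in for the probabilistic triangulation in the published work):
    doas = np.deg2rad([58.0, 70.0])
    w = np.array([near_node, far_node])
    fused = np.angle(np.sum(w * np.exp(1j * doas)) / w.sum())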
First Year Of Impact 2021
Sector Aerospace, Defence and Marine; Agriculture, Food and Drink; Healthcare
Impact Types Societal, Economic

 
Description UKRI Centre for Doctoral Training in Machine Intelligence for Nano-electronic Devices and Systems
Amount £5,820,891 (GBP)
Funding ID EP/S024298/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2019 
End 09/2027
 
Description UKRI Trustworthy Autonomous Systems Hub
Amount £11,896,883 (GBP)
Funding ID EP/V00784X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2020 
End 08/2024
 
Title Software for performance evaluation and benchmarking of algorithms for acoustic source localization and tracking for the IEEE-AASP Challenge on Acoustic Source Localization and Tracking (LOCATA) 
Description A MATLAB framework was developed that allows researchers to objectively evaluate and benchmark their algorithms against state-of-the-art approaches in acoustic source localization and tracking. The framework was released as open-source code with the development database for the IEEE-AASP Challenge on acoustic source localization and tracking (LOCATA).
Type Of Material Technology assay or reagent 
Year Produced 2020 
Provided To Others? Yes  
Impact As of 14 March 2018, i.e., 30 days after release of the data, 61 researchers had registered for download of the data. Registered users indicated fields of expertise in audio signal processing, acoustics, as well as robotics. A measurable impact of the LOCATA challenge, its data corpus and benchmarking framework will be available after completion and evaluation of the challenge in September 2018.
URL https://github.com/cevers/sap_locata_eval
 
Title Software for using the LOCATA Challenge dataset 
Description Open-access Matlab framework for reading data in the LOCATA Challenge dataset. This framework allows users to apply their own algorithms to the dataset and write the results to file. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact As of 14 March 2018, i.e., 30 days after release of the data, 61 researchers had registered for download of the data. Registered users indicated fields of expertise in audio signal processing, acoustics, as well as robotics. A measurable impact of the LOCATA challenge, its data corpus and benchmarking framework will be available after completion and evaluation of the challenge in September 2018.
URL https://github.com/cevers/sap_locata_io
 
Title Open-access data corpus for the IEEE-AASP Challenge on Acoustic Source Localization and Tracking (LOCATA) 
Description The challenge of sound source localization in acoustically complex environments has attracted widespread attention in the research community in recent years. Source localization approaches in the literature range from single-sensor to multi-sensor and distributed arrays, based on features including, for example, Time Delays of Arrival, Direction of Arrival, or even audio spectrograms. Nevertheless, despite the significant impact of sound source localization approaches, a comprehensive, objective benchmarking campaign of state-of-the-art algorithms was to date unavailable. The IEEE AASP challenge on acoustic source LOCalization And TrAcking (LOCATA) aims at providing researchers in source localization with a framework to objectively benchmark results against competing algorithms using a common, publicly released data corpus that encompasses a range of realistic scenarios in an enclosed acoustic environment. The challenge data was released in February 2018 and the challenge concluded in September 2018. Results were disseminated at a dedicated satellite workshop held during the International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Japan, 17-20 Sept 2018. A detailed description of the corpus can be found in the documentation, available for download at www.locata-challenge.org. A summary was published as a conference paper (see Löllmann et al., "The LOCATA Challenge Data Corpus for Acoustic Localization and Tracking").
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact The data was released on 16 February 2018. As of 14 March 2018 - less than a month after its release - 61 researchers registered for download of the dataset, indicating rapid adoption of the corpus by the community. The LOCATA data corpus is the first corpus of its kind, providing researchers and practitioners with acoustic recordings and highly accurate ground truth positional information of the acoustic sensors and sound sources within a realistic acoustic environment. It is therefore envisaged that the corpus will be utilized by practitioners in the long-term for objective evaluation and benchmarking within various areas including acoustic source localization, tracking, source separation and diarization, Simultaneous Localization and Mapping as well as motion and path planning. 
URL http://doi.org/10.5281/zenodo.3630471
 
Description Analysis of ambient marine noise using machine learning 
Organisation National Oceanography Centre
Country United Kingdom 
Sector Academic/University 
PI Contribution This partnership builds on the outcomes of my fellowship award by developing machine learning and AI techniques for the processing and analysis of complex, marine environments.
Collaborator Contribution NOC provide world-renowned expertise in densely distributed big-data acquisition.
Impact Identifying and distinguishing among events in the marine environment is an essential task in developing better understanding of climate change, and animal and human behaviour across 71% of the planet. Sources of ambient noise in the marine environment can be classified into natural (sediment flows, volcanic geo-hazards, etc.) and anthropogenic (ocean bottom trawling, offshore drilling, etc.). The aim of this research is to radically improve ocean observation and visualization capabilities, both for oceanographic research and for various marine sector applications of national and strategic importance. This research collaboration between NOC and the University of Southampton aims to combine expertise in densely distributed big-data acquisition and machine learning and AI techniques to characterise and automatically identify patterns in this data to aid human understanding of the environment. The key challenges in this project stem from the volume of streaming data generated and the lack of substantial quantities of labelled signals. This is a highly multi-disciplinary collaboration that cuts across marine science, physics, as well as machine learning & AI.
Start Year 2021
 
Description Audio Tracking using Variational Expectation-Maximization (VEM) 
Organisation The National Institute for Research in Computer Science and Control (INRIA)
Country France 
Sector Public 
PI Contribution N/A
Collaborator Contribution N/A
Impact This collaboration has resulted in the development of a probabilistic framework for tracking multiple audio sources in complex acoustic environments (a toy sketch follows this entry). The outcomes of this research led to the following journal publication: Y. Ban, X. Alameda-Pineda, C. Evers, R. Horaud, "Tracking Multiple Audio Sources With the von Mises Distribution and Variational EM", IEEE Signal Processing Letters.
Start Year 2018
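As a toy companion to this collaboration's outcome (not the published variational algorithm), the Python sketch below uses a von Mises likelihood for DoA observations inside an EM-style E-step that computes the responsibility of each tracked source for each observation; all source means, concentrations, and observations are invented for illustration.

    import numpy as np

    def vonmises_pdf(theta, mu, kappa):
        # Von Mises density on the circle: a natural likelihood for
        # direction-of-arrival observations.
        return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

    def e_step(doas, source_means, kappa, priors):
        # Posterior responsibility of each source for each DoA observation
        # (the E-step of an EM-style assignment), shape (K, N).
        lik = np.stack([priors[k] * vonmises_pdf(doas, m, kappa)
                        for k, m in enumerate(source_means)])
        return lik / lik.sum(axis=0, keepdims=True)

    # Two hypothetical sources and a handful of noisy DoA observations:
    obs = np.deg2rad([12.0, 8.0, 95.0, 101.0, 14.0])
    resp = e_step(obs, source_means=np.deg2rad([10.0, 100.0]),
                  kappa=8.0, priors=np.array([0.5, 0.5]))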
 
Description Bayesian track-before-detect with Bar-Ilan University, Israel 
Organisation Bar-Ilan University
Country Israel 
Sector Academic/University 
PI Contribution Lead researcher of the collaboration. Provision of expertise in Bayesian inference, acoustic source tracking, as well as in robotics for acoustic sensor arrays installed on autonomous, moving platforms.
Collaborator Contribution Provision of expertise in acoustic source localization. Provision of access to specialized equipment, including a unique audio laboratory with controlled acoustics.
Impact During the research, we proved that acoustic sources can be tracked directly from the acoustic signals, without the need for a pre-processing stage for source localization and/or detection. The research is placed at the intersection of acoustics, signal processing, and robotics and is therefore highly multidisciplinary.
Start Year 2017
 
Description Distributed acoustic source localization and tracking 
Organisation Bar-Ilan University
Country Israel 
Sector Academic/University 
PI Contribution Lead researcher of the collaboration. Provision of expertise in Bayesian inference, as well as in robotics for acoustic sensor arrays installed on autonomous, moving platforms. Dr Christine Evers visited AudioLabs Erlangen in June 2017 for 3 weeks in order to work with the collaborators, Prof. Emanuel Habets (Friedrich-Alexander University) and Prof. Sharon Gannot (Bar-Ilan University).
Collaborator Contribution Provision of expertise in spatial audio (Friedrich-Alexander University) and distributed sensing (Bar-Ilan University). Provision of access to simulators and methods for estimation of acoustic features, including estimators of the direction-of-arrival of sound sources and coherent-to-diffuse ratio at an acoustic sensor.
Impact This collaboration has led to the development of a probabilistic framework that fuses acoustic information from multiple devices in smart environments. Underpinned by the Bayesian paradigm, the framework enables autonomous agents, such as robots, to distinguish distant - and hence unreliable - devices from devices that are close to a sound source and therefore encapsulate highly reliable information. The collaboration has led to the publication of the following journal paper: C. Evers, E. A. P. Habets, S. Gannot, and P. A. Naylor, "DoA Reliability for Distributed Acoustic Tracking", IEEE Signal Processing Letters, Mar. 2018. Disciplines: Signal Processing, Acoustics
Start Year 2017
 
Description Distributed acoustic source localization and tracking 
Organisation Friedrich-Alexander University Erlangen-Nuremberg
Country Germany 
Sector Academic/University 
PI Contribution Lead researcher of the collaboration. Provision of expertise in Bayesian inference, as well as in robotics for acoustic sensor arrays installed on autonomous, moving platforms. Dr Christine Evers visited AudioLabs Erlangen in June 2017 for 3 weeks in order to work with the collaborators, Prof. Emanuel Habets (Friedrich-Alexander University) and Prof. Sharon Gannot (Bar-Ilan University).
Collaborator Contribution Provision of expertise in spatial audio (Friedrich-Alexander University) and distributed sensing (Bar-Ilan University). Provision of access to simulators and methods for estimation of acoustic features, including estimators of the direction-of-arrival of sound sources and coherent-to-diffuse ratio at an acoustic sensor.
Impact This collaboration has led to the development of a probabilistic framework that fuses acoustic information from multiple devices in smart environments. Underpinned by the Bayesian paradigm, the framework enables autonomous agents, such as robots, to distinguish distant - and hence unreliable - devices from devices that are close to a sound source and therefore encapsulate highly reliable information. The collaboration has led to the publication of the following journal paper: C. Evers, E. A. P. Habets, S. Gannot, and P. A. Naylor, "DoA Reliability for Distributed Acoustic Tracking", IEEE Signal Processing Letters, Mar. 2018. Disciplines: Signal Processing, Acoustics
Start Year 2017
 
Description IEEE-AASP Challenge on acoustic source localization and tracking (LOCATA) 
Organisation Friedrich-Alexander University Erlangen-Nuremberg
Country Germany 
Sector Academic/University 
PI Contribution Research visit at Humboldt-University Berlin in January 2017 for the collaborative measurement campaign of the LOCATA data corpus. Provision of expertise in acoustic signal processing for processing of the data, and creation of the benchmark framework for challenge participants. Provision of code for benchmark approaches for acoustic source localization and tracking, used as baseline methods for evaluation of the results. Provision of specialized spherical microphone arrays, including a bespoke pseudospherical robot head and a spherical mh acoustics Eigenmike, for the recordings.
Collaborator Contribution Friedrich-Alexander University: Lead of project - Provision of specialized equipment, including hearing aid dummies, dummy head, linear microphone array, and audio interfaces, for recordings. Provision of expertise in audio signal processing for processing of audio data, and creation of code for challenge participants. Humboldt-University Berlin: Provision of specialized equipment (optical tracking system) and facilities for the measurement of ground-truth data of all sound-source and microphone positions, orientations, and velocities. Provision of an expert to process the data acquired using the optical tracking system.
Impact This collaboration has led to the organisation of an IEEE data challenge and the release of an open-access dataset. Since its release, the dataset is widely used in the acoustic source localisation & tracking community. As of 13 March 2022, the dataset has been downloaded over 10,000 times. The outputs of this research impact across the fields of acoustics, sensor array and multichannel signal processing, audio signal processing, machine learning and robotics. To describe the data corpus, a conference paper was submitted in March 2018: H. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss, P. A. Naylor, and W. Kellermann, "The LOCATA Challenge Data Corpus for Acoustic Localization and Tracking", IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2018. To disseminate the challenge results, two special sessions at international conferences held in 2018 were accepted: 1) Special Session on "Localization for Audio Applications," held at the IEEE Sensor Array and Multichannel Signal Processing Workshop, Sheffield, UK; 2) Special Session for the dissemination of the LOCATA challenge results: "LOCATA Challenge Workshop," held at the Intl. Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Japan.
Start Year 2017
 
Description IEEE-AASP Challenge on acoustic source localization and tracking (LOCATA) 
Organisation Humboldt University of Berlin
Country Germany 
Sector Academic/University 
PI Contribution Research visit at Humboldt-University Berlin in January 2017 for the collaborative measurement campaign of the LOCATA data corpus. Provision of expertise in acoustic signal processing for processing of the data, and creation of the benchmark framework for challenge participants. Provision of code for benchmark approaches for acoustic source localization and tracking, used as baseline methods for evaluation of the results. Provision of specialized spherical microphone arrays, including a bespoke pseudospherical robot head and a spherical mh acoustics Eigenmike, for the recordings.
Collaborator Contribution Friedrich-Alexander University: Lead of project - Provision of specialized equipment, including hearing aid dummies, dummy head, linear microphone array, and audio interfaces, for recordings. Provision of expertise in audio signal processing for processing of audio data, and creation of code for challenge participants. Humboldt-University Berlin: Provision of specialized equipment (optical tracking system) and facilities for the measurement of ground-truth data of all sound-source and microphone positions, orientations, and velocities. Provision of an expert to process the data acquired using the optical tracking system.
Impact This collaboration has led to the organisation of an IEEE data challenge and the release of an open-access dataset. Since its release, the dataset is widely used in the acoustic source localisation & tracking community. As of 13 March 2022, the dataset has been downloaded over 10,000 times. The outputs of this research impact across the fields of acoustics, sensor array and multichannel signal processing, audio signal processing, machine learning and robotics. To describe the data corpus, a conference paper was submitted in March 2018: H. Löllmann, C. Evers, A. Schmidt, H. Mellmann, H. Barfuss, P. A. Naylor, and W. Kellermann, "The LOCATA Challenge Data Corpus for Acoustic Localization and Tracking", IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), 2018. To disseminate the challenge results, two special sessions at international conferences held in 2018 were accepted: 1) Special Session on "Localization for Audio Applications," held at the IEEE Sensor Array and Multichannel Signal Processing Workshop, Sheffield, UK; 2) Special Session for the dissemination of the LOCATA challenge results: "LOCATA Challenge Workshop," held at the Intl. Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Japan.
Start Year 2017
 
Title Acoustic Simultaneous Localization and Mapping (aSLAM) 
Description A novel approach that simultaneously estimates the position and orientation of a sensor installed on a moving platform, such as a robot, whilst jointly estimating the positions of nearby sound sources. In this work, we proposed a novel approach, named acoustic SLAM (aSLAM), to map the positions of sound sources passively, and to simultaneously localize a moving observer in realistic acoustic environments. aSLAM is based on the theoretical foundations of Random Finite Sets (RFSs) in order to map multiple, intermittently active sources, such as human talkers, subject to erroneous, false and missing DoA estimates. Moreover, to avoid the active emission of intrusive sound stimuli, aSLAM passively infers the 3D Cartesian source positions from the 2D DoA estimates, by exploiting constructively the spatiotemporal diversity of the observer for probabilistic source triangulation (an illustrative triangulation sketch follows this entry). aSLAM is based on GEM-SLAM, detailed under Software & Technical Products "New/Improved Technique/Technology - Optimized Self-Localization for SLAM (2018)". The work was published as: C. Evers and P. A. Naylor, "Acoustic SLAM," IEEE/ACM Trans. Audio, Speech, and Language Processing, 2018 (DOI: 10.1109/TASLP.2018.2828321).
Type Of Technology New/Improved Technique/Technology 
Year Produced 2018 
Impact This is one of two papers that pioneer Acoustic Simultaneous Localization and Mapping (SLAM) for robot audition. The research is highly transdisciplinary, spanning across acoustic signal processing, machine learning and robotics. As the culmination of a well-cited conference paper (10.1109/ICASSP.2016.7471626), this article was published in the top journal for Acoustic Signal Processing (IF 3.531). The paper received over 2370 full-text views on IEEEXplore (Feb'20). The cutting-edge idea led to two keynote speeches (http://hscma2017.org/KeynoteSpeakers.asp, http://mi.eng.cam.ac.uk/UKSpeech2017/keynotes.html), several invited talks, international collaborations (Audiolabs Erlangen, Germany; Bar-Ilan University, Israel; INRIA, France), and the IEEE-SPS LOCATA Challenge (https://doi.org/10.5281/zenodo.3630471). 
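The passive triangulation idea behind aSLAM can be pictured with a deterministic toy: from several observer positions along a trajectory, each bearing constrains the source to a line, and the source position follows as a least-squares intersection of those lines. This Python sketch is a simplified stand-in for the probabilistic, RFS-based inference of the published work; the trajectory and source position are made up.

    import numpy as np

    def triangulate_bearings(positions, bearings):
        # A line with direction (cos b, sin b) through point p satisfies
        # n . x = n . p, where n = (-sin b, cos b) is the line normal.
        # Stacking one such equation per observer pose yields a
        # least-squares estimate of the source position.
        n = np.stack([-np.sin(bearings), np.cos(bearings)], axis=1)
        b = np.einsum('ij,ij->i', n, positions)
        est, *_ = np.linalg.lstsq(n, b, rcond=None)
        return est

    # Observer poses along a hypothetical trajectory (all values invented):
    obs_pos = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5]])
    source = np.array([3.0, 2.0])
    meas = np.arctan2(source[1] - obs_pos[:, 1], source[0] - obs_pos[:, 0])
    print(triangulate_bearings(obs_pos, meas))   # recovers ~ [3., 2.]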
 
Title DoA Reliability for Acoustic Source Tracking 
Description A novel approach for acoustic source tracking in distributed sensor networks that models the reliability of direction-of-arrival (DoA) estimates in order to distinguish accurate DoAs obtained at microphone arrays near a source from uncertain DoAs acquired at distant arrays. The work proposes to incorporate the coherent-to-diffuse ratio as a DoA reliability measure for distributed acoustic tracking. The novel contribution is a tracking algorithm that probabilistically triangulates the Cartesian source positions by fusing the DoA estimates from all nodes within the network. The work was published as: C. Evers, E. A. P. Habets, S. Gannot, and P. A. Naylor, "DoA Reliability for Distributed Acoustic Tracking," IEEE Signal Processing Letters, 2018 (DOI: 10.1109/LSP.2018.2849579).
Type Of Technology New/Improved Technique/Technology 
Year Produced 2018 
Impact This paper is the outcome of an international collaboration with Audiolabs Erlangen, Germany, and Bar Ilan University, Israel, on data fusion in acoustic sensor networks. The paper provides the fundamental probabilistic framework for dealing with rotations and translations due to the spatial diversity of network nodes. The paper was published in a highly cited journal in Signal Processing (IF 3.268), targeted at fast publication of cutting edge research. Due to the page limitation, thorough derivations are provided in the peer-reviewed supplementary materials (https://ieeexplore.ieee.org/ielx7/97/8411353/8392398/supplementary_material.pdf?tp=&arnumber=8392398). The results of this paper were extended to multi-source tracking in 10.1109/LSP.2019.2908376. 
 
Title Multi-Source Tracking using Variational Expectation-Maximization 
Description This work proposes a novel variational expectation-maximisation approach for the tracking of multiple, moving human talkers in realistic acoustic environments. 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2019 
Impact This paper is the result of an international collaboration with INRIA, France. The work in this paper is the culmination of a conference paper (10.1109/HSCMA.2017.7895564), which received the best-paper award at an international conference (https://team.inria.fr/perception/hscma17-award/). This paper was published in a highly cited journal in Signal Processing (IF 3.268), targeted at fast publication of cutting-edge ideas. Due to the limitation of the paper length, thorough and rigorous derivations, as well as open-access code and video demonstrations are provided in the supplementary materials (https://team.inria.fr/perception/research/audiotrack-vonm/). 
 
Title Optimized Self-Localization for SLAM 
Description A novel approach that self-localizes the position and orientation of a sensor, whilst jointly tracking the time-varying positions of nearby dynamic objects. Existing approaches to Simultaneous Localization and Mapping (SLAM) are designed for visual and optical sensors. The aim of this work is to provide a theoretical framework for SLAM that is suitable for sensors that facilitate perception beyond vision but are not yet conventionally used for SLAM, such as acoustic microphone arrays. Since many non-conventional sensors are deployed in environments where the positions of the observer and the objects are highly time-varying, e.g., underwater, the objective of this work is the development of an approach that is specifically designed for dynamic environments. For robust SLAM performance in dynamic and uncertain environments, we proposed a novel approach, called GEneralized Motion (GEM)-SLAM, that fuses the knowledge inferred from feature mapping with reports of the observer motion (a simplified fusion sketch follows this entry).
Type Of Technology New/Improved Technique/Technology 
Year Produced 2018 
Impact This is one of two papers that pioneer Acoustic Simultaneous Localization and Mapping (SLAM) for robot audition. This paper provides the underpinning, theoretical framework for the optimal fusion of sensor signals and inertial measurements in uncertain, dynamic scenes. The paper was published in the top journal in Signal Processing (journal IF 5.23). The paper received over 1370 full-text views on IEEEXplore (Feb 2020) and led to two keynote speeches (http://hscma2017.org/KeynoteSpeakers.asp, http://mi.eng.cam.ac.uk/UKSpeech2017/keynotes.html), several invited talks, international collaborations (Audiolabs Erlangen, Germany; Bar-Ilan University, Israel; INRIA, France), and the IEEE-SPS LOCATA Challenge (https://doi.org/10.5281/zenodo.3630471). 
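A heavily simplified way to picture the fusion in GEM-SLAM is a scalar predict/update cycle combining a reported observer motion with a map-derived pose observation. The Python sketch below is a textbook Kalman step under invented noise variances, not the published algorithm.

    import numpy as np

    def fuse_motion_and_map(pose, var, motion, motion_var, obs, obs_var):
        # Predict: apply the reported motion and grow the uncertainty.
        pose_pred = pose + motion
        var_pred = var + motion_var
        # Update: weight the map-derived observation by its reliability.
        gain = var_pred / (var_pred + obs_var)
        pose_new = pose_pred + gain * (obs - pose_pred)
        var_new = (1.0 - gain) * var_pred
        return pose_new, var_new

    # Hypothetical numbers: a noisy odometry step fused with a map estimate.
    pose, var = 0.0, 0.01
    pose, var = fuse_motion_and_map(pose, var, motion=0.5, motion_var=0.05,
                                    obs=0.62, obs_var=0.04)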
 
Description Invited Talk at Friedrich Alexander Universitaet zu Erlangen, Germany 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This invited talk was presented to professional practitioners in audio and acoustics. Its purpose was to broaden knowledge and to raise awareness of techniques for Bayesian learning for robot audition.
Year(s) Of Engagement Activity 2017
 
Description Invited Talk at ISVR Southampton 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact This invited talk was presented to professional practitioners in audio and acoustics. Its purpose was to broaden knowledge and to raise awareness of techniques for acoustic scene mapping for robot audition.
Year(s) Of Engagement Activity 2017
 
Description Invited Talk at Queen Mary University of London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact This invited talk was presented to postgraduate students and professional practitioners in audio and acoustics. Its purpose was to broaden knowledge and to raise awareness of techniques for audio-based localisation and tracking for human-machine interaction.
Year(s) Of Engagement Activity 2019
 
Description Invited Talk at Universitaet Oldenburg, Germany 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This invited talk was presented to professional practitioners in audio and acoustics. Its purpose was to broaden knowledge and to raise awareness of techniques for Bayesian learning for robot audition.
Year(s) Of Engagement Activity 2017
 
Description Invited Talk at University of Sheffield 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact This invited talk was presented to professional practitioners in audio and acoustics. Its purpose was to broaden knowledge and to raise awareness of techniques for machine listening for situational awareness. The talk has led to further involvement with the UK Acoustics Network (UKAN), including a follow-up seminar hosted by UKAN.
Year(s) Of Engagement Activity 2019
 
Description Organization of a satellite workshop during the 2018 International Workshop on Acoustic Signal Enhancement for dissemination of the outcomes of the IEEE-AASP Challenge on acoustic source Localization and Tracking (LOCATA), held 17-20 September 2018, Tokyo, Japan 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A workshop for the announcement and dissemination of the IEEE-AASP LOCATA Challenge on acoustic source localization and tracking was organized as a satellite workshop to the "International Workshop on Acoustic Signal Enhancement" (IWAENC) in Tokyo in September 2018. Approximately 80 participants from research and industry attended the workshop, which sparked questions about future directions and open research questions in the field of acoustic source localization and tracking. Challenge participants reported that the data corpus published for the LOCATA challenge provided them with a deeper understanding of the challenges in practical scenarios. As a consequence, challenge participants as well as workshop participants reported a change in view about the importance of the field of acoustic source localization and tracking and the challenges and future directions involved.
Year(s) Of Engagement Activity 2018
URL http://www.iwaenc2018.org
 
Description Organization of special session during the 2018 IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM) on "Localization for audio applications", held 8-11 July 2018, Sheffield, UK 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Smart and autonomous systems, such as home assistants, mobile phones, and assistive robots, rely on robust multichannel audio processing in order to interact intuitively with humans in the acoustic environment. In order to detect, engage with and focus on human talkers, sound source localization is crucial for human-machine interaction. However, in realistic acoustic scenarios, localization algorithms are subject to the adverse effects of reverberation, noise, and interference. Moreover, acoustic sensor arrays are often installed on mobile platforms, for example for humanoid robots or hearing aids. The motion of the sensor array as well as the human sources in the environment therefore lead to highly dynamic scenarios. The aim of the proposed Special Session on 'Localization for Audio Applications' is to present recent approaches and methods for sound source localization that address the practical challenges encountered in everyday environments. Presenters and listeners in this session will benefit from a broad selection of state-of-the-art approaches and timely applications.
Year(s) Of Engagement Activity 2018
URL http://www.sam2018.org
 
Description Seminar hosted by UK Acoustics Network 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This seminar was hosted by the UK Acoustics Network, which brings together professional practitioners, commercial companies and academic researchers who are involved in the area of acoustics. The seminar was well attended and has provoked lively discussions between the participants. The seminar led to further discussions about future grant proposals and collaborations.
Year(s) Of Engagement Activity 2020