ActivATOR - Active AudiTiOn for Robots
Lead Research Organisation: University of Southampton
Department Name: School of Electronics and Computer Science
Abstract
Life in sound occurs in motion. For human listeners, audition - the ability to listen - is shaped by physical interactions between our bodies and the environment. We integrate motion with auditory perception in order to hear better (e.g., by approaching sound sources of interest), to identify objects (e.g., by touching objects and listening to the resulting sound), to detect faults (e.g., by moving objects to listen for anomalous creaks), and to offload thought (e.g., by tapping surfaces to recall musical pieces).
The ability to make sense of and exploit sounds in motion is therefore a fundamental prerequisite for embodied Artificial Intelligence (AI). This project will pioneer the underpinning probabilistic framework for active robot audition that enables embodied agents to control the motion of their own bodies ('ego-motion') for auditory attention in realistic acoustic environments (households, public spaces, and environments involving multiple, competing sound sources).
By integrating sound with motion, this project will enable machines to imagine, control and leverage the auditory consequences of physical interactions with the environment. By transforming the ways in which machines make sense of life in sound, the research outcomes will be pivotal for new, emerging markets in which robots augment, rather than rival, humans in order to surpass the limitations of the human body (sensory accuracy, strength, endurance, memory). The proposed research therefore has the potential to transform and disrupt a wide range of industries involving machine listening, ranging from human-robot augmentation (smart prosthetics, assistive listening technology, brain-computer interfaces) to human-robot collaboration (planetary exploration, search-and-rescue, hazardous material removal) and automation (environmental monitoring, autonomous vehicles, AI-assisted diagnosis in healthcare).
This project will consider the specific case study of a collaborative robot ('cobot') that augments the auditory experience of a hearing-impaired human partner. Hearing loss is the second most common disability in the UK, affecting 11M people. The loss of hearing affects situational awareness as well as the ability to communicate, which can impact mental health and, in extreme cases, cognitive function. Nevertheless, for complex reasons that range from discomfort to social stigma, only 2M people choose to wear hearing aids.
The ambition of this project is to develop a cobot that will augment the auditory experience of a hearing-impaired person. The cobot will move autonomously within the human partner's household to assist with everyday tasks. Our research will enable the cobot to exploit ego-motion in order to learn an internal representation of the acoustic scene (children chattering, a kettle boiling, a spouse calling for help). The cobot will interface with its partner through an on-person smart device (watch, mobile phone). Using this human-cobot interface, the cobot will alert its partner to salient events (a call for help) via vibration alerts, and share its auditory experiences via interactive maps that visualise auditory cues and indicate saliency (e.g., loudness, spontaneity) and valence (positive vs concerning).
In contrast to smart devices, the cobot will have the unique capability to actively attend to and explore uncertain events (a thump upstairs), and to take action (assist the spouse, call an ambulance) without the need for permanently installed devices in personal spaces (bathroom, bedroom). The project therefore has the potential to transform the lives of people with hearing impairments by enabling long-term independent living, safeguarding privacy, and fostering inclusivity.
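To make the notion of active audition concrete, the following is a minimal, purely illustrative sketch (not the project's framework): an agent maintains a discrete Bayesian belief over the azimuth of a single sound source and chooses the head orientation that minimises the expected posterior entropy, under the illustrative assumption that direction-of-arrival (DOA) estimates are sharper for sources in front of the microphone array.

```python
# Minimal illustrative sketch (not the project's framework): an agent keeps a
# discrete Bayesian belief over the azimuth of a single sound source and picks
# the head orientation that minimises the expected posterior entropy. The
# observation model, bin count and candidate orientations are assumptions
# made purely for illustration.
import numpy as np

N_BINS = 36
AZIMUTHS = np.linspace(0.0, 2.0 * np.pi, N_BINS, endpoint=False)

def wrap(angle):
    """Wrap angles to [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

def concentration(relative_angle, k_front=8.0, k_side=1.5):
    """Assumed observation sharpness: DOA estimates are taken to be more
    reliable for sources in front of the array than behind it."""
    frontness = 0.5 * (1.0 + np.cos(relative_angle))   # 1 in front, 0 behind
    return k_side + (k_front - k_side) * frontness

def likelihood(doa_obs, head):
    """Von Mises likelihood of a relative DOA observation for every azimuth bin."""
    relative = wrap(AZIMUTHS - head)
    kappa = concentration(relative)
    return np.exp(kappa * np.cos(doa_obs - relative)) / (2.0 * np.pi * np.i0(kappa))

def update(belief, doa_obs, head):
    """Bayesian update of the belief over source azimuth."""
    posterior = belief * likelihood(doa_obs, head)
    return posterior / posterior.sum()

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def choose_head(belief, candidates):
    """Pick the head orientation with the lowest expected posterior entropy,
    approximating each hypothetical observation by its noise-free value."""
    expected = [sum(p * entropy(update(belief, wrap(a - head), head))
                    for a, p in zip(AZIMUTHS, belief))
                for head in candidates]
    return candidates[int(np.argmin(expected))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_source = np.deg2rad(120.0)                    # hidden source direction (illustrative)
    belief = np.full(N_BINS, 1.0 / N_BINS)             # uniform prior
    candidates = np.deg2rad(np.arange(0.0, 360.0, 60.0))
    for step in range(5):
        head = choose_head(belief, candidates)         # ego-motion chosen for information gain
        rel = wrap(true_source - head)
        doa_obs = rng.vonmises(rel, concentration(rel))  # noisy relative DOA
        belief = update(belief, doa_obs, head)
        print(f"step {step}: head={np.degrees(head):5.1f} deg, "
              f"MAP source={np.degrees(AZIMUTHS[belief.argmax()]):5.1f} deg")
```

In this toy model, ego-motion is valuable only because the assumed observation model is direction-dependent; the realistic acoustic environments targeted by the project (reverberation, multiple competing sources) are far richer than this single-source illustration.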
Organisations
- University of Southampton (Lead Research Organisation)
- National Oceanography Centre (Collaboration)
- Stanford University (Collaboration)
- National Oceanography Centre (Project Partner)
- Audio Analytic Ltd (UK) (Project Partner)
- University of Illinois at Urbana-Champaign (Project Partner)
- University of Oxford (Project Partner)
- Consequential Robotics Ltd (Project Partner)
| Description | UoS-NOCS Collaboration on Fibre Optic Sensing |
| Organisation | National Oceanography Centre |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Our team contributes expertise in acoustic signal processing and deep learning for audio. Two PhD students, jointly supervised with Dr Belal (NOCS), are developing novel machine learning models for (a) the disentanglement of acoustic cues embedded in mixture signals and (b) the denoising of fibre optic sensing data (an illustrative sketch of the latter task follows this table). |
| Collaborator Contribution | NOCS provide world-renowned expertise in densely distributed data acquisition. |
| Impact | Identifying and distinguishing among events in the marine environment is essential for developing a better understanding of climate change, and of animal and human behaviour, across 71% of the planet. Sources of ambient noise in the marine environment can be classified into natural (sediment flows, volcanic geo-hazards, etc.) and anthropogenic (ocean bottom trawling, offshore drilling, etc.). The aim of this research is to radically improve ocean observation and visualisation capabilities, both for oceanographic research and for marine sector applications of national and strategic importance. This research collaboration between NOC and the University of Southampton aims to combine expertise in densely distributed big-data acquisition with machine learning and AI techniques to characterise and automatically identify patterns in these data, and thereby aid human understanding of the environment. The key challenges in this project stem from the volume of streaming data generated and the lack of substantial quantities of labelled signals. This is a highly multi-disciplinary collaboration that cuts across marine science, physics, and machine learning and AI. |
| Start Year | 2021 |
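As a purely hypothetical illustration of the denoising task mentioned in the PI Contribution above (it is not the collaboration's model), the sketch below trains a small 1-D convolutional denoising autoencoder on synthetic noisy/clean signal pairs; the architecture, synthetic data and training setup are all assumptions made for illustration.

```python
# Hypothetical illustration only: a small 1-D convolutional denoising
# autoencoder for single-channel sensing traces. The architecture, synthetic
# data and training loop are assumptions made for this sketch and do not
# describe the models developed in the collaboration.
import math
import torch
import torch.nn as nn

class Denoiser1D(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
        )
        self.decoder = nn.Conv1d(channels, 1, kernel_size=9, padding=4)

    def forward(self, x):                              # x: (batch, 1, samples)
        return self.decoder(self.encoder(x))

def synthetic_batch(batch=8, samples=1024, noise_std=0.5):
    """Clean sinusoids plus Gaussian noise stand in for real sensing traces."""
    t = torch.linspace(0.0, 1.0, samples)
    freq = torch.randint(5, 40, (batch, 1)).float()
    clean = torch.sin(2.0 * math.pi * freq * t).unsqueeze(1)  # (batch, 1, samples)
    noisy = clean + noise_std * torch.randn_like(clean)
    return noisy, clean

if __name__ == "__main__":
    model = Denoiser1D()
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for step in range(200):
        noisy, clean = synthetic_batch()
        optimiser.zero_grad()
        loss = loss_fn(model(noisy), clean)            # reconstruct the clean trace
        loss.backward()
        optimiser.step()
        if step % 50 == 0:
            print(f"step {step:3d}  reconstruction MSE = {loss.item():.4f}")
```

Given the lack of substantial quantities of labelled signals noted in the Impact entry above, self-supervised variants of this idea would likely be more appropriate in practice; the sketch only indicates the overall shape of such a model.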
| Description | UoS-Stanford CCRMA Collaboration on Auditory Modelling |
| Organisation | Stanford University |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Our team works closely with the Stanford University Center for Computer Research in Music and Acoustics. We develop novel deep-learning-based foundation models for machine listening, with a focus on bio-inspired models, and we develop the mathematical models and implement the software for training, testing and evaluating models for tasks such as sound event classification and the detection of salient events. |
| Collaborator Contribution | Stanford University (Center for Computer Research in Music and Acoustics) contribute expertise, advice and guidance on auditory modelling. |
| Impact | The collaboration is focused on bio-inspired, deep-learning-based models for audio and brings together expertise in auditory modelling and deep learning. The collaboration is ongoing and, to date, has led to the following outputs: - Dr Evers and Professor Slaney supervised a Year 4 Undergraduate Group Design Project, conducted between October 2024 and January 2025, which developed a live demonstrator for immersive audio technologies that alerts end-users of augmented/virtual reality devices to salient acoustic events within their physical surroundings. |
| Start Year | 2024 |
| Description | ECS Engage Outreach |
| Form Of Engagement Activity | Participation in an open day or visit at my research institution |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Schools |
| Results and Impact | Approximately 40 students attended a school visit to the University of Southampton. Dr Evers's team presented a live robot demonstrator that showcased the benefits and challenges of working with audio data and of processing it using machine learning models. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited Seminar, Stanford University, Center for Computer Research in Music and Acoustics (CCRMA) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Approximately 15 researchers from across academia and companies in the Palo Alto area attended an invited seminar on "Embodied Audio". The seminar sparked discussions about the various challenges in machine listening and led to an ongoing collaboration between Dr Evers and Professor Malcolm Slaney (Stanford University, CCRMA). |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://ccrma.stanford.edu/events/christine-evers-embodied-audio |
