Fluidity in simulated human-robot interaction with speech interfaces
Lead Research Organisation:
Swansea University
Department Name: College of Science
Abstract
The need for interactive robots which can collaborate successfully with human beings is becoming increasingly important in the UK, given some of the biggest challenges we now face: the need for high-value manufacturing exports to compete internationally, robots which can handle dangerous waste and navigate hazardous environments, and robotics solutions for social care and medical assistance to meet our demographic challenges.
A key problem for human-robot interaction (HRI) with speech, and one which limits the wider use of such robots, is a lack of fluidity. Although there have been significant recent advances in robot vision, motion, manipulation and automatic speech recognition, state-of-the-art HRI is slow, laboured and fragile. The contrast with the speed, fluency and error tolerance of human-human interaction is substantial. The FLUIDITY project will develop technology to monitor, control and increase the interaction fluidity of robots with speech understanding capabilities, such that they become more natural and efficient to interact with. The project will also address the difficulty of developing HRI models due to the time, logistics and cost of working with real-world robots by developing a toolkit for building and testing interactive robot models in a simulated Virtual Reality (VR) environment, making scalable HRI experiments possible for the wider robotics, HRI and natural language processing (NLP) communities.
The project focusses on pick-and-place robots which manipulate household objects in view, where users utter commands (e.g. "put the remote control on the table") and issue confirmations, corrections and repairs of the robot's current actions appropriately (e.g. "no, the other table"), allowing rapid, natural responses from both a human confederate teleoperating the robot model and automatic systems. Crucially, appropriate overlap of human speech and robot motion will be permitted to allow more human-like transitions. The project will put interaction fluidity, and rapid recovery from misunderstanding with appropriate repair mechanisms, at the heart of interactive robots, which will lead to improved user experience.
The means for achieving fluid interaction will firstly be the adaptation of Spoken Language Understanding (SLU) algorithms which are not only word-by-word incremental but also provide more human-like, real-time measures of the confidence the robot has in its interpretation of the user's speech. As the basis for these algorithms, mediated Wizard-of-Oz data will be collected from pairs of human participants, with one participant acting as a confederate 'wizard' controlling the robot model and the other as the user. From the visual, audio and motion data collected, SLU algorithms will be built which not only return the most accurate user intention incrementally word-by-word, but also return a continuous measure of confidence corresponding as closely as possible to the reaction times of the human confederate.
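The interface such an incremental SLU component exposes can be illustrated with a minimal sketch. This is purely hypothetical code (a toy keyword lexicon stands in for the project's trained models): it shows the key contract described above, namely that every incoming word yields both a current best intent hypothesis and a continuous confidence score, rather than a single decision at utterance end.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalSLU:
    """Illustrative word-by-word intent tracker (hypothetical, not the project's model)."""
    # toy keyword -> intent lexicon standing in for a trained SLU model
    lexicon: dict = field(default_factory=lambda: {
        "put": "PLACE", "pick": "PICK", "no": "REPAIR", "other": "REPAIR",
    })
    scores: dict = field(default_factory=dict)
    words_seen: int = 0

    def process_word(self, word: str):
        """Consume one word; return (best intent so far, confidence in [0, 1])."""
        self.words_seen += 1
        intent = self.lexicon.get(word.lower())
        if intent:
            self.scores[intent] = self.scores.get(intent, 0) + 1
        if not self.scores:
            return None, 0.0
        best = max(self.scores, key=self.scores.get)
        confidence = self.scores[best] / self.words_seen
        return best, confidence


# A repair utterance processed incrementally, one word at a time:
slu = IncrementalSLU()
for w in "no the other table".split():
    intent, conf = slu.process_word(w)
```

The point of the sketch is the per-word return value: a downstream motion controller can act (or hold back) as soon as confidence crosses a threshold, mid-utterance, rather than waiting for the full command.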
The project will also address user perception of the robot's intention from the robot's motion by experimenting with different models of motion legibility. The hypothesis is that the more accurately the legibility of the robot's motion can be modelled in real time, the greater the fluidity of interaction possible, as user repairs and confirmations can be interpreted appropriately earlier in the robot's motion.
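One common way to make the notion of motion legibility concrete is to treat it as goal inference from a partial trajectory, in the spirit of Boltzmann-rational observer models from the legible-motion literature. The sketch below is an illustrative toy only (the `beta` parameter and the detour-based cost are assumptions, not the project's model): a candidate goal is scored more highly when the path travelled so far, plus the remaining distance to that goal, barely exceeds the direct path to it.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def goal_posterior(traj, goals, beta=2.0):
    """Toy legibility proxy: infer which candidate goal a partial
    end-effector trajectory is heading for, Boltzmann-style.
    Returns a normalised probability per goal."""
    start, current = traj[0], traj[-1]
    path_len = sum(dist(traj[i], traj[i + 1]) for i in range(len(traj) - 1))
    scores = []
    for g in goals:
        # cost already paid plus cost-to-go, minus the direct-path cost:
        # zero detour means the motion so far is fully consistent with goal g
        detour = path_len + dist(current, g) - dist(start, g)
        scores.append(math.exp(-beta * detour))
    z = sum(scores)
    return [s / z for s in scores]


# A short motion segment heading straight for the first of two goals:
post = goal_posterior([(0.0, 0.0), (1.0, 0.0)], [(2.0, 0.0), (0.0, 2.0)])
```

Under this toy model the first goal already receives most of the probability mass early in the motion, which is the property the hypothesis above relies on: the earlier an observer model can commit to the intended goal, the earlier a user's confirmation or repair can be interpreted against it.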
The SLU and legibility algorithms will be integrated in an end-to-end system where interaction fluidity can be controlled, with evaluation in both the VR environment and a comparison to a real-world robot model. The project will provide an abstract theoretical framework for interaction fluidity and practical outcomes of a VR environment, an HRI dataset collected in the environment which will be made publicly available for benchmarking, and software which will be open-source and adaptable for other robot models.
Publications
Nagele A
(2024)
"The sleep data looks way better than I feel." An autoethnographic account and diffractive reading of sleep-tracking
in Frontiers in Computer Science
Förster F
(2025)
Editorial: Failures and repairs in human-robot communication
in Frontiers in Robotics and AI
Förster F
(2023)
Working with troubles and failures in conversation between humans and robots: workshop report
in Frontiers in Robotics and AI
| Description | Swansea Intelligent Robotics Infrastructure |
| Amount | £300,000 (GBP) |
| Organisation | Higher Education Funding Council for Wales (HEFCW) |
| Sector | Public |
| Country | United Kingdom |
| Start | 12/2024 |
| Title | Conceptual Pact Models for Reference Resolution using Dynamic Small Language Models |
| Description | A reference resolution dataset and model for objects based on small language models which are dynamically constructed during conversations between participants, derived from the PENTOREF dataset. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | The dataset and model are used by the EPSRC FLUIDITY project and by the EPSRC ARCIDUCA project as the basis for models of reference to objects in situated conversations. |
| URL | https://github.com/julianhough/conceptualpacts |
| Description | University of Hertfordshire collaboration |
| Organisation | University of Hertfordshire |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Leading the FLUIDITY project experiments and writing of papers. Co-hosting workshops and special sessions. |
| Collaborator Contribution | The time, expertise and use of the Hertfordshire Robot House. |
| Impact | See https://fluidity-project.github.io/publications for publications |
| Start Year | 2023 |
| Title | Python code for Conceptual Pact Models |
| Description | Python code for the 2024 LREC-COLING paper Hough et al., "Conceptual Pacts for Reference Resolution using Small, Dynamically Constructed Language Models: A Study in Puzzle Building Dialogues". |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Open Source License? | Yes |
| Impact | Code is used by the EPSRC FLUIDITY project and by the EPSRC ARCIDUCA project as the technical implementation for models of reference to objects in situated conversations. |
| URL | https://github.com/julianhough/conceptualpacts/ |
| Description | Hosting of the workshop Fluidity in Human-Agent Interaction at the 12th International Conference on Human-Agent Interaction (HAI 2024) |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Approximately 30 participants took part from academic and industrial research on the topic of fluidity in human-robot interaction. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://fluidity-project.github.io/fluidityhaiworkshop |