Fluidity in simulated human-robot interaction with speech interfaces

Lead Research Organisation: Swansea University
Department Name: College of Science


The need for interactive robots that can collaborate successfully with human beings is becoming important in the UK, given some of the biggest challenges we now face: the need for high-value manufacturing exports to compete economically at an international level, the need for robots which can handle dangerous waste and navigate hazardous environments, and the need for robotics solutions for social care and medical assistance to meet our demographic challenges.

A key problem which limits the wider use of robots with speech interfaces is the lack of fluidity in human-robot interaction (HRI). Although there have been significant recent advances in robot vision, motion, manipulation and automatic speech recognition, state-of-the-art HRI is slow, laboured and fragile. The contrast with the speed, fluency and error tolerance of human-human interaction is substantial. The FLUIDITY project will develop technology to monitor, control and increase the interaction fluidity of robots with speech understanding capabilities, such that they become more natural and efficient to interact with. The project will also address the difficulty of developing HRI models, which stems from the time, logistics and cost of working with real-world robots, by developing a toolkit for building and testing interactive robot models in a simulated Virtual Reality (VR) environment. This will make scalable HRI experiments possible for the wider robotics, HRI and natural language processing (NLP) communities.

The project focusses on pick-and-place robots which manipulate household objects in view. Users will utter commands (e.g. "put the remote control on the table") and issue confirmations, corrections and repairs of the robot's current actions as appropriate (e.g. "no, the other table"), allowing rapid, natural responses from both a human confederate teleoperating the robot model and automatic systems. Crucially, appropriate overlap of human speech and robot motion will be permitted to allow more human-like transitions. The project will put interaction fluidity and rapid recovery from misunderstanding, through appropriate repair mechanisms, at the heart of interactive robots, which will lead to improved user experience.

Fluid interaction will be achieved firstly by adapting Spoken Language Understanding (SLU) algorithms so that they are not only word-by-word incremental but also provide more human-like, real-time measures of the confidence the robot has in its interpretation of the user's speech. As the basis for these algorithms, mediated Wizard-of-Oz data will be collected from pairs of human participants, with one participant acting as the confederate 'wizard' controlling the robot model and the other acting as the user. From the visual, audio and motion data collected, SLU algorithms will be built which return not only the most accurate user intention incrementally, word by word, but also a continuous measure of confidence corresponding as closely as possible to the reaction times of the human confederate.
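
As a rough illustration of what "word-by-word incremental with a continuous confidence" could mean in practice, the Python sketch below updates an intent hypothesis after every word and reports a softmax-style confidence alongside it. It is a toy under stated assumptions, not the project's SLU method: the intent names, the cue words in `INTENT_CUES` and the function `incremental_slu` are all hypothetical, and a real system would learn its scores from the Wizard-of-Oz data rather than from keyword matches.

```python
import math

# Hypothetical intents and cue words, invented purely for illustration.
INTENT_CUES = {
    "place_on_table": {"put", "table"},
    "place_on_shelf": {"put", "shelf"},
    "repair": {"no", "other"},
}


def incremental_slu(words):
    """Yield (word, best_intent, confidence) after each incoming word.

    Evidence for each intent accumulates as cue words arrive; the
    confidence is the softmax probability of the current best intent.
    """
    scores = {intent: 0.0 for intent in INTENT_CUES}
    for word in words:
        token = word.lower()
        for intent, cues in INTENT_CUES.items():
            if token in cues:
                scores[intent] += 1.0
        # Normalise the accumulated evidence into a probability distribution.
        exps = {intent: math.exp(score) for intent, score in scores.items()}
        total = sum(exps.values())
        best = max(exps, key=exps.get)
        yield word, best, exps[best] / total


if __name__ == "__main__":
    for word, intent, conf in incremental_slu(
        "put the remote control on the table".split()
    ):
        print(f"{word:>8}  {intent:<15} {conf:.2f}")
```

In this sketch the robot could begin moving once the confidence crosses a threshold, rather than waiting for the end of the utterance; choosing that threshold to match human reaction times is the kind of question the collected data is intended to answer.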

The project will also address user perception of the robot's intention from the robot's motion by experimenting with different models of motion legibility. The hypothesis is that the more accurately the legibility of the robot's motion can be modelled in real time, the greater the fluidity of interaction possible, as user repairs and confirmations can be interpreted appropriately earlier in the robot's motion.
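
One simple way to make the idea of legibility concrete is to model an observer who infers the robot's goal from its motion so far. The Python sketch below is an assumption made for illustration, not one of the models the project will evaluate: each candidate goal is scored by how much of a detour the path so far represents relative to a straight line to that goal, and the scores are normalised into probabilities. The function `goal_probabilities`, the goal names and the 2D coordinates are hypothetical.

```python
import math


def goal_probabilities(start, current, goals, path_length_so_far):
    """Estimate how likely each goal looks to an observer of the motion.

    For each goal, the 'detour' is the extra distance implied by the path
    travelled so far plus the straight line still to go, compared with the
    direct straight line from start to goal. Smaller detours score higher.
    """
    scores = {}
    for name, goal in goals.items():
        detour = (
            path_length_so_far
            + math.dist(current, goal)
            - math.dist(start, goal)
        )
        scores[name] = math.exp(-detour)
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    start = (0.0, 0.0)
    goals = {"left_table": (-1.0, 1.0), "right_table": (1.0, 1.0)}
    current = (0.5, 0.5)  # gripper has moved part-way towards the right table
    probs = goal_probabilities(
        start, current, goals, math.dist(start, current)
    )
    print(probs)
```

Under such a model, a user repair such as "no, the other table" that arrives while one goal is already clearly the more probable can be resolved against that goal before the motion completes.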

The SLU and legibility algorithms will be integrated in an end-to-end system where interaction fluidity can be controlled, with evaluation both in the VR environment and in comparison with a real-world robot model. The project will provide an abstract theoretical framework for interaction fluidity, together with three practical outcomes: a VR environment, an HRI dataset collected in the environment which will be made publicly available for benchmarking, and open-source software which will be adaptable for other robot models.

