Spoken Dialogue Management using Partially Observable Markov Decision Processes

Lead Research Organisation: University of Cambridge
Department Name: Engineering

Abstract

Spoken dialogue systems have a wide range of applications, including call centre automation, control of devices in the home, interactive entertainment, and hands-free applications. Despite their increasing use, however, deployment costs remain high and operational systems continue to be fragile. A major contributor to both of these problems is that the core dialogue manager, which interprets the spoken input and plans the next response, is a deterministic program, hand-crafted and manually tuned for each application.

Experience applying statistical techniques in both speech recognition and synthesis has shown that learning from data and using optimal decision making can dramatically improve performance and lower costs. A natural framework for statistical dialogue modelling is the Markov Decision Process (MDP). However, a major limitation of MDPs is that they require the state of the system to be known exactly, and therefore they do not address the essence of the dialogue management problem, which is to handle the uncertainty caused by speech recognition and understanding errors.

The aim of this project is to develop a framework for spoken dialogue systems which uses a more general statistical model called a Partially Observable Markov Decision Process (POMDP). The key assumption in the POMDP is that the state of the system (which includes the goal in the user's mind) can never be known with certainty. Hence, it maintains a probability distribution over all possible states and bases its decisions on this distribution. In effect, the POMDP tracks every possible dialogue hypothesis at every turn, maintaining a probability for each. This provides it with a principled framework for handling ambiguity and uncertainty.

Although this formulation is extremely powerful, it is also computationally very complex, since the POMDP state is a vector in a very high-dimensional continuous space. This makes direct belief monitoring and policy optimisation essentially intractable, and hence little progress has been made towards real applications. Recently, however, the proposer has demonstrated that practical POMDP-based systems are feasible by exploiting two key ideas. Firstly, the complexity of belief monitoring can be greatly reduced by partitioning the state space into equivalence classes. Secondly, in the context of spoken dialogues, it is possible to map dialogue hypotheses into a much-reduced summary space where effective policy optimisation is possible. These ideas have been built into a prototype system called the Hidden Information State (HIS) system, and their feasibility has been demonstrated and evaluated in a Tourist Information domain.

Although it serves its purpose as a proof of concept, the HIS prototype was built using a simple 1-best recogniser interface, very simplistic probabilistic models, a hand-crafted user simulator and a rudimentary grid-based policy learning method. To fully realise the potential of POMDP-based systems, much more needs to be done, and the programme of work set out in this proposal seeks to achieve this. The key areas that will be addressed are more efficient belief state partitioning and monitoring, accurate statistical user models trained on real data, integration of N-best recognition hypotheses, and improved summary state mapping and policy optimisation. The result will be a system which is trained automatically on data, which delivers high performance at low cost, which is significantly more robust to recognition errors, and which can learn and adapt on-line.
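
The belief monitoring idea described in the abstract can be illustrated with a minimal sketch. It is not the HIS system: the three user goals, the fixed-goal transition model, the 70% recogniser accuracy, and the threshold rule in `choose_action` are all assumptions chosen for illustration. The update is the standard POMDP belief update, in which the new belief in state s' is proportional to P(o | s', a) multiplied by the sum over s of P(s' | s, a) b(s).

```python
# Toy belief monitoring for a dialogue POMDP.
# All names, states, and probabilities are illustrative assumptions,
# not taken from the HIS system.

GOALS = ["hotel", "restaurant", "bar"]  # hypothetical hidden user goals


def transition(s_next, s, action):
    """P(s' | s, a): assume the user's goal never changes during the dialogue."""
    return 1.0 if s_next == s else 0.0


def obs_model(observation, s_next, action):
    """P(o | s', a): assume the recogniser reports the true goal 70% of the time."""
    return 0.7 if observation == s_next else 0.15


def belief_update(belief, action, observation):
    """Return the normalised belief after taking `action` and hearing `observation`."""
    unnormalised = {}
    for s_next in belief:
        # Prediction step: sum over previous states weighted by the old belief.
        predicted = sum(transition(s_next, s, action) * p for s, p in belief.items())
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = obs_model(observation, s_next, action) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / total for s, v in unnormalised.items()}


def choose_action(belief, threshold=0.8):
    """Hand-written stand-in for a policy: act on the top goal only if confident."""
    top_goal = max(belief, key=belief.get)
    return ("inform", top_goal) if belief[top_goal] >= threshold else ("confirm", top_goal)


b0 = {g: 1.0 / len(GOALS) for g in GOALS}   # uniform prior over user goals
b1 = belief_update(b0, "ask", "hotel")      # recogniser hears "hotel" once
b2 = belief_update(b1, "confirm", "hotel")  # and hears "hotel" again

for name, b in [("b0", b0), ("b1", b1), ("b2", b2)]:
    print(name, {g: round(p, 3) for g, p in b.items()}, choose_action(b))
```

Under these assumptions, a single noisy observation is not enough for the threshold rule to commit to a goal, while a second consistent observation is. In a real system the threshold rule would be replaced by an optimised policy, and the state space would be far too large to enumerate as done here, which is the motivation for the partitioning and summary-space ideas described above.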

Publications

Gašić M. (2011) Effective handling of dialogue state in the hidden information state POMDP-based dialogue manager in ACM Transactions on Speech and Language Processing

Gašić M. (2008) Training and evaluation of the HIS POMDP dialogue system in noise in Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, SIGDIAL 2008

Mairesse F. (2010) Phrase-based statistical language generation using graphical models and active learning in Proceedings of the Annual Meeting of the Association for Computational Linguistics

Schatzmann J. (2009) The Hidden Agenda User Simulation Model in IEEE Transactions on Audio, Speech, and Language Processing

 
Description: A new approach to designing speech-based human-computer interfaces which are cheaper to build, more robust in operation, and which continue to improve during deployment via continuous adaptation.
Exploitation Route: The results are already being exploited via VocalIQ Limited and other R&D companies in this area.
Sectors: Creative Economy; Digital/Communication/Information Technologies (including Software); Financial Services, and Management Consultancy; Healthcare; Transport

URL: http://mi.eng.cam.ac.uk/~sjy
 
Description: The findings from this study provided the basis for two EU Framework 7 projects (CLASSIC and PARLANCE), industrial support from General Motors and Toshiba, and subsequently, in 2011, the formation of a spin-off company, VocalIQ Ltd, to develop the technology (see www.vocaliq.com).
First Year Of Impact: 2008
Sector: Digital/Communication/Information Technologies (including Software); Transport
Impact Types: Economic

 
Company Name: VocalIQ Limited
Description: VocalIQ is a spin-out company from the Spoken Dialogue Systems Group at the University of Cambridge, UK. Still based in Cambridge, the company builds a platform for voice interfaces, making it easy for everybody to voice-enable their devices and apps. Example application areas include smartphones, robots, cars, call centres, and games. The company is funded by Amadeus Capital and Cambridge Enterprise.
Year Established: 2011
Impact: Contracts with General Motors and Jaguar Land Rover to develop state-of-the-art voice-driven systems.
Website: http://www.vocaliq.com