Spoken Dialogue Management using Partially Observable Markov Decision Processes

Lead Research Organisation: University of Cambridge

Department Name: Engineering

Abstract

Spoken dialogue systems have a wide range of applications including call centre automation, control of devices in the home, interactive entertainment, and hands-free applications. Despite their increasing use, however, deployment costs remain high and operational systems continue to be fragile. A major contributor to both of these problems is that the core dialogue manager, which interprets the spoken input and plans the next response, is a deterministic program, hand-crafted and manually tuned for each application.

Experience applying statistical techniques in both speech recognition and synthesis has shown that learning from data and using optimal decision making can dramatically improve performance and lower costs. A natural framework for statistical dialogue modelling is the Markov Decision Process (MDP). However, a major limitation of MDPs is that they require the state of the system to be known exactly, and therefore they do not address the essence of the dialogue management problem, which is to handle the uncertainty caused by speech recognition and understanding errors.

The aim of this project is to develop a framework for spoken dialogue systems which uses a more general statistical model called a Partially Observable Markov Decision Process (POMDP). The key assumption in the POMDP is that the state of the system (which includes the goal in the user's mind) can never be known with certainty. Hence, it maintains a probability distribution over all possible states and bases its decisions on this distribution. In effect, the POMDP tracks every possible dialogue hypothesis at every turn, maintaining a probability for each. This provides it with a principled framework for handling ambiguity and uncertainty.

Although this formulation is extremely powerful, it is also computationally very complex since the POMDP state is a vector in a very high-dimensional continuous space. This makes direct belief monitoring and policy optimisation essentially intractable and hence little progress has been made towards real applications. Recently, however, the proposer has demonstrated that practical POMDP-based systems are feasible by exploiting two key ideas. Firstly, the complexity of belief monitoring can be greatly reduced by partitioning the state space into equivalence classes. Secondly, in the context of spoken dialogues, it is possible to map dialogue hypotheses into a much-reduced summary space where effective policy optimisation is possible. These ideas have been built into a prototype system called the Hidden Information State (HIS) system and their feasibility has been demonstrated and evaluated in a Tourist Information domain.

Although it serves its purpose as a proof of concept, the HIS prototype was built using a simple 1-best recogniser interface, very simplistic probabilistic models, a hand-crafted user simulator and a rudimentary grid-based policy learning method. To fully realise the potential of POMDP-based systems, much more needs to be done and the programme of work set out in this proposal seeks to achieve this. The key areas that will be addressed are more efficient belief state partitioning and monitoring, accurate statistical user models trained on real data, integration of N-best recognition hypotheses, and improved summary state mapping and policy optimisation. The result will be a system which is trained automatically on data, which delivers high performance at low cost, which is significantly more robust to recognition errors, and which can learn and adapt on-line.
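
The belief monitoring described in the abstract can be illustrated with the standard textbook belief update for a small discrete POMDP. This is a minimal sketch only and is not the HIS system's implementation: the function name `belief_update`, the two-goal state space and all probabilities below are hypothetical values chosen for illustration, and the system action is assumed to be fixed for the turn.

```python
from typing import Dict

State = str


def belief_update(
    belief: Dict[State, float],
    transition: Dict[State, Dict[State, float]],
    obs_likelihood: Dict[State, float],
) -> Dict[State, float]:
    """Textbook discrete belief update for one dialogue turn.

    belief[s]          : current probability of state s
    transition[s][s2]  : P(s2 | s, a) for the (fixed) system action a
    obs_likelihood[s2] : P(o | s2, a) for the observation o just received
    """
    # Prediction step: push the current belief through the transition model.
    predicted = {
        s2: sum(transition[s][s2] * p for s, p in belief.items())
        for s2 in obs_likelihood
    }
    # Correction step: weight each state by how well it explains the observation.
    unnormalised = {s2: obs_likelihood[s2] * predicted[s2] for s2 in predicted}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s2: v / total for s2, v in unnormalised.items()}


# Hypothetical example: the hidden state is the user's goal.
goals = ["hotel", "restaurant"]
uniform = {g: 0.5 for g in goals}
# Assumption: the user's goal does not change between turns.
stay = {g: {g2: 1.0 if g == g2 else 0.0 for g2 in goals} for g in goals}
# Assumption: the noisy recogniser output is twice as likely under "hotel".
likelihood = {"hotel": 0.6, "restaurant": 0.3}

after_one_turn = belief_update(uniform, stay, likelihood)
after_two_turns = belief_update(after_one_turn, stay, likelihood)
```

Neither goal is ever discarded: each turn only shifts probability mass between hypotheses, which is why a single misrecognition need not derail the dialogue. The cost of this loop grows with the number of states, which is the tractability problem that the partitioning and summary-space ideas in the abstract are meant to address.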

Funded Value:

£360,667

Funded Period:

Oct 07 - Sep 10

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/F013930/1

Principal Investigator:

Stephen Young

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Human Communication in ICT (100%)

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
Stephen Young (Principal Investigator)

Publications

Gašić M (2009) Back-off action selection in summary space-based POMDP dialogue systems

Gašić M (2011) Effective handling of dialogue state in the hidden information state POMDP-based dialogue manager in ACM Transactions on Speech and Language Processing

Gašić M (2008) Training and evaluation of the HIS POMDP dialogue system in noise in Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, SIGDIAL 2008

Jurčíček F (2011) Natural actor and belief critic: Reinforcement algorithm for learning parameters of dialogue systems modelled as POMDPs in ACM Transactions on Speech and Language Processing

Mairesse F (2010) Phrase-based statistical language generation using graphical models and active learning in Proceedings of the Annual Meeting of the Association for Computational Linguistics

Schatzmann J (2009) The Hidden Agenda User Simulation Model in IEEE Transactions on Audio, Speech, and Language Processing

Keizer S (2010) Parameter Estimation for agenda-based user simulation

Thomson B (2010) Bayesian dialogue system for the Let's Go Spoken Dialogue Challenge

Thomson B (2010) Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems in Computer Speech & Language

Young S (2010) The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management in Computer Speech & Language

Key Findings

Description	A new approach to designing speech-based human-computer interfaces which are cheaper to build, more robust in operation and which continue to improve during deployment via continuous adaptation.
Exploitation Route	They are already being exploited via VocalIQ Limited and other R&D companies in this area.
Sectors	Creative Economy,Digital/Communication/Information Technologies (including Software),Financial Services, and Management Consultancy,Healthcare,Transport
URL	http://mi.eng.cam.ac.uk/~sjy

Impact Summary

Description	The findings from this study provided the basis for two EU Framework 7 projects (CLASSIC and PARLANCE), industrial support from General Motors and Toshiba and subsequently in 2011 the formation of a spin-off company VocalIQ Ltd to develop the technology (see www.vocaliq.com).
First Year Of Impact	2008
Sector	Digital/Communication/Information Technologies (including Software),Transport
Impact Types	Economic

Spin Outs

Company Name	VocalIQ Limited
Description	VocalIQ is a spin-out company from the Spoken Dialogue Systems Group at the University of Cambridge, UK. Still based in Cambridge, the company builds a platform for voice interfaces, making it easy for everybody to voice-enable their devices and apps. Example application areas include smartphones, robots, cars, call centres, and games. They are funded by Amadeus Capital and Cambridge Enterprise.
Year Established	2011
Impact	Contracts with General Motors and Jaguar Land Rover to develop state-of-the-art voice-driven systems.
Website	http://www.vocaliq.com
