Babble: domain-general methods for learning natural spoken dialogue systems

Lead Research Organisation: Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences

Abstract

The market for conversational speech technologies is estimated to reach $3 billion by 2020 (Grand View Research, 2014). Our proposed technology will provide vital foundations and impetus for the rapid development of a next generation of naturally interactive conversational interfaces with deep language understanding, in areas as diverse as healthcare, human-robot interaction, wearables, home automation, education, games, and assistive technologies.

Future conversational speech interfaces should allow users to interact with machines using everyday spontaneous language to meet everyday needs. A commercial example with quite basic capabilities is Apple's Siri. However, even today's limited speech interfaces are very difficult and time-consuming to develop for new applications: their key components currently need to be tailor-made by experts for specific application domains, relying either on hand-written rules or on statistical methods that depend on large amounts of expensive, domain-specific, human-annotated dialogue data. The components thus produced are of little or no use for any new application domain, resulting in expensive and time-consuming development cycles.

One key underlying reason for this status quo is that for spoken dialogue, general, scalable methods for natural language understanding (NLU), dialogue management (DM), and language generation (NLG) are not yet available. Current domain-general methods for language processing are sentence-based and so perform fairly well for processing written text, but they quickly run into difficulties in the case of spoken dialogue, because ordinary conversation is highly fragmentary and incremental: it naturally happens word-by-word, rather than sentence-by-sentence. Real conversation happens bit by bit, using half-starts, suggested add-ons, pauses, interruptions, and corrections -- without respecting the boundaries of sentences. And it is precisely these properties that contribute to the feeling of being engaged in a normal, natural conversation, which current state-of-the-art speech interfaces fail to produce.
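To make the contrast concrete, here is a toy sketch (illustrative only, not project code) of the difference between sentence-based processing and word-by-word incremental processing of a fragmentary, self-corrected utterance:

```python
# Illustrative sketch only: sentence-based vs word-by-word (incremental)
# processing of user speech.

def sentence_based(words):
    # A sentence-based system can only act once the utterance is complete.
    utterance = " ".join(words)
    return [f"interpretation of: {utterance!r}"]

def incremental(words):
    # An incremental system updates its interpretation after every word, so it
    # can react to half-starts, pauses, and corrections mid-utterance.
    partial, interpretations = [], []
    for word in words:
        partial.append(word)
        interpretations.append(f"partial interpretation of: {' '.join(partial)!r}")
    return interpretations

# A fragmentary, self-corrected utterance, as in real conversation.
words = "I'd like err a coffee no wait a tea".split()
print(sentence_based(words))     # one interpretation, only at the very end
for step in incremental(words):  # an interpretation after every word
    print(step)
```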

We propose to solve these two problems together by, for the first time:

(1) combining domain-general, incremental, and scalable approaches to NLU, DM, and NLG;

(2) developing machine learning algorithms to automatically create working speech interfaces from data, using (1).

We propose a new method, "BABBLE", in which speech systems can be trained to interact naturally with humans, much like a child who experiments with new combinations of words to discover their usefulness (though doing this offline, to avoid annoying real users in the process!).
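As a toy illustration (entirely our own sketch: the five-word vocabulary, the fixed "goal" utterance, and the reward below are invented placeholders, and the real BABBLE method works over an incremental grammar rather than a flat word list), a reinforcement learner can "babble" word combinations offline and keep those that earn reward:

```python
# Hypothetical sketch of offline "babbling": a Q-learner explores word-by-word
# actions and discovers which combinations lead to a successful dialogue.
import random
from collections import defaultdict

VOCAB = ["hello", "coffee", "tea", "please", "bye"]
GOAL = ("hello", "tea", "please")   # toy stand-in for a "successful" dialogue

def reward(utterance):
    # +1 only if the babbled word sequence succeeds; no annotation is needed.
    return 1.0 if tuple(utterance) == GOAL else 0.0

def babble(episodes=20000, alpha=0.1, epsilon=0.2):
    q = defaultdict(float)          # Q[(state, word)], state = words so far
    for _ in range(episodes):
        state, utterance = (), []
        for _ in range(len(GOAL)):
            if random.random() < epsilon:   # explore a new word combination
                word = random.choice(VOCAB)
            else:                           # exploit the best word found so far
                word = max(VOCAB, key=lambda w: q[(state, w)])
            utterance.append(word)
            next_state = tuple(utterance)
            target = reward(utterance) + max(q[(next_state, w)] for w in VOCAB)
            q[(state, word)] += alpha * (target - q[(state, word)])
            state = next_state
    return q

q = babble()
state, best = (), []
for _ in range(len(GOAL)):          # read off the greedy (learned) utterance
    word = max(VOCAB, key=lambda w: q[(state, w)])
    best.append(word)
    state = tuple(best)
print("learned utterance:", " ".join(best))   # should print "hello tea please"
```

Crucially, the only supervision here is the reward signal: no annotated dialogue data is required.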

BABBLE will be deployed as a developer kit and as mobile speech Apps for public use and engagement, and will also generate large dialogue data sets for scientific and industry use.

This new method will not require expensive data annotation or expert developers, enabling the easy creation of new speech interfaces that advance the state of the art by interacting more naturally, and therefore more successfully and engagingly, with users.

New advances have been made in key areas relevant to this proposal: incremental grammars, formal semantic models of dialogue, and sample-efficient machine learning methods. The opportunity to combine and develop these approaches has arisen only recently, and now makes major advances in spoken dialogue technology possible.

Planned Impact

There are three main arenas of impact for this work: commercial, societal, and academic.
The major societal impact of this work is in widening access to information technology, and making it more efficient and engaging for users through the use of spoken conversation.
The core technology for natural conversational speech interfaces developed here has the potential to impact on several important groups of users:

* Internet users, mobile App users, and wearable technology users via improved conversational interfaces for control and search;
* Visually impaired individuals via better speech interfaces;
* The elderly and disabled, via automated assisted independent living using dialogue-based interaction;
* Users of immersive virtual reality interfaces (using speech for control and search);
* Computer Game players, e.g. via improved dialogue-based interaction with virtual characters;
* Users of educational technologies, using conversational virtual characters for learning;
* People interacting with robots (spoken Human-Robot Interaction).

In all of the above areas, the work also has an important economic impact: lowering costs for industry through the automation of speech interface development -- a central objective of this proposal. Companies will benefit through the lowering or removal of costs associated with hiring expert system developers and data annotators. These advances will make speech interfaces a more affordable technology, deployable more rapidly in wider contexts.
Academic and industrial researchers will also benefit via the provision of new, open resources for conversational system development.

To realise the full impact of the research, we will therefore focus on the following pathways:

* BABBLE mobile Apps: public engagement with the developed systems released as speech Apps on smartphones, mobile devices, and wearables (like Siri and Google Now);

* Open data releases: the large anonymised spoken dialogue datasets generated by the project will be of interest to academic researchers and industry;

* BABBLE Toolkit: software and tools developed will be released for academic and industrial use;

* Interdisciplinary publications and research visits: combining research from two previously disconnected communities: wide-coverage grammars of Natural Language and statistical approaches to automated dialogue systems;

* Demonstrations: showcasing the BABBLE technology at events such as the Scottish Informatics and Computer Science Alliance (SICSA) demofests and EC ICT events, which bring together academia with industrial developers;

* Robotarium demos: showcasing BABBLE systems for human-robot interaction applications at the Robotarium (EPSRC Infrastructure Grant, 2013);

* Impact workshop: organised at the end of the project to amplify these avenues to impact, inviting researchers from both academia and industry;

* Spin-out company: the BABBLE technology will be integrated into a spin-out speech technology company currently being developed at Heriot-Watt by the PI under an "Impact Acceleration" grant.

To reach the general public as well as interested academic and industry researchers, we will also use a range of social media such as Twitter and YouTube, as we have successfully done for a number of past projects, as well as traditional websites. We will also further develop our contacts with the press and media, which have led to a number of newspaper, radio, and television outputs describing the PI's research. Our advisory board member Dr. Matthew Purver, founder and Chief Data Scientist of Chatterbox, will advise us on technology transfer and commercialisation of dialogue and language technology (see letters of support).
 
Description We have developed a method which allows the automatic construction of natural spoken dialogue systems (or conversational agents), such as Alexa and Siri, from very small amounts of unannotated data. This reduces the need for expert developers, and so reduces cost.
This is an advance because previous methods required large amounts of data to be collected and annotated, which is both expensive and time-consuming.
Moreover, our method supports more natural human conversation than many previous systems, because it processes language word-by-word, rather than needing to wait for the end of an utterance, as current systems like Alexa do.
We have applied this method in a number of dialogue systems. We also entered the Amazon Alexa Challenge in 2017, to build a socially intelligent conversational agent, and were one of only three teams who made it to the final. We competed again in 2018, and also made it to the final. We have also developed and tested a method for creating conversational agents that can learn novel word meanings from a human tutor (best paper award, ACL Robo-NLP 2017). We also released two new datasets for conversational agent development. The BABBLE software is available as open-source. We have spun out a company, Alana AI, from the university in 2020; it focusses on conversational AI.
Exploitation Route This technology can be used by developers of future speech and dialogue interfaces -- to more rapidly and cheaply develop natural spoken dialogue interfaces for conversational devices and services.
We have also released new dialogue data and software for use by others.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Financial Services, and Management Consultancy; Healthcare; Culture, Heritage, Museums and Collections; Retail

URL https://sites.google.com/site/hwinteractionlab/babble
 
Description This project has potential economic impact in developing a prototype method for creating dialogue systems from small amounts of data, without the need for expert developers, thus substantially reducing the cost of developing such systems. The potential impact is also in terms of more natural and robust dialogue systems and Human-Robot Interaction (HRI). We were involved in several knowledge transfer collaborations with industry, including new funded projects with Voysis.com (£103K), Speech-Graphics/DataLab (£110K), and Amazon ($350K, for the 2017 and 2018 Amazon Alexa Challenges). We are also involved in the 2022 Amazon Alexa SimBot Challenge. These collaborations use our technical expertise and experience in dialogue system development gained during the project. We were also featured in the BBC documentaries "6 Robots and Us" in 2017 and "The Joy of AI" in 2018, and appeared on "Tomorrow's World Live" in 2018. We have also created a new MSc programme in AI with Speech and Multimodal Interaction, where methods and techniques developed as part of this project are taught to students in new courses on Conversational Agents. 30 students took this course in 2018, 35 in 2019, and about 70 in both 2020 and 2021. We have spun out a company, Alana AI (www.alanaai.com), from the university in 2020; it focusses on conversational AI.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software), Education
Impact Types Cultural, Societal, Economic

 
Description Amazon Alexa Challenge 2017
Amount $100,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 11/2016 
End 11/2017
 
Description Amazon Alexa Challenge 2018
Amount $250,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 01/2018 
End 11/2018
 
Description Commercial Contract - Voysis
Amount £103,390 (GBP)
Organisation Voysis Ltd 
Sector Private
Country Ireland
Start 10/2017 
End 09/2018
 
Description DataLab
Amount £110,033 (GBP)
Organisation Government of Scotland 
Department Scottish Funding Council
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2019
 
Title BURCHAK human-human dialogue dataset 
Description A new freely available human-human dialogue data set for interactive learning of word meanings from a human tutor. The data has been collected using the DiET Chat Tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Other researchers in the field have already started using this dataset, which can be used to train conversational agents that can learn new word meanings from a human tutor. No dataset for this task existed prior to this. 
URL https://arxiv.org/abs/1709.10431
 
Title NIPS 2015 deep RL data 
Description Data for results reported in the paper: Strategic Dialogue Management via Deep Reinforcement Learning, NIPS 2015. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The first application of Deep Reinforcement Learning methods to dialogue management problems. 
 
Title bAbI+ dataset 
Description bAbI+ ("Dialog bAbI+") is an extension of the bAbI Task 1 dialogues with everyday incremental dialogue phenomena (hesitations, restarts, and corrections), which model the disfluencies and communication problems of everyday spoken interaction in real-world environments; a simplified illustration of such disfluency injection follows this record. See https://www.researchgate.net/publication/319128941_Challenging_Neural_Dialogue_Models_with_Natural_Data_Memory_Networks_Fail_on_Incremental_Phenomena, http://aclweb.org/anthology/D17-1235 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Released as part of ParlAI from Facebook Research; see: http://www.parl.ai/static/docs/tasks.html 
URL http://www.parl.ai/static/docs/tasks.html
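As a simplified illustration of the kind of incremental phenomena bAbI+ injects (our own toy code, not the released generator; the real transformations are described in the papers above):

```python
# Toy disfluency injection in the spirit of bAbI+: hesitations, restarts,
# and corrections added to a clean, goal-oriented utterance.
import random

HESITATIONS = ["uhm", "uh"]

def add_hesitation(words):
    i = random.randrange(len(words))
    return words[:i] + [random.choice(HESITATIONS)] + words[i:]

def add_restart(words):
    i = random.randrange(1, len(words) + 1)
    return words[:i] + ["sorry,"] + words    # abandon and restart the utterance

def add_correction(words, old, new):
    i = words.index(old)
    return words[:i + 1] + ["no", "sorry,", new] + words[i + 1:]

clean = "I'd like an italian restaurant".split()
print(" ".join(add_hesitation(clean)))
print(" ".join(add_restart(clean)))
print(" ".join(add_correction(clean, "italian", "french")))
# e.g. "I'd like an italian no sorry, french restaurant"
```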
 
Title BABBLE software 
Description This is an implementation, continuously under development, of:
(a) Dynamic Syntax and Type Theory with Records (DS-TTR) (Cann et al., 2015; Eshghi et al., 2012), a word-by-word incremental semantic grammar especially suited to dialogue processing, with semantic and contextual representations produced in the TTR formalism. The implementation contains: (1) depth-first and breadth-first parsers and generators, based on hand-crafted domain-specific lexicons that cover a broad range of structures, including relative clauses and tense; (2) a prototype incremental dialogue system, DyLan, based on Jindigo (Skantze & Hjalmarsson, 2010) but using the Dynamic Syntax (Kempson et al., 2001; Cann et al., 2005) parser/generator; (3) a grammar induction module which learns DS incremental grammars from data (Eshghi et al., 2013). This has been updated with improvements made under the EPSRC BABBLE project, specifically to incorporate an interactive parser based on Eshghi et al.'s (2015) model of feedback in dialogue.
(b) A dialogue system for interactive learning of visually grounded language from a human partner, which uses DS-TTR for dialogue processing and grounding (Yu et al., 2016).
(c) An integration of DS-TTR with Reinforcement Learning, allowing incremental dialogue systems to be automatically induced from raw, unannotated dialogue examples (Eshghi & Lemon, 2014; Kalatzis et al., 2016); a sketch of this induction pipeline follows this record.
(a) has been updated and improved continuously throughout the BABBLE project; (b) and (c) were produced exclusively within the BABBLE project. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Demonstration at the SemDial 2015 conference. Demo at the INLG 2016 conference. Demo at the SemDial 2016 conference. Contribution to the best paper award at ACL Robo-NLP 2017. Demonstration system VOILA at SIGDIAL 2017. 
URL https://bitbucket.org/dylandialoguesystem
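As a hypothetical sketch of the induction pipeline in (c) above (all function names and data structures here are our invention; see the repository and Eshghi & Lemon (2014) / Kalatzis et al. (2016) for the real system):

```python
# Hypothetical pipeline sketch: inducing a dialogue system from raw,
# unannotated dialogue transcripts, as component (c) describes.

def induce_grammar(transcripts):
    # Cf. (a)(3): learn an incremental (DS) grammar from raw example dialogues;
    # here we merely collect a toy lexicon.
    return {"lexicon": sorted({w for t in transcripts for w in t.split()})}

def goal_semantics(grammar, transcript):
    # Parse a successful example dialogue word-by-word; its final semantic
    # state serves as the goal (reward) for reinforcement learning.
    return tuple(w for w in transcript.split() if w in grammar["lexicon"])

def train_dialogue_policy(grammar, goal):
    # Cf. (c): reinforcement learning over the grammar's word-level actions,
    # rewarded for reaching the goal semantics (e.g. a Q-learner as sketched
    # under the project abstract above).
    return {"goal": goal, "policy": "trained"}

transcripts = ["hello i want a coffee", "hello a tea please"]
grammar = induce_grammar(transcripts)
policy = train_dialogue_policy(grammar, goal_semantics(grammar, transcripts[0]))
print(policy)
```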
 
Title Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems 
Description Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon), SemDial 2018.
Type Of Technology Software 
Year Produced 2018 
Impact Spontaneous spoken dialogue is often disfluent, containing pauses, hesitations, self-corrections and false starts. Processing such phenomena is essential in understanding a speaker's intended meaning and controlling the flow of the conversation. Furthermore, this processing needs to be word-by-word incremental, to allow further downstream processing to begin as early as possible in order to handle real spontaneous human conversational behaviour. In addition, from a developer's point of view, it is highly desirable to be able to develop systems which can be trained from 'clean' examples while also being able to generalise to the very diverse disfluent variations on the same data -- thereby enhancing both data-efficiency and robustness. In this paper, we present a multitask LSTM-based model for incremental detection of disfluency structure, which can be hooked up to any component for incremental interpretation (e.g. an incremental semantic parser), or else simply used to 'clean up' the current utterance as it is being produced. We train the system on the Switchboard Dialogue Acts (SWDA) corpus and present its accuracy on this dataset. Our model outperforms prior neural-network-based incremental approaches by about 10 percentage points on SWDA while employing a simpler architecture. To test the model's generalisation potential, we evaluate the same model on the bAbI+ dataset, without any additional training. bAbI+ is a dataset of synthesised goal-oriented dialogues where we control the distribution of disfluencies and their types. This shows that our approach has good generalisation potential, and sheds more light on which types of disfluency might be amenable to domain-general processing. (A minimal illustrative sketch of such a multitask incremental tagger follows this record.) 
URL https://github.com/ishalyminov/multitask_disfluency_detection
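A minimal sketch of such a multitask incremental tagger (our own illustration in PyTorch; the auxiliary task, tag set, and dimensions below are toy placeholders, not the released architecture):

```python
# Toy multitask LSTM tagger: one shared LSTM consumed word-by-word, with a
# disfluency-tagging head plus an auxiliary head trained jointly.
import torch
import torch.nn as nn

class MultitaskDisfluencyTagger(nn.Module):
    def __init__(self, vocab_size, n_tags, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.disfluency_head = nn.Linear(hidden_dim, n_tags)  # main task
        self.auxiliary_head = nn.Linear(hidden_dim, 2)        # toy auxiliary task

    def forward(self, word_id, state=None):
        # One word at a time, carrying the LSTM state across calls, so tags
        # are available incrementally rather than only at utterance end.
        emb = self.embed(word_id).view(1, 1, -1)
        out, state = self.lstm(emb, state)
        return self.disfluency_head(out), self.auxiliary_head(out), state

model = MultitaskDisfluencyTagger(vocab_size=100, n_tags=3)
state = None
for word_id in [5, 17, 17, 42]:   # toy word ids, e.g. "I like like tea"
    tag_logits, aux_logits, state = model(torch.tensor([word_id]), state)
    print(tag_logits.argmax().item())   # predicted tag after each word
```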
 
Title SimpleDS -- Deep Reinforcement Learning for dialogue management 
Description SimpleDS is a simple dialogue system trained with deep reinforcement learning. In contrast to other dialogue systems, this system selects dialogue actions directly from raw (noisy) text of the last system and user responses. The motivation is to train dialogue agents with as little human intervention as possible. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact This software is now being used in several projects in our lab. It led to our NIPS 2015 paper and was demonstrated at the IWSDS 2016 conference. 
URL https://github.com/cuayahuitl/SimpleDS
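To illustrate the core SimpleDS idea (our toy sketch, not the repository code; the vocabulary, action set, and network sizes are invented):

```python
# Toy version of action selection from raw text: a small Q-network maps a
# bag-of-words over the last system and user turns to dialogue actions.
import torch
import torch.nn as nn

VOCAB = ["hello", "food", "italian", "cheap", "bye", "thanks"]
ACTIONS = ["greet", "ask_food_type", "ask_price", "confirm", "goodbye"]

def bag_of_words(last_system_turn, last_user_turn):
    words = (last_system_turn + " " + last_user_turn).lower().split()
    return torch.tensor([[1.0 if v in words else 0.0 for v in VOCAB]])

q_network = nn.Sequential(           # state (bag of words) -> Q-value per action
    nn.Linear(len(VOCAB), 32),
    nn.ReLU(),
    nn.Linear(32, len(ACTIONS)),
)

state = bag_of_words("hello how can I help", "I want cheap italian food")
action = ACTIONS[q_network(state).argmax().item()]
print("selected action:", action)    # untrained, so the choice is arbitrary here
```

In the real system this network would be trained with deep reinforcement learning from dialogue reward, as the description above states.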
 
Company Name ALANA AI LIMITED 
Description See www.alanaai.com. Alana AI is a spin-out company of Heriot-Watt University focussing on Conversational AI, NLP, and machine learning. It is based on the expertise of the Interaction Lab at Heriot-Watt, and in particular the experience of the team who developed a successful Alexa Prize system. The expertise of the company's scientists (Lemon, Rieser, Eshghi, Konstas) has also been partly developed through their experience in a variety of EPSRC projects as well as EC projects. 
Year Established 2019 
Impact The Alana conversational AI platform. Currently in the final stages of negotiating a project with UNICEF on tackling COVID-19 misinformation. Formal partnership with the RNIB to develop conversational interfaces for blind and partially sighted people.
Website http://www.alanaai.com
 
Description 2nd Dynamic Syntax Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This was an academic conference, website here: https://sites.google.com/site/seconddsconf/
Year(s) Of Engagement Activity 2018
URL https://sites.google.com/site/seconddsconf/
 
Description National TV - demonstration and discussion on "The Joy of AI" documentary, BBC4 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Demonstrated our conversational AI system on the BBC4 documentary "The Joy of AI", hosted by Jim Al-Khalili -- international reach, with many follow-up requests for collaboration.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/p06jt7j4
 
Description National TV: demonstration on BBC's Tomorrow's World Live 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Prof Lemon gave a live demonstration of a conversational AI system and discussed the science behind it on Tomorrow's World Live -- a TV broadcast with international reach and a large audience.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/p06vvw9h
 
Description Poster at Alan Turing Institute Deep Learning workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Edinburgh University hosted the "Alan Turing Institute Deep Learning workshop" in 2015. We presented our work to practitioners and postgraduate students.
Year(s) Of Engagement Activity 2015
URL http://workshops.inf.ed.ac.uk/deep/deepATI/
 
Description Research Visit to Dialogue Systems Group, Bielefeld 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Eshghi visited the DSG research group at Bielefeld University. This included a presentation of BABBLE project ideas & results to the group, as well as one-on-one discussion sessions with individual members of the group.
Year(s) Of Engagement Activity 2016
 
Description Research visit to the Centre for Linguistic Theory and Studies in Probability, Gothenburg 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Eshghi visited the CLASP centre, a research group at Gothenburg University. This included a presentation of BABBLE project ideas & results to the group, as well as one-on-one discussion sessions with its individual members.
Year(s) Of Engagement Activity 2015
 
Description SICSA workshop on Conversational AI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We organised a one-day workshop on Conversational AI on 12/12/2018 -- see https://sites.google.com/site/workshoponconversationalai/home
There were about 60 attendees from all over the UK, including industry and media (e.g. the BBC and Voysis).
Year(s) Of Engagement Activity 2018
URL https://sites.google.com/site/workshoponconversationalai/home
 
Description TED-x technology demonstration session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Heriot-Watt hosted the TEDx event described here: http://www.tedxhwu.com/

I ran a demo session for participants -- a mix of business, media, professionals, and students -- showcasing some of our dialogue and robot technology.
Year(s) Of Engagement Activity 2015
URL http://www.tedxhwu.com/
 
Description What does a robot think about when no one is around? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact A public outreach activity at the Science Festival in Edinburgh in March 2018.
Year(s) Of Engagement Activity 2018