CODA: COherent Dialogue Automatically generated from text

Lead Research Organisation: The Open University
Department Name: Computing

Abstract

The CODA (COherent Dialogue Automatically generated from text) project will contribute to realizing the UK Government's Council for Science and Technology's vision of 'providing people with services and information when, where and how they need it [...] Interaction will be through next generation personal digital assistants [...] and doubtlessly a variety of other human-oriented methods as yet unforeseen'. CODA will help achieve this by developing the theory and technology for automatically creating dialogue content from text in monologue form. There is ample empirical evidence that presenting information as a dialogue can be more effective than monologue in certain settings (e.g., tutoring and persuasive communication). However, since most information is locked up in text (books, leaflets, webpages, etc.), text-to-dialogue generation technology can play an important role in making information available in a form that best meets people's needs for easily processable and engaging information. The effectiveness of dialogue is magnified by the fact that it is eminently suitable for new multimedia presentation styles: for example, a dialogue can be performed by digital computer-animated characters. Thus, presenting information in dialogue form promises not only to deliver information effectively, but also to entertain and engage people, as evidenced by the widespread use of dialogue in conventional media such as news bulletins, commercials, educational entertainment and games. The proposed research builds on a preliminary feasibility study that was undertaken in collaboration with Dr. Prendinger at the National Institute of Informatics (Tokyo). That research led to a first prototype that takes a patient information leaflet with text such as:

  You can use aspirin, if you have a headache. Though aspirin does have side effects: it can harm the circulation.

and automatically generates a dialogue between a virtual pharmacist (P) and a client (C):

  C: What if I have a headache?
  P: You can use aspirin.
  C: But does it have side effects?
  P: Yes, it can harm the circulation.

Dr. Prendinger is proposed as Visiting Researcher for the current project. The project will develop this first prototype into a domain-independent system for generating dialogue from text such that the meaning of the input text is preserved and the resulting dialogue is both coherent and cohesive. It will also produce what is, to our knowledge, the first extensive collection of text spans paired with snippets of dialogue that are equivalent in meaning (a parallel text-dialogue corpus). This corpus will be used in the project to learn transformations from text to dialogue that the system will then implement. During the second half of the project, a thorough evaluation of the system will take place to determine the quality of the content and organization of the generated dialogues. The system will be applied to input texts from a variety of domains to test its robustness and domain independence. We anticipate that, if successful, this project will lead to potentially commercially exploitable middleware for bridging the gap between content locked up in text and effective, engaging presentation of information through state-of-the-art multimedia presentation tools, with applications in education (presentation of textbook materials), e-health (presenting medical information in an engaging way), and serious/educational games (automatic generation of dialogue content for non-player characters).
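The first exchange of the aspirin dialogue can be reproduced by a single rule for the Condition relation. The following Python sketch is illustrative only and is not the prototype's code. It assumes the leaflet sentence has already been split into a nucleus ("You can use aspirin") and a satellite ("you have a headache"). The function names and the small pronoun table are hypothetical.

```python
# Hypothetical pronoun table: the client restates the leaflet's condition
# from their own point of view, so second-person words become first-person.
PRONOUN_FLIP = {"you": "I", "your": "my", "yours": "mine"}


def flip_perspective(clause: str) -> str:
    """Swap second-person pronouns for first-person ones, word by word."""
    return " ".join(PRONOUN_FLIP.get(w.lower(), w) for w in clause.split())


def condition_to_dialogue(nucleus: str, satellite: str) -> list[str]:
    """Map a Condition relation (the nucleus holds if the satellite holds)
    onto a client question followed by a pharmacist answer."""
    nucleus = nucleus.rstrip(".")
    satellite = satellite.rstrip(".")
    return [
        f"C: What if {flip_perspective(satellite)}?",
        f"P: {nucleus}.",
    ]


for turn in condition_to_dialogue("You can use aspirin", "you have a headache"):
    print(turn)
```

A realistic system needs proper question formation and pronoun handling rather than the word-level swap used here. The sketch only shows the shape of a mapping from one coherence relation to a question-answer pair.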

Publications

Kuyten P (2012) Intelligent Virtual Agents

Piwek P (2017) Dialogue across Media

Piwek P (2011) Data-oriented Monologue-to-Dialogue Generation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Short Papers

Piwek P (2010) Generating Expository Dialogue from Monologue: Motivation, Corpus and Preliminary Rules. In 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2010

Stoyanchev S (2010) Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues. In 7th International Conference on Language Resources and Evaluation (LREC) 2010

Stoyanchev S (2011) The CODA System for Monologue-to-Dialogue Generation. In Proceedings of SIGDIAL 2011: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Stoyanchev S (2011) Intelligent Virtual Agents

Stoyanchev S (2010) Harvesting re-usable high-level rules for expository dialogue generation. In 6th International Natural Language Generation Conference (INLG 2010)

 
Description The main aim of the CODA project was to develop the theory and technology for automatically transforming text into coherent and cohesive expository (i.e., information-delivering) dialogue, such that the transformation from text to dialogue preserves the informational content of the input text. This aim involved four equally important objectives: 1) creation of what is, to our knowledge, the first parallel corpus of text-dialogue pairs; 2) formulation of transformation rules, based on the corpus, that relate certain patterns in monologue text (in particular, underlying coherence relations) to patterns in dialogue (specific dialogue acts, moves or structures); 3) implementation of the transformations in a system for text-to-dialogue generation; 4) evaluation of the content and organization of the dialogues that the system generates.
Exploitation Route The Papworth Trust commissioned two videos based on its information leaflets for service users. The content of the videos was prepared using the CODA Monologue-to-Dialogue technology. The videos are available on the Papworth Trust's homepage and its YouTube channel: http://www.youtube.com/user/papworthtrust#p/u/0/Tb-MkbwAneY and http://www.youtube.com/user/papworthtrust#p/u/1/A2jLwvJ_kE8. Bored with reading the instructions? Watch and listen instead!



Information leaflets may soon be more than just a piece of paper. Research at The Open University has taken significant steps towards changing the way information is conveyed by translating text into dialogue while preserving clarity and meaning. The EPSRC-funded project Coherent Dialogue Automatically Generated from Text (CODA) has developed the theory and technology for semi-automatic conversion of text to dialogue. It is specifically geared towards creating conversations between lay people and experts, for example, a patient and a doctor.



Despite evidence that dialogue can be more effective than monologue in settings such as tutoring and persuasive communication, most information is still locked up in text, including books, leaflets and web pages.



Text-to-dialogue generation technology can play an important role in making information available in a form that best meets people's needs when processing information. Dialogue is also well suited to new multimedia presentation styles; for example, it can be performed by digital computer-animated characters.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://computing.open.ac.uk/coda/
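Objectives 2 and 3 in the project description above amount to a table that maps coherence relations onto dialogue-act sequences, which the generation system then applies. The Python sketch below shows one minimal way to represent and apply such a table. The relation names, dialogue-act labels, templates and the fallback behaviour are all hypothetical and are not taken from the CODA rule set.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "Layman" or "Expert"
    act: str      # dialogue act label (hypothetical inventory)
    text: str


# Hypothetical rule table: coherence relation -> sequence of
# (speaker, dialogue act, template). Templates may use {nucleus}/{satellite}.
RULES = {
    "Condition": [
        ("Layman", "Wh-Question", "What if {satellite}?"),
        ("Expert", "Inform", "{nucleus}."),
    ],
    "Elaboration": [
        ("Expert", "Inform", "{nucleus}."),
        ("Layman", "Request-Elaboration", "Could you tell me more?"),
        ("Expert", "Inform", "{satellite}."),
    ],
}


def transform(relation: str, nucleus: str, satellite: str) -> list[Turn]:
    """Apply the rule for `relation`; if no rule matches, fall back to the
    expert stating both spans as plain monologue."""
    template = RULES.get(relation)
    if template is None:
        return [Turn("Expert", "Inform", f"{nucleus}. {satellite}.")]
    return [
        Turn(speaker, act, text.format(nucleus=nucleus, satellite=satellite))
        for speaker, act, text in template
    ]


for turn in transform("Elaboration", "Aspirin has side effects",
                      "It can harm the circulation"):
    print(f"{turn.speaker} [{turn.act}]: {turn.text}")
```

Keeping the rules as data rather than code means that rules harvested from a corpus can be added or revised without touching the generation logic.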
 
Description The text-to-dialogue generation technology was used by the Papworth Trust to create videos for two of its information leaflets. These videos were used on its website to inform service users. See http://computing.open.ac.uk/coda/publicity.html
First Year Of Impact 2011
Sector Communities and Social Services/Policy, Healthcare
Impact Types Societal

 
Description Studentship funded directly by the student's employer (WDS/Xerox)
Amount £12,000 (GBP)
Funding ID N/A 
Organisation Xerox Corporation 
Department Xerox Europe
Sector Private
Country United Kingdom
Start 06/2015 
End 06/2021
 
Title CODA Tools 
Description CODA Tools software, Release 1.1 (February 20, 2012). This release contains 1) software for converting text parsed with RST relations into dialogue and 2) an annotation tool for annotating dialogue and translating it into monologue (used for creating the CODA corpus).
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact Downloaded by researchers at several research establishments, including Takamura lab at Tokyo Institute of Technology, Hankuk University of Foreign Studies (South Korea), Cornell and Columbia Universities. 
URL http://computing.open.ac.uk/coda/data.html
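CODA Tools takes as input text that has already been parsed with RST relations. Before any transformation rule can apply, the discourse tree has to be walked to find the relation instances it contains. The Python sketch below assumes a hypothetical nested-tuple encoding: a leaf is an elementary discourse unit, and an internal node is `(relation, nucleus, satellite)` with the nucleus preceding the satellite in the text. It also uses an illustrative analysis of the aspirin leaflet. Neither the encoding nor the relation labels are the input format or annotation used by CODA Tools.

```python
from typing import Iterator, Union

# Hypothetical encoding: a leaf is an elementary discourse unit (a string);
# an internal node is (relation, nucleus, satellite).
Node = Union[str, tuple]


def span_text(node: Node) -> str:
    """Concatenate the leaf units under `node` in left-to-right order."""
    if isinstance(node, str):
        return node
    _, nucleus, satellite = node
    return f"{span_text(nucleus)} {span_text(satellite)}"


def relation_instances(node: Node) -> Iterator[tuple[str, str, str]]:
    """Yield (relation, nucleus text, satellite text) for every internal
    node, outermost relation first (pre-order traversal)."""
    if isinstance(node, str):
        return
    relation, nucleus, satellite = node
    yield relation, span_text(nucleus), span_text(satellite)
    yield from relation_instances(nucleus)
    yield from relation_instances(satellite)


# Illustrative analysis of the leaflet text (relation labels are assumptions).
tree = ("Concession",
        ("Condition", "You can use aspirin,", "if you have a headache."),
        ("Elaboration", "Though aspirin does have side effects:",
         "it can harm the circulation."))

for relation, nuc, sat in relation_instances(tree):
    print(f"{relation}: [{nuc}] <- [{sat}]")
```

Each yielded triple is the unit to which a relation-specific transformation rule could then be applied.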
 
Title CODA corpus 
Description CODA corpus, Release 1.0 (July 16, 2010). This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley), aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with coherence relations (RST), and the dialogue side with dialogue act tags.
Type Of Material Database/Collection of data 
Year Produced 2010 
Provided To Others? Yes  
Impact The monologue-to-dialogue corpus has been released to the research community (100 downloads, including Carnegie Mellon and Xerox). 
URL http://computing.open.ac.uk/coda/data.html
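Because the corpus pairs coherence relations on the monologue side with dialogue act tags on the dialogue side, candidate transformation rules can be read off from how often each relation aligns with each dialogue-act sequence. The Python sketch below uses simple frequency counting, which is a swapped-in, deliberately simple technique and not the method used in the CODA project. The alignment records are invented for illustration and are not drawn from the CODA corpus.

```python
from collections import Counter, defaultdict

# Invented toy alignments: (coherence relation on a monologue span,
# dialogue-act sequence of the aligned dialogue turns). Not real corpus data.
ALIGNED = [
    ("Condition", ("Wh-Question", "Inform")),
    ("Condition", ("Wh-Question", "Inform")),
    ("Condition", ("YN-Question", "Inform")),
    ("Elaboration", ("Inform", "Request-Elaboration", "Inform")),
]


def harvest_rules(pairs):
    """For each relation, keep the most frequent aligned dialogue-act
    sequence together with its relative frequency."""
    counts = defaultdict(Counter)
    for relation, acts in pairs:
        counts[relation][acts] += 1
    rules = {}
    for relation, counter in counts.items():
        acts, n = counter.most_common(1)[0]
        rules[relation] = (acts, n / sum(counter.values()))
    return rules


for relation, (acts, share) in harvest_rules(ALIGNED).items():
    print(f"{relation} -> {' + '.join(acts)} ({share:.0%})")
```

Keeping only the most frequent sequence discards alternatives. A generator that aims for varied dialogue might instead keep every sequence above a frequency threshold.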
 
Title QG Corpus 
Description QGSTEC 2010 Generating Questions from Sentences Corpus (December 21, 2010). A corpus of over 1000 questions (both human- and machine-generated). The automatically generated questions were rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety).
Type Of Material Database/Collection of data 
Year Produced 2010 
Provided To Others? Yes  
Impact The QGSTEC development, evaluation and results data were made available to the research community (54 downloads so far from groups including IBM, MITRE, Univ. of Tuebingen, Michigan, and Pittsburgh). 
URL http://computing.open.ac.uk/coda/data.html
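Ratings collected from several raters on five criteria are typically summarised per criterion. The Python sketch below averages all scores for each criterion across questions and raters. The record layout and the numeric scores are invented, the scale is hypothetical, and the sketch says nothing about whether higher or lower scores are better in the actual QGSTEC data, whose format may differ.

```python
from collections import defaultdict
from statistics import mean

CRITERIA = ("relevance", "question type", "syntactic correctness and fluency",
            "ambiguity", "variety")

# Invented example records on a hypothetical numeric scale:
# (question id, rater, criterion, score).
RATINGS = [
    ("q1", "r1", "relevance", 1), ("q1", "r2", "relevance", 2),
    ("q1", "r1", "ambiguity", 2), ("q1", "r2", "ambiguity", 2),
    ("q2", "r1", "relevance", 1), ("q2", "r2", "relevance", 1),
]


def mean_by_criterion(ratings):
    """Average all scores given for each criterion, across questions and raters."""
    by_criterion = defaultdict(list)
    for _question, _rater, criterion, score in ratings:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        by_criterion[criterion].append(score)
    return {c: mean(scores) for c, scores in by_criterion.items()}


print(mean_by_criterion(RATINGS))
```

Criteria with no ratings are simply absent from the result rather than reported as zero.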
 
Description Ongoing collaboration with NII Tokyo 
Organisation National Institute of Informatics (NII)
Country Japan 
Sector Public 
PI Contribution Publication of follow-up research at the IVA conference. Research visit by Piwek to NII, February 16-25, 2015, funded by NII.
Collaborator Contribution Research collaboration and software development.
Impact Conference paper at the IVA 2012 conference.
Start Year 2012
 
Description Papworth Trust information: from leaflets to dialogue
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Participants in your research and patient groups
Results and Impact The Papworth Trust commissioned two videos based on its information leaflets for service users. The content of the videos was prepared using the CODA Monologue-to-Dialogue technology. The videos are available on the Papworth Trust's YouTube channel: User Involvement Promise (see http://www.youtube.com/watch?v=Tb-MkbwAneY) and Feedback (see http://www.youtube.com/watch?v=A2jLwvJ_kE8), and on the Trust's Getting involved page.
Year(s) Of Engagement Activity 2011
URL http://www.youtube.com/watch?v=A2jLwvJ_kE8