CODA: COherent Dialogue Automatically generated from text
Lead Research Organisation:
The Open University
Department Name: Computing
Abstract
The CODA (COherent Dialogue Automatically generated from text) project will contribute to realizing the UK Government's Council for Science and Technology's vision of "providing people with services and information when, where and how they need it [...] Interaction will be through next generation personal digital assistants [...] and doubtlessly a variety of other human-oriented methods as yet unforeseen". CODA will help achieve this by developing the theory and technology for automatically creating dialogue content from text in monologue form. There is ample empirical evidence that presenting information in the form of a dialogue can be more effective than monologue in certain settings (e.g., tutoring and persuasive communication). Since most information is, however, locked up in text (books, leaflets, webpages, etc.), text-to-dialogue generation technology can play an important role in making information available in a form that best meets people's needs for easily processible and engaging information. The effectiveness of dialogue is magnified by the fact that it is eminently suitable for new multimedia presentation styles: for example, a dialogue can be performed by digital computer-animated characters. Thus, presenting information in dialogue form promises not only to deliver information effectively, but also to entertain and engage people, as evidenced by the widespread use of dialogue in conventional media, such as news bulletins, commercials, educational entertainment and games.
The proposed research builds on a preliminary feasibility study that was undertaken in collaboration with Dr. Prendinger at the National Institute of Informatics (Tokyo). That research led to a first prototype that takes a patient information leaflet with text such as:
You can use aspirin, if you have a headache. Though aspirin does have side effects: it can harm the circulation.
and automatically generates a dialogue between a virtual pharmacist (P) and a client (C):
C: What if I have a headache?
P: You can use aspirin.
C: But does it have side effects?
P: Yes, it can harm the circulation.
Dr. Prendinger is proposed as Visiting Researcher for the current project.
The project will develop this first prototype into a domain-independent system for generating dialogue from text, such that the meaning of the input text is preserved and the resulting dialogue is both coherent and cohesive. It will also produce what is, to our knowledge, the first extensive collection of text spans paired with snippets of dialogue that are equivalent in meaning (a parallel text-dialogue corpus). This corpus will be used in the project to learn transformations from text to dialogue that the system will then implement. During the second half of the project, a thorough evaluation of the system will take place to determine the quality of the content and organization of the generated dialogues. The system will be applied to input texts from a variety of domains to put its robustness and domain-independence to the test. We anticipate that, if successful, this project will lead to potentially commercially exploitable middleware for bridging the gap between content locked up in text and effective and engaging presentations of information through state-of-the-art multimedia presentation tools, with applications in education (presentation of textbook materials), E-health (presenting medical information in an engaging way), and serious/educational games (automatic generation of dialogue content for non-player characters).
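The aspirin example in the abstract can be illustrated with a toy rule-based sketch. This is not the CODA system or the prototype: the `Relation` class, the relation names, the rule functions and the templates are all hypothetical, hand-written only to reproduce that one example, and a real system would learn its transformations from a parallel corpus rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Relation:
    """A simplified coherence relation between two text spans (hypothetical)."""
    name: str       # relation label, e.g. "condition"
    nucleus: str    # the main span
    satellite: str  # the supporting span


Turn = Tuple[str, str]  # (speaker, utterance); "C" = client, "P" = pharmacist


def to_first_person(clause: str) -> str:
    """Naive perspective shift for the client's turn: 'you' becomes 'I'."""
    return " ".join("I" if w.lower() == "you" else w for w in clause.split())


def capitalise(text: str) -> str:
    """Upper-case the first character only, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


def condition_rule(rel: Relation) -> List[Turn]:
    # The condition becomes the client's question, the nucleus the answer.
    return [("C", f"What if {to_first_person(rel.satellite)}?"),
            ("P", capitalise(rel.nucleus) + ".")]


def elaboration_rule(rel: Relation) -> List[Turn]:
    # Toy template: a yes/no question about the general claim,
    # answered with the specific detail held in the nucleus.
    return [("C", f"But does it have {rel.satellite}?"),
            ("P", f"Yes, {rel.nucleus}.")]


# Assumed mapping from relation labels to dialogue-producing rules.
RULES: Dict[str, Callable[[Relation], List[Turn]]] = {
    "condition": condition_rule,
    "elaboration": elaboration_rule,
}


def generate(relations: List[Relation]) -> List[Turn]:
    """Apply the matching rule to each relation; unknown relations fall
    back to a single expert turn that states the nucleus."""
    turns: List[Turn] = []
    for rel in relations:
        rule = RULES.get(rel.name)
        if rule is None:
            turns.append(("P", capitalise(rel.nucleus) + "."))
        else:
            turns.extend(rule(rel))
    return turns


# Hand-annotated input for the leaflet text quoted in the abstract.
EXAMPLE = [
    Relation("condition", "you can use aspirin", "you have a headache"),
    Relation("elaboration", "it can harm the circulation", "side effects"),
]

dialogue = generate(EXAMPLE)
for speaker, utterance in dialogue:
    print(f"{speaker}: {utterance}")
```

The dispatch table keeps each rule independent, so adding a rule for another relation only requires one new function and one new dictionary entry.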
People | Paul Piwek (Principal Investigator) |
Publications
Kuyten P
(2012)
Intelligent Virtual Agents
Piwek P
(2012)
Varieties of Question Generation: Introduction to this Special Issue
in Dialogue & Discourse
Piwek P
(2017)
Dialogue across Media
Piwek P
(2011)
Data-oriented Monologue-to-Dialogue Generation
in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: shortpapers
Piwek, P
(2010)
Generating Expository Dialogue from Monologue: Motivation, Corpus and Preliminary Rules
in 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2010
Rus V
(2012)
A Detailed Account of The First Question Generation Shared Task Evaluation Challenge
in Dialogue & Discourse
Stoyanchev S
(2010)
Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues
in 7th international conference on Language Resources and Evaluation (LREC) 2010
Stoyanchev S
(2011)
The CODA System for Monologue-to-Dialogue Generation
in Proceedings of the SIGDIAL 2011: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Stoyanchev S
(2011)
Intelligent Virtual Agents
Stoyanchev S
(2010)
Harvesting re-usable high-level rules for expository dialogue generation
in 6th International Natural Language Generation Conference (INLG 2010)
Description | The main aim of the CODA project was to develop the theory and technology for automatically transforming text into coherent and cohesive expository -- i.e., information delivering -- dialogue, such that the transformation from text to dialogue preserves the informational content of the input text. This aim involved four equally important objectives: 1) Creation of the, to our knowledge, first parallel corpus of text-dialogue pairs; 2) Formulation of transformation rules based on the corpus which relate certain patterns in text in monologue form (in particular, underlying coherence relations) to patterns in dialogue (specific dialogue acts, moves or structures); 3) Implementation of the transformations in a system for text-to-dialogue generation; 4) Evaluation of content and organization of the dialogues that the system generates. |
Exploitation Route | The Papworth Trust commissioned two videos based on their information leaflets for service users. The content of the videos was prepared using the CODA Monologue-to-Dialogue technology. The videos are available on the Papworth Trust's Homepage and their YouTube Channel: http://www.youtube.com/user/papworthtrust#p/u/0/Tb-MkbwAneY and http://www.youtube.com/user/papworthtrust#p/u/1/A2jLwvJ_kE8. Bored with reading the instructions? Watch and listen instead! Information leaflets may soon be more than just a piece of paper. Research at The Open University has made significant steps towards changing the way information is conveyed by translating text into dialogue, while preserving clarity and meaning. The EPSRC-funded project, Coherent Dialogue Automatically Generated from Text (CODA), has developed the theory and technology for semi-automatic conversion of text to dialogue. It is specifically geared towards creating conversations between lay people and experts, for example, a patient and doctor. Despite evidence supporting the fact that dialogue is more effective than monologue in tutoring and persuasive communication, most information is still locked up in text, including books, leaflets or web pages. Text-to-dialogue generation technology can play an important role in making information available in a form that best meets people's needs when processing information. Dialogue is much more suitable for new multimedia presentation styles and can, for example, be performed by digital computer-animated characters. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | http://computing.open.ac.uk/coda/ |
Description | The text-to-dialogue generation technology was used by the Papworth Trust to create videos for two of their information leaflets. These videos were used on their website to inform service users. See http://computing.open.ac.uk/coda/publicity.html |
First Year Of Impact | 2011 |
Sector | Communities and Social Services/Policy, Healthcare |
Impact Types | Societal |
Description | Studentship funded directly by the student's employer (WDS/Xerox) |
Amount | £12,000 (GBP) |
Funding ID | N/A |
Organisation | Xerox Corporation |
Department | Xerox Europe |
Sector | Private |
Country | United Kingdom |
Start | 06/2015 |
End | 06/2021 |
Title | CODA Tools |
Description | CODA Tools software Release 1.1, February 20, 2012. This release contains 1) software for converting text parsed with RST relations into dialogue and 2) an annotation tool for annotating dialogue and translating it into monologue (used for creating the CODA corpus). |
Type Of Material | Computer model/algorithm |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | Downloaded by researchers at several research establishments, including Takamura lab at Tokyo Institute of Technology, Hankuk University of Foreign Studies (South Korea), Cornell and Columbia Universities. |
URL | http://computing.open.ac.uk/coda/data.html |
Title | CODA corpus |
Description | CODA corpus Release 1.0 July 16, 2010. This release contains approximately 700 turns of human-authored expository dialogue (by Mark Twain and George Berkeley) which has been aligned with monologue that expresses the same information as the dialogue. The monologue side is annotated with Coherence Relations (RST), and the dialogue side with Dialogue Act tags. |
Type Of Material | Database/Collection of data |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | The monologue-to-dialogue corpus has been released to the research community (100 downloads, including Carnegie Mellon and Xerox). |
URL | http://computing.open.ac.uk/coda/data.html |
Title | QG Corpus |
Description | QGSTEC 2010 Generating Questions from Sentences Corpus December 21, 2010. A corpus of over 1000 questions (both human and machine generated). The automatically generated questions have been rated by several raters according to five criteria (relevance, question type, syntactic correctness and fluency, ambiguity, and variety). |
Type Of Material | Database/Collection of data |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | The QGSTEC development, evaluation and results data were made available to the research community (54 downloads so far from groups including IBM, MITRE, Univ. of Tuebingen, Michigan, and Pittsburgh). |
URL | http://computing.open.ac.uk/coda/data.html |
Description | Ongoing collaboration with NII Tokyo |
Organisation | National Institute of Informatics (NII) |
Country | Japan |
Sector | Public |
PI Contribution | Publication of follow-up research at IVA conference. Research visit by Piwek February 16-25, 2015 to NII, funded by NII. |
Collaborator Contribution | Research collaboration and software development. |
Impact | Conference paper at IVA 2012 conference. |
Start Year | 2012 |
Description | Papworth Trust information: from leaflets to dialogue |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Participants in your research and patient groups |
Results and Impact | The Papworth Trust commissioned two videos based on their information leaflets for service users. The content of the videos was prepared using the CODA Monologue-to-Dialogue technology. The videos are available on the Papworth Trust's YouTube Channel: User Involvement Promise (see http://www.youtube.com/watch?v=Tb-MkbwAneY) and Feedback (see http://www.youtube.com/watch?v=A2jLwvJ_kE8) and on the Trust's Getting involved page. |
Year(s) Of Engagement Activity | 2011 |
URL | http://www.youtube.com/watch?v=A2jLwvJ_kE8 |