Annotating Reference and Coreference In Dialogue Using Conversational Agents in games

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

The development of modern neural network architectures such as the encoder/decoder model and the Transformer has brought about an explosion of interest in neural models for AI systems able to engage in conversations (aka conversational agents), reflected by a spike of published work, dedicated workshops, and industry-sponsored competitions and grants. While at first these models were applied to simple chatbots, the focus of research has been shifting towards conversational agents capable of engaging in more complex, task-oriented dialogue such as restaurant booking or question answering. But the results on these tasks show that while end-to-end architectures without dedicated models for semantic interpretation can work well for chatbots, conversational agents carrying out more complex tasks require greater ability to handle such aspects of interpretation, and some form of modelling of context.

Among the aspects of natural language interpretation that require more advanced architectures are COREFERENCE and REFERENCE. For an example of the importance of coreference in dialogue, consider the following excerpt from a real-life chat conversation, where both participants continually use anaphoric expressions such as BOTH, THEY, IT, etc. to refer to previously introduced entities such as Google or Microsoft.

A:Are you a fan of Google or Microsoft?
B:Both are excellent technology they are helpful in many ways. For the security purpose both are super.
A:I'm not a huge fan of Google, but I use it a lot because I have to. I think they are a monopoly in some sense.
B:Google provides online related services and products, which includes search engine and cloud computing.
A:Yeah, their services are good. I'm just not a fan of intrusive they can be on our personal lives
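
The coreference in this excerpt can be made concrete by grouping mentions into chains, one chain per entity. The sketch below is only an illustration: the grouping is one plausible reading of the excerpt rather than an annotation taken from any corpus, and the names Mention, CHAINS, ANAPHORIC_FORMS and anaphoric_forms are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    turn: int      # 1-based position of the utterance in the excerpt
    speaker: str   # "A" or "B"
    text: str      # surface form of the mention


# Surface forms treated as anaphoric in this toy example (an assumption).
ANAPHORIC_FORMS = {"it", "they", "their", "both"}

# One plausible reading of the excerpt, grouped by the entity referred to.
CHAINS = {
    "Google": [
        Mention(1, "A", "Google"),
        Mention(3, "A", "Google"),
        Mention(3, "A", "it"),
        Mention(3, "A", "they"),
        Mention(4, "B", "Google"),
        Mention(5, "A", "their"),
        Mention(5, "A", "they"),
    ],
    # "Both" and "they" in turn 2 refer to the pair introduced in turn 1.
    "Google and Microsoft": [
        Mention(2, "B", "Both"),
        Mention(2, "B", "they"),
        Mention(2, "B", "both"),
    ],
}


def anaphoric_forms(chain: list[Mention]) -> list[str]:
    """Return the surface forms in a chain that are anaphoric expressions."""
    return [m.text for m in chain if m.text.lower() in ANAPHORIC_FORMS]


if __name__ == "__main__":
    for entity, chain in CHAINS.items():
        print(entity, "->", anaphoric_forms(chain))
```

Even in five turns, most mentions of Google are anaphoric expressions rather than the name itself, and the second chain has no single earlier mention as its antecedent.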

Enriching conversational agents with the ability to carry out these forms of interpretation raises two issues. First, developing models for these tasks requires specific training data: most deep-learning architectures are trained on large amounts of freely available written text. Training a coreference resolver on written text and domain-adapting it to dialogue has, however, proven ineffective, as coreference in dialogue involves different phenomena and is more involved than coreference in text. Second, the developed architectures require specific modules that enable them to interpret coreference and reference. Our group has pioneered the use of Games-With-A-Purpose (GWAPs) to collect data for NLP, resulting in the largest NLP dataset collected using GWAPs or indeed crowdsourcing. But there is a fundamental difference between conversation and written text: the latter is designed to be read by third parties, whereas research has shown that overhearers to a conversation only acquire a partial understanding of what was said.

OUR PROPOSED SOLUTION to the problem of creating large annotated datasets of coreference and reference interpretation in conversation is to collect the judgments for anaphoric and referential information via GAMES IN WHICH CONVERSATIONAL AGENTS INTERACT WITH HUMAN PLAYERS AND EVOLVE BY ACQUIRING INFORMATION FROM THEM. This idea builds on recent work by Facebook and Microsoft, among others, that pioneered the use of conversational agents in games to collect data about dialogue, and on the work of Hockenmaier and her lab. Our agents will be deployed in gaming platforms such as LIGHT and MINECRAFT in collaboration with these labs. But whereas in previous work conversational agents only interact with the aim of improving their end-to-end behavior, in the proposed project we will develop artificial agents able to improve their ability to interpret coreference and reference by collecting judgments about these interpretation aspects via CLARIFICATION QUESTIONS to the players at appropriate moments, which can also be used to annotate a dataset.
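
As a rough illustration of what asking a clarification question at an appropriate moment could look like, the sketch below has an agent ask only when its two best candidate antecedents are scored too closely to choose between. It is a minimal sketch under stated assumptions, not the project's method: the function name clarification_question, the score dictionary, the margin threshold and the question wording are all hypothetical.

```python
from typing import Optional


def clarification_question(
    anaphor: str,
    scores: dict[str, float],
    margin: float = 0.2,
) -> Optional[str]:
    """Return a question for the player when the interpretation is uncertain.

    `scores` maps each candidate antecedent to a hypothetical resolver score.
    A question is produced only when the gap between the two best candidates
    is smaller than `margin`; otherwise the agent keeps its own interpretation.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return None  # nothing to disambiguate
    (best, best_score), (second, second_score) = ranked[0], ranked[1]
    if best_score - second_score >= margin:
        return None  # confident enough, do not interrupt the player
    return f'When you said "{anaphor}", did you mean {best} or {second}?'


if __name__ == "__main__":
    print(clarification_question(
        "it", {"the red block": 0.48, "the blue block": 0.45}
    ))
```

The player's answer to such a question could then be stored as a judgment about that anaphor, which is how the same interaction would serve both the agent and the annotation of a dataset.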
 
Title The CODI/CRAC 2022 Corpus of Anaphoric Reference in Dialogue 
Description This is the largest existing corpus of anaphoric reference in dialogue in English. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact The corpus has just been made available but we expect it to become an important resource supporting research in this area. 
URL https://github.com/UniversalAnaphora
 
Title The Phrase Detectives Corpus 3.0 
Description Although several datasets annotated for anaphoric reference / coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference, due in part to substantial activity by the players and in part to the use of a new resolve-and-aggregate paradigm to 'complete' markable annotations through the combination of an anaphoric resolver and an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no datasets of comparable size exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length). 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The corpus has just been released. 
URL https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-3.0
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation Carnegie Mellon University
Country United States 
Sector Academic/University 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact Main outputs are (i) the CODI/CRAC dataset (ii) the LREC 2022 and CRAC 2022 publications listed in publications
Start Year 2022
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation Heidelberg Institute for Theoretical Studies
Country Germany 
Sector Charity/Non Profit 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact Main outputs are (i) the CODI/CRAC dataset (ii) the LREC 2022 and CRAC 2022 publications listed in publications
Start Year 2022
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation Intel Corporation
Department INTEL Research
Country United States 
Sector Private 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact Main outputs are (i) the CODI/CRAC dataset (ii) the LREC 2022 and CRAC 2022 publications listed in publications
Start Year 2022
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation University of Texas at Dallas
Country United States 
Sector Academic/University 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact Main outputs are (i) the CODI/CRAC dataset (ii) the LREC 2022 and CRAC 2022 publications listed in publications
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation Linagora
Country France 
Sector Private 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois at Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. The University of Gothenburg provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation University of Colorado Boulder
Country United States 
Sector Academic/University 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois at Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. The University of Gothenburg provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation University of Gothenburg
Country Sweden 
Sector Academic/University 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois at Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. The University of Gothenburg provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation University of Illinois at Urbana-Champaign
Department School of Information Sciences
Country United States 
Sector Academic/University 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois at Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. The University of Gothenburg provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Universal Anaphora Scorer 
Organisation Charles University
Country Czech Republic 
Sector Academic/University 
PI Contribution The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year 2022
 
Description Universal Anaphora Scorer 
Organisation Georgetown University
Country United States 
Sector Academic/University 
PI Contribution The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year 2022
 
Description Universal Anaphora Scorer 
Organisation University of Sheffield
Country United Kingdom 
Sector Academic/University 
PI Contribution The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year 2022
 
Description CODI/CRAC Shared Task on Anaphora in Dialogue 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Research in NLP is largely driven by the organization of shared tasks. In order to promote research on coreference in dialogue, we collaborated with a number of partners (see the Collaborations tab) to organize a shared task on Anaphoric Reference in Dialogue.
Year(s) Of Engagement Activity 2022
URL https://aclanthology.org/volumes/2022.codi-crac/