Annotating Reference and Coreference In Dialogue Using Conversational Agents in games

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

The development of modern neural network architectures such as the encoder/decoder model and the Transformer has brought about an explosion of interest in neural models for AI systems able to engage in conversations (aka conversational agents), reflected by a spike of published work, dedicated workshops, and industry-sponsored competitions and grants. While at first these models were applied to simple chatbots, the focus of research has been shifting towards conversational agents capable of engaging in more complex and task-oriented dialogue such as restaurant booking or question answering. But the results on these tasks show that while end-to-end architectures without dedicated models for semantic interpretation can work well for chatbots, conversational agents carrying out more complex tasks require greater ability to handle such aspects of interpretation, and some form of modelling of context.

Among the aspects of natural language interpretation that require more advanced architectures are COREFERENCE and REFERENCE. For an example of the importance of coreference in dialogue, consider the following excerpt from a real-life chat conversation, where both participants continually use anaphoric expressions such as BOTH, THEY, IT, etc. to refer to previously introduced entities such as Google or Microsoft.

A:Are you a fan of Google or Microsoft?
B:Both are excellent technology they are helpful in many ways. For the security purpose both are super.
A:I'm not a huge fan of Google, but I use it a lot because I have to. I think they are a monopoly in some sense.
B:Google provides online related services and products, which includes search engine and cloud computing.
A:Yeah, their services are good. I'm just not a fan of intrusive they can be on our personal lives
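
The kind of information a coreference annotation of this excerpt would record can be sketched as a small data structure. This is only an illustrative, hand-made annotation: the names `Mention`, `CLUSTERS` and `entity_of`, the entity labels, and the choice of which mentions to include are all assumptions made for this sketch, and it does not reproduce the project's annotation scheme or the Universal Anaphora markup format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Mention:
    turn: int      # 0-based index of the utterance within the excerpt
    speaker: str   # "A" or "B"
    text: str      # surface form of the referring expression


# Hypothetical hand annotation (an assumption, not corpus data):
# entity label -> mentions of that entity, in order of appearance.
CLUSTERS: Dict[str, List[Mention]] = {
    "Google": [
        Mention(0, "A", "Google"),
        Mention(2, "A", "Google"),
        Mention(2, "A", "it"),
        Mention(2, "A", "they"),
        Mention(3, "B", "Google"),
        Mention(4, "A", "their"),
    ],
    "Microsoft": [
        Mention(0, "A", "Microsoft"),
    ],
    # A plural entity whose antecedents are the two entities above.
    "Google+Microsoft": [
        Mention(1, "B", "Both"),
        Mention(1, "B", "they"),
        Mention(1, "B", "both"),
    ],
}


def entity_of(turn: int, text: str) -> Optional[str]:
    """Return the label of the entity a mention refers to, or None if unannotated."""
    for label, mentions in CLUSTERS.items():
        for mention in mentions:
            if mention.turn == turn and mention.text == text:
                return label
    return None


if __name__ == "__main__":
    for label, mentions in CLUSTERS.items():
        forms = ", ".join(f"{m.text} (turn {m.turn})" for m in mentions)
        print(f"{label}: {forms}")
```

Even in this short excerpt the same pronoun form ("they") picks out different entities in different turns, and "Both" refers to a pair of previously introduced entities rather than to a single antecedent, which is part of what makes the task hard for a resolver trained on written text.
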

Enriching conversational agents with the ability to carry out these forms of interpretation raises two issues. First, developing models for these tasks requires specific training data: most deep-learning architectures are trained on large amounts of freely available written text. Training a coreference resolver on written text and domain-adapting it to dialogue, however, has proven ineffective, as coreference in dialogue involves different phenomena and is more involved than coreference in text. Second, the developed architectures require specific modules that enable them to interpret coreference and reference. Our group has pioneered the use of Games-With-A-Purpose (GWAPs) to collect data for NLP, resulting in the largest NLP dataset collected using GWAPs or indeed crowdsourcing. But there is a fundamental difference between conversation and written text: the latter is designed to be read by third parties, whereas research has shown that overhearers of a conversation only acquire a partial understanding of what was said.

OUR PROPOSED SOLUTION to the problem of creating large annotated datasets of coreference and reference interpretation in conversation is to collect the judgments for anaphoric and referential information via GAMES IN WHICH CONVERSATIONAL AGENTS INTERACT WITH HUMAN PLAYERS AND EVOLVE BY ACQUIRING INFORMATION FROM THEM. This idea builds on recent work by Facebook and Microsoft, among others, that pioneered the use of conversational agents in games to collect data about dialogue, and on the work of Hockenmaier and her lab. Our agents will be deployed in gaming platforms such as LIGHT and MINECRAFT in collaboration with these labs. But whereas in previous work conversational agents only interact with the aim of improving their end-to-end behavior, in the proposed project we will develop artificial agents able to improve their ability to interpret coreference and reference by collecting judgments about these interpretation aspects via CLARIFICATION QUESTIONS to the players at appropriate moments, which can also be used to annotate a dataset.
 
Description One of the objectives of the project was to address the lack of datasets for studying referential interpretation in dialogue. The first achievement of the project was the creation of a number of datasets addressing this limitation. In the first year of the project, the CODI/CRAC corpus was released, created in collaboration with CMU in the US and HITS Heidelberg in Germany, and used for a joint shared task between the main ACL conferences on discourse and anaphora. In the second year, the Minecraft Dialogue Corpus of conversations with agents in the Minecraft world, created by project partners at the University of Illinois, was annotated for coreference, while other partners at the University of Toulouse and at the University of Colorado at Boulder annotated it for discourse structure and semantics, respectively.
The second achievement of the project was the deployment of a number of platforms powered by Large Language Models (LLMs), in which players can interact with conversational agents in games. A paper on this topic was submitted to ACL, and a second will be submitted to the WordPlay workshop.
The main findings so far relate to the extent to which LLMs can interpret coreference. A paper on this topic was accepted by LREC-COLING.
Exploitation Route One outcome of the project that others can immediately use - in fact, have already used - is the datasets created, which have been made freely available through the Universal Anaphora GitHub and elsewhere.
The conversational agents embedded in games that we have developed using LLMs can be used not only to explore dialogue, but also in settings such as education and health.
Sectors Creative Economy

Digital/Communication/Information Technologies (including Software)

Education

Healthcare

Leisure Activities, including Sports, Recreation and Tourism

 
Description Much of this project's research, and in particular the dataset creation, has been carried out in collaboration with non-academic organizations, such as Meta for the Light dataset, and Microsoft for the Minecraft and IGLU datasets. These datasets have been made freely available to these organizations and other organizations in industry and elsewhere, which are using them as benchmarks. The conversational agent research is designed to produce systems that can be used in an applied setting. We are in discussions with companies such as Toshiba to this end, and we are about to submit a proposal to Amazon, which is interested in work on clarification requests in dialogue. We are also aiming to leverage the CA technology in educational and health settings, in collaboration with the NHS - e.g., to collect data that can be used to assess a teenager's mental health. A first proposal in this direction was submitted last year to the 'Transformative Health' call, but it was unsuccessful; we intend to submit a new one this year.
First Year Of Impact 2022
Sector Creative Economy, Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic

 
Title The ARRAU 3.0 Corpus of Anaphoric Reference 
Description ARRAU is a corpus annotated with anaphoric reference. The third release of ARRAU corrects a number of issues with the previous versions, in particular relevant to anaphoric reference in dialogue. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact ARRAU is one of the most widely used datasets for anaphoric reference. Preliminary versions of this third release were used as training data for CODI/CRAC 2022.
URL https://sites.google.com/view/arrau/corpus
 
Title The CODI/CRAC 2022 Corpus of Anaphoric Reference in Dialogue 
Description This is the largest existing corpus of anaphoric reference in dialogue in English. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact The corpus has just been made available but we expect it to become an important resource supporting research in this area. 
URL https://github.com/UniversalAnaphora
 
Title The Phrase Detectives Corpus 3.0 
Description Although several datasets annotated for anaphoric reference / coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference, due in part to substantial activity by the players and in part to the use of a new resolve-and-aggregate paradigm to 'complete' markable annotations through the combination of an anaphoric resolver and an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no datasets of comparable size exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length).
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The corpus has just been released. 
URL https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-3.0
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation Carnegie Mellon University
Country United States 
Sector Academic/University 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact The main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed in publications.
Start Year 2022
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation Heidelberg Institute for Theoretical Studies
Country Germany 
Sector Charity/Non Profit 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact The main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed in publications.
Start Year 2022
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation Intel Corporation
Department INTEL Research
Country United States 
Sector Private 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact The main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed in publications.
Start Year 2022
 
Description CODI/CRAC Shared Task 2022 on Anaphora in Dialogue 
Organisation University of Texas at Dallas
Country United States 
Sector Academic/University 
PI Contribution The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (UA scorer).
Collaborator Contribution CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact The main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed in publications.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation Linagora
Country France 
Sector Private 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation University of Colorado Boulder
Country United States 
Sector Academic/University 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation University of Gothenburg
Country Sweden 
Sector Academic/University 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Minecraft Dialogue Corpus Annotation 
Organisation University of Illinois at Urbana-Champaign
Department School of Information Sciences
Country United States 
Sector Academic/University 
PI Contribution The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact The primary output of this will be the annotated corpus; completion is expected this year.
Start Year 2022
 
Description Universal Anaphora Scorer 
Organisation Charles University
Country Czech Republic 
Sector Academic/University 
PI Contribution The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year 2022
 
Description Universal Anaphora Scorer 
Organisation Georgetown University
Country United States 
Sector Academic/University 
PI Contribution The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year 2022
 
Description Universal Anaphora Scorer 
Organisation University of Sheffield
Country United Kingdom 
Sector Academic/University 
PI Contribution The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year 2022
 
Description CODI/CRAC Shared Task on Anaphora in Dialogue 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Research in NLP is largely driven by the organization of shared tasks. In order to promote research on coreference in dialogue, we collaborated with a number of partners (see the Collaborations tab) to organize a shared task on anaphoric reference in dialogue.
Year(s) Of Engagement Activity 2022
URL https://aclanthology.org/volumes/2022.codi-crac/