Annotating Reference and Coreference In Dialogue Using Conversational Agents in games

Lead Research Organisation: Queen Mary University of London

Department Name: Sch of Electronic Eng & Computer Science

Abstract

The development of modern neural network architectures such as the encoder/decoder model and the Transformer has brought about an explosion of interest in neural models for AI systems able to engage in conversations (aka conversational agents), reflected by a spike of published work, dedicated workshops, and industry-sponsored competitions and grants. While at first these models were applied to simple chatbots, the focus of research has been shifting towards conversational agents capable of engaging in more complex and task-oriented dialogue such as restaurant booking or question answering. But the results on these tasks show that while end-to-end architectures without dedicated models for semantic interpretation can work well for chatbots, conversational agents carrying out more complex tasks require greater ability to handle such aspects of interpretation, and some form of modelling of context.

Among the aspects of natural language interpretation that require more advanced architectures are COREFERENCE and REFERENCE. For an example of the importance of coreference in dialogue, consider the following excerpt from a real-life chat conversation, where both participants continually use anaphoric expressions such as BOTH, THEY, IT, etc. to refer to previously introduced entities such as Google or Microsoft.

A:Are you a fan of Google or Microsoft?
B:Both are excellent technology they are helpful in many ways. For the security purpose both are super.
A:I'm not a huge fan of Google, but I use it a lot because I have to. I think they are a monopoly in some sense.
B:Google provides online related services and products, which includes search engine and cloud computing.
A:Yeah, their services are good. I'm just not a fan of intrusive they can be on our personal lives
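
To make the example concrete, the anaphoric links in the first three turns of the excerpt can be written down as coreference chains. This is only an illustrative sketch: the `Mention` class, the entity labels, and the particular links are one plausible reading of the excerpt, not an annotation taken from any corpus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    turn: int        # 1-based index of the utterance in the excerpt
    speaker: str
    text: str
    entities: tuple  # hypothetical entity labels this mention refers to


# One plausible reading of turns 1-3 (an assumption, not gold data).
mentions = [
    Mention(1, "A", "Google", ("google",)),
    Mention(1, "A", "Microsoft", ("microsoft",)),
    Mention(2, "B", "Both", ("google", "microsoft")),  # refers to both entities
    Mention(2, "B", "they", ("google", "microsoft")),
    Mention(3, "A", "Google", ("google",)),
    Mention(3, "A", "it", ("google",)),
    Mention(3, "A", "they", ("google",)),
]


def chains(mentions):
    """Group (turn, text) pairs by the entity each mention refers to."""
    grouped = defaultdict(list)
    for m in mentions:
        for entity in m.entities:
            grouped[entity].append((m.turn, m.text))
    return dict(grouped)


for entity, chain in chains(mentions).items():
    print(entity, chain)
```

Even in this short exchange, a plural expression such as BOTH has to be linked to two separately introduced entities, which a model that links each mention to a single antecedent cannot represent directly.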

Enriching conversational agents with the ability to carry out these forms of interpretation raises two issues. First, developing models for these tasks requires specific training data: most deep-learning architectures are trained on large amounts of freely available written text. Training a coreference resolver on written text and domain-adapting it to dialogue has, however, proven ineffective, as coreference in dialogue involves different phenomena and is more involved than coreference in text. Second, the developed architectures require specific modules that enable them to interpret coreference and reference. Our group has pioneered the use of Games-With-A-Purpose (GWAPs) to collect data for NLP, resulting in the largest NLP dataset collected using GWAPs or indeed crowdsourcing. But there is a fundamental difference between conversation and written text: the latter is designed to be read by third parties, whereas research has shown that overhearers of a conversation only acquire a partial understanding of what was said.

OUR PROPOSED SOLUTION to the problem of creating large annotated datasets of coreference and reference interpretation in conversation is to collect the judgments for anaphoric and referential information via GAMES IN WHICH CONVERSATIONAL AGENTS INTERACT WITH HUMAN PLAYERS AND EVOLVE BY ACQUIRING INFORMATION FROM THEM. This idea builds on recent work by Facebook and Microsoft, among others, that pioneered the use of conversational agents in games to collect data about dialogue, and on that of Hockenmaier and her lab. Our agents will be deployed in gaming platforms such as LIGHT and MINECRAFT in collaboration with these labs. But whereas in previous work conversational agents only interact with the aim of improving their end-to-end behavior, in the proposed project we will develop artificial agents able to improve their ability to interpret coreference and reference by collecting judgments about these interpretation aspects via CLARIFICATION QUESTIONS to the players at appropriate moments, which can also be used to annotate a dataset.
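
The idea of asking clarification questions at appropriate moments can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's agents: the `ClarifyingAgent` class, the confidence margin, the candidate scores, and the `ask_player` callback are all hypothetical. The agent only interrupts the player when its two best candidate antecedents are too close to call, and every answered question is stored as a judgment that could later serve as annotation.

```python
from dataclasses import dataclass, field


@dataclass
class ClarifyingAgent:
    margin: float = 0.2                       # hypothetical confidence margin
    judgments: list = field(default_factory=list)

    def resolve(self, anaphor, scores, ask_player):
        """Pick an antecedent for `anaphor`, asking the player when unsure.

        `scores` maps candidate antecedents to a model confidence in [0, 1];
        `ask_player` is a callback that poses a question and returns the
        player's answer as a string.
        """
        ranked = sorted(scores, key=scores.get, reverse=True)
        best = ranked[0]
        runner_up_score = scores[ranked[1]] if len(ranked) > 1 else 0.0
        if scores[best] - runner_up_score >= self.margin:
            return best                       # confident enough: do not interrupt
        options = " or ".join(ranked[:2])
        answer = ask_player(f"By '{anaphor}', do you mean {options}?")
        chosen = answer if answer in scores else best
        # Each answered question doubles as an annotation for the dataset.
        self.judgments.append({"anaphor": anaphor, "antecedent": chosen})
        return chosen


agent = ClarifyingAgent()
picked = agent.resolve(
    "it",
    {"the red block": 0.48, "the blue block": 0.45},
    ask_player=lambda question: "the blue block",  # stand-in for a real player
)
print(picked, agent.judgments)
```

A fixed margin is the simplest possible trigger; deciding when a question is worth the interruption is exactly the kind of policy the agents would need to learn.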

Funded Value:

£1,091,328

Funded Period:

Feb 22 - Jan 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/W001632/1

Principal Investigator:

Massimo Poesio

Research Subject:

Info. & commun. Technol. (70%)

Linguistics (30%)

Research Topic:

Artificial Intelligence (40%)

Computational Linguistics (30%)

Human-Computer Interactions (10%)

Information & Knowledge Mgmt (20%)

Organisations

People	ORCID iD
Massimo Poesio (Principal Investigator)
Richard Bartle (Co-Investigator)
Jon Chamberlain (Co-Investigator)
Matthew Purver (Co-Investigator)
Diego Perez Liebana (Co-Investigator)
Julian Hough (Co-Investigator)
Juntao Yu (Co-Investigator)	http://orcid.org/0000-0001-7971-9154

Publications

Aliady W (2022) Coreference Annotation of an Arabic Corpus using a Virtual World Game

Gan Y (2024) Assessing the Capabilities of Large Language Models in Coreference: An Evaluation

Hough J (2024) Conceptual Pacts for Reference Resolution using Small, Dynamically Constructed Language Models: A Study in Puzzle Building Dialogues

Madge C (2022) LingoTowns: A Virtual World For Natural Language Annotation and Language Learning

Ogrodniczuk M (2023) Proceedings of The Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2023)

Paun S (2023) Scoring Coreference Chains with Split-Antecedent Anaphors in Dialogue & Discourse

Poesio M (2023) Computational Models of Anaphora in Annual Review of Linguistics

Poesio M (2024) Universal Anaphora: The First Three Years

Poesio M (2024) The ARRAU 3.0 Corpus

Poesio, M (2022) ARCIDUCA: Annotating Reference and Coreference In Dialogue Using Conversational Agents in games

Key Findings
Impact Summary
Research Databases and Models
Collaboration
Engagement Activities


Description	One of the objectives of the project was to address the lack of datasets for studying referential interpretation in dialogue. The first achievement of the project was the creation of a number of datasets addressing this limitation. In the first year of the project, the CODI/CRAC corpus was released, created in collaboration with CMU in the US and HITS Heidelberg in Germany, and used for a joint shared task between the main ACL conferences on discourse and anaphora. In the second year, the Minecraft Dialogue Corpus of conversations with agents in the Minecraft world, created by project partners at the University of Illinois, was annotated for coreference, while other partners at the University of Toulouse and at the University of Colorado at Boulder annotated it for discourse structure and semantics, respectively. The second achievement of the project was the deployment of a number of platforms powered by Large Language Models (LLMs), in which players can interact with conversational agents in games. A paper on this topic was submitted to ACL, and a second will be submitted to the WordPlay workshop. The main findings so far relate to the extent to which LLMs can interpret coreference. A paper on this topic was accepted by LREC-COLING.
Exploitation Route	One outcome of the project that others can immediately use - in fact, have already used - is the datasets created, which have been made freely available through the Universal Anaphora GitHub and elsewhere. The conversational agents embedded in games we have developed using LLMs can be used not only to explore dialogue, but also in settings such as education and health.
Sectors	Creative Economy, Digital/Communication/Information Technologies (including Software), Education, Healthcare, Leisure Activities including Sports, Recreation and Tourism


Description	Much of this project's research, and in particular the dataset creation, has been carried out in collaboration with non-academic organizations, such as Meta for the Light dataset, and Microsoft for the Minecraft and IGLU datasets. These datasets have been made freely available to these organizations and other organizations in industry and elsewhere, which are using them as benchmarks. The conversational agent research is designed to produce systems that can be used in applied settings. We're in discussions with companies such as Toshiba to this end, and we're about to submit a proposal to Amazon, which is interested in work on clarification requests in dialogue. We are also aiming to leverage the CA technology in educational and health settings, in collaboration with the NHS - e.g., to collect data that can be used to assess a teenager's mental health. A first proposal in this direction was submitted last year to the 'Transformative Health' call, but it was unsuccessful; we intend to submit a new one this year.
First Year Of Impact	2022
Sector	Creative Economy, Digital/Communication/Information Technologies (including Software)
Impact Types	Societal, Economic


Title	The ARRAU 3.0 Corpus of Anaphoric Reference
Description	ARRAU is a corpus annotated with anaphoric reference. The third release of ARRAU corrects a number of issues with the previous versions, in particular relevant to anaphoric reference in dialogue.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	ARRAU is one of the most widely used datasets for anaphoric reference. Preliminary versions of this third release were used as training data for CODI/CRAC 2022.
URL	https://sites.google.com/view/arrau/corpus


Title	The CODI/CRAC 2022 Corpus of Anaphoric Reference in Dialogue
Description	This is the largest existing corpus of anaphoric reference in dialogue in English.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	The corpus has just been made available but we expect it to become an important resource supporting research in this area.
URL	https://github.com/UniversalAnaphora


Title	The Phrase Detectives Corpus 3.0
Description	Although several datasets annotated for anaphoric reference / coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference, due in part to substantial activity by the players and in part to the use of a new resolve-and-aggregate paradigm to 'complete' markable annotations through the combination of an anaphoric resolver and an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no comparable size datasets exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length).
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	The corpus has just been released.
URL	https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-3.0
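
The resolve-and-aggregate idea described for the Phrase Detectives Corpus 3.0 can be illustrated with a small sketch. It is not the method used to build the corpus: here the automatic resolver's output is simply counted as one extra vote, and plain majority voting stands in for the aggregation method. The function name, the label strings, and the data are hypothetical.

```python
from collections import Counter


def resolve_and_aggregate(player_votes, resolver_output):
    """Return one aggregated label per markable.

    player_votes: dict mapping markable id -> list of labels from players
                  (possibly empty when nobody has annotated the markable yet).
    resolver_output: dict mapping markable id -> label proposed by an
                     automatic resolver (treated here as one extra vote).
    Markables with no votes at all are left out of the result.
    """
    aggregated = {}
    for markable in sorted(set(player_votes) | set(resolver_output)):
        votes = list(player_votes.get(markable, []))
        if markable in resolver_output:
            votes.append(resolver_output[markable])  # the "resolve" step
        if votes:
            # The "aggregate" step: simple majority over all collected votes.
            aggregated[markable] = Counter(votes).most_common(1)[0][0]
    return aggregated


player_votes = {
    "m1": ["antecedent:m0", "antecedent:m0", "new"],
    "m2": [],  # no player judgments yet
}
resolver_output = {"m1": "antecedent:m0", "m2": "antecedent:m1"}
result = resolve_and_aggregate(player_votes, resolver_output)
print(result)
```

In this toy run the resolver's vote is what allows markable `m2`, which no player has judged, to receive a label at all, which is the sense in which the resolver 'completes' the annotation.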


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	Carnegie Mellon University
Country	United States
Sector	Academic/University
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (the UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	Heidelberg Institute for Theoretical Studies
Country	Germany
Sector	Charity/Non Profit
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (the UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	Intel Corporation
Department	INTEL Research
Country	United States
Sector	Private
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (the UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	University of Texas at Dallas
Country	United States
Sector	Academic/University
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.), (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin), and (iii) creating the scorer (the UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	Linagora
Country	France
Sector	Private
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	University of Colorado Boulder
Country	United States
Sector	Academic/University
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	University of Gothenburg
Country	Sweden
Sector	Academic/University
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	University of Illinois at Urbana-Champaign
Department	School of Information Sciences
Country	United States
Sector	Academic/University
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana-Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	The University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. The University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Universal Anaphora Scorer
Organisation	Charles University
Country	Czech Republic
Sector	Academic/University
PI Contribution	The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution	Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact	The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year	2022


Description	Universal Anaphora Scorer
Organisation	Georgetown University
Country	United States
Sector	Academic/University
PI Contribution	The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution	Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact	The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year	2022


Description	Universal Anaphora Scorer
Organisation	University of Sheffield
Country	United Kingdom
Sector	Academic/University
PI Contribution	The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution	Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact	The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year	2022


Description	CODI/CRAC Shared Task on Anaphora in Dialogue
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Research in NLP is largely driven by the organization of shared tasks. In order to promote research on coreference in dialogue, we collaborated with a number of partners (see the Collaborations tab) to organize a shared task on anaphoric reference in dialogue.
Year(s) Of Engagement Activity	2022
URL	https://aclanthology.org/volumes/2022.codi-crac/
