Annotating Reference and Coreference In Dialogue Using Conversational Agents in games

Lead Research Organisation: Queen Mary University of London

Department Name: Sch of Electronic Eng & Computer Science

Abstract

The development of modern neural network architectures such as the encoder/decoder model and the Transformer has brought about an explosion of interest in neural models for AI systems able to engage in conversations (aka conversational agents), reflected by a spike of published work, dedicated workshops, and industry-sponsored competitions and grants. While at first these models were applied to simple chatbots, the focus of research has been shifting towards conversational agents capable of engaging in more complex and task-oriented dialogue such as restaurant booking or question answering. But the results on these tasks show that while end-to-end architectures without dedicated models for semantic interpretation can work well for chatbots, conversational agents carrying out more complex tasks require greater ability to handle such aspects of interpretation, and some form of modelling of context.

Among the aspects of natural language interpretation that require more advanced architectures are COREFERENCE and REFERENCE. For an example of the importance of coreference in dialogue, consider the following excerpt from a real-life chat conversation, where both participants continually use anaphoric expressions such as BOTH, THEY, IT, etc. to refer to previously introduced entities such as Google or Microsoft.

A:Are you a fan of Google or Microsoft?
B:Both are excellent technology they are helpful in many ways. For the security purpose both are super.
A:I'm not a huge fan of Google, but I use it a lot because I have to. I think they are a monopoly in some sense.
B:Google provides online related services and products, which includes search engine and cloud computing.
A:Yeah, their services are good. I'm just not a fan of intrusive they can be on our personal lives
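
To make the notion of a coreference chain concrete, the excerpt above can be encoded as a list of mentions grouped by the entity they refer to. This is only an illustrative sketch: the entity labels, the choice of which expressions count as mentions, and the `build_chains` helper are assumptions made for this example, not the project's annotation scheme.

```python
from collections import defaultdict

# Each mention: (turn index, speaker, mention text, entity label).
# The labels are illustrative assumptions, not gold annotations.
MENTIONS = [
    (0, "A", "Google", "google"),
    (0, "A", "Microsoft", "microsoft"),
    (1, "B", "Both", "google+microsoft"),   # plural reference to both companies
    (1, "B", "they", "google+microsoft"),
    (1, "B", "both", "google+microsoft"),
    (2, "A", "Google", "google"),
    (2, "A", "it", "google"),
    (2, "A", "they", "google"),
    (3, "B", "Google", "google"),
    (4, "A", "their", "google"),
    (4, "A", "they", "google"),
]


def build_chains(mentions):
    """Group mentions into chains keyed by the entity they refer to."""
    chains = defaultdict(list)
    for turn, speaker, text, entity in mentions:
        chains[entity].append((turn, speaker, text))
    return dict(chains)


CHAINS = build_chains(MENTIONS)

if __name__ == "__main__":
    for entity, chain in CHAINS.items():
        print(entity, [text for _, _, text in chain])
```

Note that the chain for Google spans both speakers and four of the five turns, which is the kind of cross-turn dependency a dialogue coreference model has to track.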

Enriching conversational agents with the ability to carry out these forms of interpretation raises two issues. First, developing models for these tasks requires specific training data: most deep-learning architectures are trained on large amounts of freely available written text. Training a coreference resolver on written text and domain-adapting it to dialogue, however, has proven ineffective, as coreference in dialogue involves different phenomena and is more involved than coreference in text. Second, the developed architectures require specific modules that enable them to interpret coreference and reference. Our group has pioneered the use of Games-With-A-Purpose (GWAPs) to collect data for NLP, resulting in the largest NLP dataset collected using GWAPs or indeed crowdsourcing. But there is a fundamental difference between conversation and written text: the latter is designed to be read by third parties, whereas research has shown that overhearers of a conversation only acquire a partial understanding of what was said.

OUR PROPOSED SOLUTION to the problem of creating large annotated datasets of coreference and reference interpretation in conversation is to collect the judgments for anaphoric and referential information via GAMES IN WHICH CONVERSATIONAL AGENTS INTERACT WITH HUMAN PLAYERS AND EVOLVE BY ACQUIRING INFORMATION FROM THEM. This idea builds on recent work by Facebook and Microsoft, among others, that pioneered the use of conversational agents in games to collect data about dialogue, and on work by Hockenmaier and her lab. Our agents will be deployed in gaming platforms such as LIGHT and MINECRAFT in collaboration with these labs. But whereas in previous work conversational agents only interact with the aim of improving their end-to-end behavior, in the proposed project we will develop artificial agents able to improve their ability to interpret coreference and reference by collecting judgments about these interpretation aspects via CLARIFICATION QUESTIONS to the players at appropriate moments, which can also be used to annotate a dataset.
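
One way to picture asking clarification questions "at appropriate moments" is a confidence-margin rule: the agent asks only when its two best candidate antecedents are too close to call, and stores the player's answer as an annotation. The sketch below is a hypothetical illustration under that assumption; the `Candidate` class, the `margin` threshold, and all function names are invented for this example and do not describe the project's agents.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    antecedent: str
    score: float  # hypothetical resolver confidence in [0, 1]


def rank(candidates):
    """Sort candidates from most to least confident."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def needs_clarification(candidates, margin=0.2):
    """Return True when the two best candidates are closer than `margin`."""
    ranked = rank(candidates)
    if len(ranked) < 2:
        return False
    return ranked[0].score - ranked[1].score < margin


def clarification_question(anaphor, candidates):
    """Build a question offering the two most likely antecedents."""
    first, second = rank(candidates)[:2]
    return f'By "{anaphor}", do you mean {first.antecedent} or {second.antecedent}?'


def record_judgment(dataset, anaphor, answer):
    """Store the player's answer as one annotation record."""
    dataset.append({"anaphor": anaphor, "antecedent": answer, "source": "player"})


if __name__ == "__main__":
    candidates = [Candidate("the red block", 0.48), Candidate("the blue block", 0.42)]
    annotations = []
    if needs_clarification(candidates):
        print(clarification_question("it", candidates))
        record_judgment(annotations, "it", "the red block")  # simulated player reply
    print(annotations)
```

A margin rule keeps the agent from interrupting play when the interpretation is clear, while still harvesting judgments on exactly the ambiguous cases that are most useful as training data.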

Funded Value:

£1,091,328

Funded Period:

Feb 22 - Jul 25

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/W001632/1

Principal Investigator:

Massimo Poesio

Research Subject:

Info. & commun. Technol. (70%)

Linguistics (30%)

Research Topic:

Artificial Intelligence (40%)

Computational Linguistics (30%)

Human-Computer Interactions (10%)

Information & Knowledge Mgmt (20%)

Organisations

People	ORCID iD
Massimo Poesio (Principal Investigator)
Richard Bartle (Co-Investigator)
Matthew Purver (Co-Investigator)
Diego Perez Liebana (Co-Investigator)
Jon Chamberlain (Co-Investigator)
Julian Hough (Co-Investigator)
Juntao Yu (Co-Investigator)	http://orcid.org/0000-0001-7971-9154

Publications

Aliady W (2024) Linguistic Acceptability and Usability Enhancement: A Case Study of GWAP Evaluation and Redesign

Aliady W (2022) Coreference Annotation of an Arabic Corpus using a Virtual World Game

Aliady W (2024) Master the Linguistic Landscape: Puzzle Integration in a 3D NLP Game

Althani, F (2024) Using In-context Learning to Automate AI Image Generation for a Gamified Text Labelling Task

Caporusso J (2024) A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media

Cokal D (2023) Anaphoric reference to mereological entities in Discourse Processes

Gan Y (2024) Assessing the Capabilities of Large Language Models in Coreference: An Evaluation

Gan Y (2023) Re-appraising the Schema Linking for Text-to-SQL

Ghinassi I (2023) Lessons Learnt from Linear Text Segmentation: a Fair Comparison of Architectural and Sentence Encoding Strategies for Successful Segmentation

Ghinassi I (2024) When Cohesion Lies in the Embedding Space: Embedding-Based Reference-Free Metrics for Topic Segmentation

Key Findings
Impact Summary
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	One of the objectives of the project was to address the lack of datasets for studying referential interpretation in dialogue. The first achievement of the project was the creation of a number of datasets addressing this limitation. In the first year of the project, the CODI/CRAC corpus was released, created in collaboration with CMU in the US and HITS Heidelberg in Germany, and used for a joint shared task between the main ACL conferences on discourse and anaphora. In the second year, the Minecraft Dialogue Corpus of conversations with agents in the Minecraft world, created by project partners at the University of Illinois, was annotated for coreference, while other partners at the University of Toulouse and at the University of Colorado at Boulder annotated it for discourse structure and semantics, respectively. The second achievement of the project was the deployment of a number of platforms powered by Large Language Models (LLMs), in which players can interact with conversational agents in games. A paper on this topic was submitted to ACL, and a second will be submitted to the WordPlay workshop. The main findings so far relate to the extent to which LLMs can interpret coreference. A paper on this topic was accepted by LREC-COLING.
Exploitation Route	One outcome of the project that others can immediately use - in fact, have already used - is the set of datasets created, which have been made freely available through the Universal Anaphora GitHub and elsewhere. The conversational agents embedded in games that we have developed using LLMs can be used not only to explore dialogue, but also in settings such as education and health.
Sectors	Creative Economy, Digital/Communication/Information Technologies (including Software), Education, Healthcare, Leisure Activities including Sports, Recreation and Tourism


Description	Much of this project's research, and in particular the dataset creation, has been carried out in collaboration with non-academic organizations, such as Meta for the Light dataset, and Microsoft for the Minecraft and IGLU datasets. These datasets have been made freely available to these organizations and other organizations in industry and elsewhere, which are using them as benchmarks. The conversational agent research is designed to produce systems that can be used in an applied setting. We're in discussions with companies such as Toshiba to this end, and we're about to submit a proposal to Amazon, which is interested in work on clarification requests in dialogue. We are also aiming to leverage the CA technology in educational and health settings, in collaboration with the NHS - e.g., to collect data that can be used to assess a teenager's mental health. A first proposal in this direction was submitted last year to the 'Transformative Health' call, but it was unsuccessful; we intend to submit a new one this year.
First Year Of Impact	2022
Sector	Creative Economy, Digital/Communication/Information Technologies (including Software)
Impact Types	Societal, Economic


Title	The ARRAU 3.0 Corpus of Anaphoric Reference
Description	ARRAU is a corpus annotated with anaphoric reference. The third release of ARRAU corrects a number of issues with the previous versions, in particular relevant to anaphoric reference in dialogue.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	ARRAU is one of the most widely used datasets for anaphoric reference. Preliminary versions of this third release were used as training data for CODI/CRAC 2022.
URL	https://sites.google.com/view/arrau/corpus


Title	The CODI/CRAC 2022 Corpus of Anaphoric Reference in Dialogue
Description	This is the largest existing corpus of anaphoric reference in dialogue in English.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	The corpus has just been made available but we expect it to become an important resource supporting research in this area.
URL	https://github.com/UniversalAnaphora


Title	The Minecraft Dialogue Corpus With Reference
Description	The Minecraft Dialogue Corpus, created at the University of Illinois Urbana Champaign, is one of the primary datasets for studying conversational agents' interaction with human agents in a situated task. Several other types of annotation have been added by groups worldwide, e.g., on Universal Meaning Representation (University of Colorado), or SDRT (University of Toulouse). Our group added one new layer of annotation, covering reference and coreference.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	We discussed a number of possible collaborations on this dataset at the Dagstuhl workshop that we organized in December 2024. Currently, we are actively collaborating on projects using the dataset with the University of Illinois Urbana Champaign, the University of Gothenburg, and the University of Trento.


Title	The Phrase Detectives Corpus 3.0
Description	Although several datasets annotated for anaphoric reference / coreference exist, even the largest such datasets have limitations in terms of size, range of domains, coverage of anaphoric phenomena, and size of documents included. Yet, the approaches proposed to scale up anaphoric annotation haven't so far resulted in datasets overcoming these limitations. In this paper, we introduce a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose. This new release is comparable in size to the largest existing corpora for anaphoric reference, due in part to substantial activity by the players and in part to the use of a new resolve-and-aggregate paradigm to 'complete' markable annotations through the combination of an anaphoric resolver and an aggregation method for anaphoric reference. The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose. In addition, the corpus covers genres for which no comparable size datasets exist (Fiction and Wikipedia); it covers singletons and non-referring expressions; and it includes a substantial number of long documents (> 2K in length).
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	The corpus has just been released.
URL	https://github.com/dali-ambiguity/Phrase-Detectives-Corpus-3.0
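
The resolve-and-aggregate idea mentioned in the description above, combining an automatic resolver with an aggregation method, can be sketched as follows. This is a deliberately simplified stand-in: plain majority voting is swapped in for whatever aggregation model the corpus actually uses, the resolver's prediction is treated as one additional vote, and the function name and label strings are hypothetical.

```python
from collections import Counter


def resolve_and_aggregate(player_labels, resolver_label=None):
    """Aggregate antecedent judgments for one markable by majority vote.

    player_labels: antecedent labels chosen by players (may be empty).
    resolver_label: optional prediction of an automatic resolver, counted
    as one extra vote so sparsely played markables still get a label.
    """
    votes = Counter(player_labels)
    if resolver_label is not None:
        votes[resolver_label] += 1
    if not votes:
        return None  # no evidence at all for this markable
    label, _ = votes.most_common(1)[0]
    return label


if __name__ == "__main__":
    # Two players disagree; the resolver's vote decides.
    print(resolve_and_aggregate(["the dog", "the cat"], "the cat"))
    # No player has seen this markable; the resolver's label completes it.
    print(resolve_and_aggregate([], "the cat"))
```

The second call shows the "completion" role of the resolver: a markable that no player has judged yet still receives a label, which is one plausible reading of how such a scheme could speed up annotation.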


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	Carnegie Mellon University
Country	United States
Sector	Academic/University
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.); (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin); and (iii) creating the scorer (UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	Heidelberg Institute for Theoretical Studies
Country	Germany
Sector	Charity/Non Profit
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.); (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin); and (iii) creating the scorer (UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	Intel Corporation
Department	INTEL Research
Country	United States
Sector	Private
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.); (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin); and (iii) creating the scorer (UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	CODI/CRAC Shared Task 2022 on Anaphora in Dialogue
Organisation	University of Texas at Dallas
Country	United States
Sector	Academic/University
PI Contribution	The objective of the collaboration was to organize a shared task on anaphoric reference in dialogue. Our primary contributions were (i) running the task itself (setting up the Codalab site, etc.); (ii) annotating the data in collaboration with our partners at CMU (Carolyn Rose and Lori Levin); and (iii) creating the scorer (UA scorer).
Collaborator Contribution	CMU helped with the organization, in particular with the advertising, and collaborated with us on annotating the data, providing US$ 50,000 of funding. HITS Heidelberg helped with the organization and provided part of the funding for the annotation. UT Dallas helped with the organization and collaborated with us on the scorer. Intel Labs created the Codalab site and helped run the competition.
Impact	Main outputs are (i) the CODI/CRAC dataset and (ii) the LREC 2022 and CRAC 2022 publications listed under Publications.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	Linagora
Country	France
Sector	Private
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	University of Colorado Boulder
Country	United States
Sector	Academic/University
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	University of Gothenburg
Country	Sweden
Sector	Academic/University
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Minecraft Dialogue Corpus Annotation
Organisation	University of Illinois at Urbana-Champaign
Department	School of Information Sciences
Country	United States
Sector	Academic/University
PI Contribution	The objective of this collaboration is to annotate the Minecraft Dialogue Corpus, created by Julia Hockenmaier's lab at the University of Illinois Urbana Champaign, to support the development of conversational agents. Our team is carrying out the annotation for coreference and reference, and developing the models.
Collaborator Contribution	University of Illinois provided the original corpus. Gothenburg University provided the tool we are using for coreference annotation. University of Colorado Boulder is carrying out the AMR annotation. Linagora is carrying out the discourse structure annotation.
Impact	The primary output of this will be the annotated corpus; completion is expected this year.
Start Year	2022


Description	Universal Anaphora Scorer
Organisation	Charles University
Country	Czech Republic
Sector	Academic/University
PI Contribution	The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution	Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact	The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year	2022


Description	Universal Anaphora Scorer
Organisation	Georgetown University
Country	United States
Sector	Academic/University
PI Contribution	The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution	Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact	The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year	2022


Description	Universal Anaphora Scorer
Organisation	University of Sheffield
Country	United Kingdom
Sector	Academic/University
PI Contribution	The Universal Anaphora Scorer is a collaboration between groups in Europe and the USA to develop a new reference scorer covering not just coreference but other types of anaphora as well. We are the main developers of the code but we are building on previous scorers and collaborating with several groups to adapt it to different purposes.
Collaborator Contribution	Sameer Pradhan from LDC developed the previous official coreference scorer and has been testing our version. Nafise Moosavi from Sheffield developed the first Python version of the scorer and collaborated with us on the extension. Michal Novak from Charles University extended our first scorer and collaborated with us to merge the two versions. Amir Zeldes from Georgetown University contributed to the design of the markup format.
Impact	The primary output is the software itself, available from GitHub. A first paper on the scorer was presented at LREC 2022.
Start Year	2022


Title	Minecraft-GPT4
Description	Minecraft-GPT4 is a web tool through which it is possible to carry out the Minecraft Builder Dialogue Agent Task of Hockenmaier et al. by interacting with a conversational agent implemented using GPT-4. The tool was used to carry out the experiments in two papers: C Madge & M Poesio (2024), Large Language Models as Minecraft Agents, Proceedings of Wordplay; and C Madge & M Poesio (2024), A LLM Benchmark based on the Minecraft Builder Dialog Agent Task, Proceedings of the 28th Workshop on the Semantics and Pragmatics of Dialogue. A video demonstration of the platform can be seen on YouTube: https://www.youtube.com/watch?v=N9n7u52Bbtk
Type Of Technology	Webtool/Application
Year Produced	2024
Open Source License?	Yes
Impact	We gave a demo of the tool to the dialogue group at Toshiba Research UK, who are interested in using similar methods to develop their own conversational agent. After the demos at ESSLLI 2024 and Semdial 2024 we started collaborations using the tool with the University of Gothenburg and the University of Trento.
URL	https://www.youtube.com/watch?v=N9n7u52Bbtk


Description	CODI/CRAC 2022 Shared Task on Anaphora in Dialogue
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Research in NLP is largely driven by the organization of shared tasks. In order to promote research on coreference in dialogue, we collaborated with a number of partners (see the Collaborations tab) to organize a shared task on anaphoric reference in dialogue.
Year(s) Of Engagement Activity	2022
URL	https://aclanthology.org/volumes/2022.codi-crac/


Description	Games and NLP 2022
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The Games and NLP 2022 workshop at LREC, which ARCIDUCA co-chaired and sponsored, was the 9th edition of the Games and NLP workshop, in which academics and industry can present work on using games in support of NLP and, vice versa, using NLP in support of games.
Year(s) Of Engagement Activity	2022
URL	https://2022.gamesandnlp.com/


Description	Games and NLP 2024
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The Games and NLP workshop at LREC 2024, which ARCIDUCA co-chaired and sponsored, was the 10th edition of the series.
Year(s) Of Engagement Activity	2024
URL	https://gamesandnlp.com/


Description	Presentation at the ESSLLI 2024 Summer School Workshop on Conversational Grounding in the Era of Large Language Models
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The European Summer School on Logic, Language and Information (ESSLLI) is the premier European Summer School in the areas of NLP, KR, Logic, and Linguistics. It is meant to give postgraduate students, but also academics, an opportunity to follow courses both at the foundational level and on more advanced topics. This year's ESSLLI included a workshop on Conversational Grounding in the era of LLMs, one of the key topics of current research on conversational agents. I gave a keynote on ARCIDUCA themes, covering both our work with the Minecraft Dialogue Corpus and our work on understanding clarifications. The workshop was attended by around 60-70 participants.
Year(s) Of Engagement Activity	2024
URL	https://articulab.hcii.cs.cmu.edu/conversational-grounding-in-the-age-of-large-language-models/


Description	The Dagstuhl Perspectives Workshop 24492 on Human in the Loop Learning through Grounded Interaction in Games
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The Dagstuhl Perspectives Workshops are a series of workshops held at Schloss Dagstuhl, Germany, and meant to promote discussion on, and awareness of, the most recent developments in Computer Science and Artificial Intelligence. We led the organization of this particular event as the ARCIDUCA themes have become of great interest in the AI community and we have been involved in communications with a great many researchers internationally. The organization of the event started in September 2023, and involved contacting many of the key researchers in the area. During the event, which lasted a whole week, we identified and discussed a number of key directions that needed further funding and research. Two main outcomes are planned: a report, which was completed, and a forthcoming Manifesto.
Year(s) Of Engagement Activity	2024
URL	https://www.dagstuhl.de/24492


Description	YouTube demo of Minecraft-GPT4
Form Of Engagement Activity	A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	This video introduces Minecraft-GPT4 to the general public. Minecraft-GPT4 is the platform through which participants can interact in the Minecraft world with a conversational agent implemented via GPT-4.
Year(s) Of Engagement Activity	2024
URL	https://www.youtube.com/watch?v=N9n7u52Bbtk
