Generation Challenges 2011: Towards a Surface Realisation Shared Task

Lead Research Organisation: University of Brighton
Department Name: Sch of Computing, Engineering & Maths

Abstract

Computers can now perform some writing tasks well (e.g. transcription and spell checking), but we still lack good computational solutions for writing tasks which involve the creation of new text (as opposed to the typing or checking of existing text). Natural language generation (NLG) is the branch of computer science that aims to address this lack, by developing methods and tools for the computational generation of spoken and written language. NLG technology has a vast range of potential applications, including increasing the efficiency of text-production processes (e.g. automated letter and report writing) and making information available in verbal form that would otherwise be inaccessible (e.g. to the blind) or more time-consuming to process (e.g. converting weather data to a textual summary). However, NLG is only just beginning to fulfil this potential. Among the reasons is the fact that NLG did not until recently employ comparative forms of evaluation, which are essential for effective comparison of alternative approaches, consolidation and collective scientific progress.

The NLG field's evaluation tradition lies in user-oriented and task-based evaluation of complete systems. This tradition is very different from the comparative evaluation paradigms that are predominant in other areas of Natural Language Processing (NLP), where shared data resources, intrinsic, automatically computed metrics, and human ratings of quality provide time-efficient and low-cost ways of comparing new systems and techniques against existing approaches. In contrast, in NLG, until a few years ago, there simply was no comparative evaluation of independently developed alternative approaches. Yet without comparative evaluation there can be no consolidation or collective progress in a field of research, and individual researchers and groups are left to progress more or less separately.
The GenChal initiative has firmly established comparative evaluation in NLG and produced data sets and software tools to support it. Past shared tasks have addressed the subfield of reference generation and specific applications, and the time is now right to tackle a more ambitious challenge. With the Surface Realisation Task that forms the core of the present proposal, we are aiming for something truly groundbreaking and of great potential use in practical applications: the development of a new generation of surface realisers that can be directly compared and, because they work from common input, can be substituted for each other. Ultimately, this will mean that data-to-text generation, MT, summarisation and dialogue systems (among other fields) will directly benefit from the availability of a range of reusable realisation components which system builders can test to determine which is best for their purpose, something that has not been possible before.

Planned Impact

Computers can now perform certain writing tasks well (e.g. transcription and spell checking), but we still lack computational solutions for writing tasks which involve the creation of new text (as opposed to the typing or checking of given text). Natural language generation (NLG) is the branch of computer science that aims to address this lack, by developing methods and tools for the computational generation of spoken and written language. NLG can increase the efficiency of text-production processes (e.g. automated letter and report writing) and make information available in verbal form that would otherwise be inaccessible (e.g. to the blind) or more time-consuming to process (e.g. converting weather data to a textual summary). NLG technology has relevance to the EPSRC's Digital Economy and Assisted Living/Lifelong Health themes, and the number of potential applications is vast. However, NLG has so far not lived up to its great potential. Among the reasons for this is that until recently, NLG lacked comparative evaluation, hence consolidation of research results, and was isolated from the rest of NLP, therefore not directly benefiting from advances there. It was shrinking as a field and lacked the kind of funding and participation that natural language analysis (NLA) fields have attracted. In order ultimately to fulfil its great potential (including commercial applications), NLG needs to achieve substantial technological progress. For this to be possible, NLG needs to (i) increase the availability of reusable and trainable NLG tools and data resources; (ii) establish comparative evaluation as standard in order to consolidate and incrementally improve research results; (iii) increase the field's critical mass (in terms of numbers of researchers, projects and events); and (iv) bridge to neighbouring disciplines where language is generated (to benefit from technological advances and available resources there).
These very considerable impacts are our overarching goals in the Generation Challenges (GenChal) initiative. With the Surface Realisation Task that forms the core of the present proposal, we are aiming for a truly groundbreaking impact that is of great potential use in practical applications: the development of a new generation of surface realisers that can be directly compared and, because they work from common input, can be substituted for each other. Ultimately, this will mean that data-to-text generation, MT, summarisation and dialogue systems (among other fields) will directly benefit from the availability of a range of reusable realisation components which system builders can test to determine which is best for their purpose, something that has not been possible before.
 
Description 1. We have demonstrated the feasibility of creating a common-ground input representation that can be shared by multiple research teams to encode inputs to their language generation methods (long thought infeasible, with research teams working from varying, incomparable inputs).

2. We have demonstrated that directly comparable results can be produced by different research teams using the common-ground representations and training data encoded on the basis of the same representations.

3. We developed a benchmark data set along the above lines, and ran a grand challenge research competition on the above data, producing directly comparable surface realisation results for the first time.
Exploitation Route 1. The legacy benchmark data set is freely available and can be, and has been, used to compare future surface realisation methods against previous ones.

2. Several software tools connected with the shared task and its evaluation are also freely available.

3. The main breakthrough is the development of a framework within which independently developed surface realisation methodologies can be directly compared, hence inform each other's development.
Sectors Digital/Communication/Information Technologies (including Software), Culture, Heritage, Museums and Collections

 
Title SR'11 Shared Task Dataset 
Description The SR'11 Shared Task data comprises training, development and test data for the 2011 Surface Realisation Shared Task. Training and development data contain paired inputs and 'model' outputs, whereas the test data set contains only inputs. Inputs are abstract representations of meaning and structure; outputs are fully realised English sentences. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? No  
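For illustration only, a paired training instance of the kind described above might be structured as follows. The field names and structures below are hypothetical simplifications for exposition, not the actual SR'11 file format (which encodes inputs as CoNLL-style dependency and semantic representations):

```python
# Illustrative sketch (hypothetical, simplified) of a paired training
# instance: an abstract, unordered input representation plus its 'model'
# output sentence. This is NOT the actual SR'11 data format.

# Input: meaning/structure sketched as lemma nodes plus labelled
# head-dependent edges, with no word order given.
example_input = {
    "nodes": [
        {"id": 1, "lemma": "committee", "features": {"num": "sg", "def": "yes"}},
        {"id": 2, "lemma": "approve", "features": {"tense": "past"}},
        {"id": 3, "lemma": "proposal", "features": {"num": "sg", "def": "yes"}},
    ],
    "edges": [
        {"head": 2, "dep": 1, "label": "subject"},
        {"head": 2, "dep": 3, "label": "object"},
    ],
}

# Output ('model' realisation): the fully realised English sentence.
example_output = "The committee approved the proposal."


def realise_naively(inp):
    """Toy placeholder: orders subject-verb-object lemmas only. A real
    surface realiser must additionally decide inflection, function words
    and punctuation from the abstract input alone."""
    by_id = {n["id"]: n for n in inp["nodes"]}
    subj = next(e["dep"] for e in inp["edges"] if e["label"] == "subject")
    verb = next(e["head"] for e in inp["edges"] if e["label"] == "subject")
    obj = next(e["dep"] for e in inp["edges"] if e["label"] == "object")
    return " ".join(by_id[i]["lemma"] for i in (subj, verb, obj))


print(realise_naively(example_input))  # prints "committee approve proposal"
```

In the training and development portions both halves of each pair are released; in the test portion only the inputs are released, and participating systems must produce the output sentences themselves.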
 
Description Strategic Partnership with AT&T Labs 
Organisation AT&T Labs
Country United States 
Sector Private 
PI Contribution The collaborator is a member of the Surface Realisation Task Organising Committee.
Start Year 2010
 
Description Strategic Partnership with Dublin City University 
Organisation Dublin City University
Country Ireland 
Sector Academic/University 
PI Contribution The collaborators are members of the Surface Realisation Shared Task Organising Committee.
Start Year 2010
 
Description Strategic Partnership with Ohio State University 
Organisation Ohio State University
Country United States 
Sector Academic/University 
PI Contribution The collaborator is a member of the Surface Realisation Task Organising Committee, and also contributed to the development of the Shared Task data and the evaluation of system outputs.
Start Year 2010
 
Description Strategic Partnership with Pompeu Fabra University, Barcelona 
Organisation Pompeu Fabra University
Country Spain 
Sector Academic/University 
PI Contribution The collaborators are members of the Surface Realisation Shared Task Organising Committee.
Start Year 2011
 
Description Strategic Partnership with Stuttgart University 
Organisation University of Stuttgart
Country Germany 
Sector Academic/University 
PI Contribution The collaborator is a member of the Surface Realisation Shared Task Organising Committee.
Start Year 2011
 
Title LG-Eval 
Description LG-eval is a toolkit for designing and implementing language evaluation experiments. LG-eval is the result of our work on numerous language evaluation experiments both in the context of GenChal shared tasks and in other contexts. In addition to the code itself, a thorough walk-through introduction with many examples can be found online (http://www.nltg.brighton.ac.uk/research/lg-eval). We are making the tool freely available and hope it will contribute to increasing the number of human evaluations and in particular meta-evaluations of evaluation methodology that the NLG field would benefit from. 
Type Of Technology Software 
Year Produced 2011 
URL http://www.nltg.brighton.ac.uk/research/lg-eval
 
Description EMNLP 2011 Workshop on Language Generation and Evaluation 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Generation Challenges 2011 homepage
Year(s) Of Engagement Activity 2011
 
Description GenChal Repository 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. The GenChal online repository is a one-stop resource for obtaining NLG task materials, both from Generation Challenges tasks and from other sources, where the set of materials provided for each task consists of (i) task definition, (ii) input and output data, (iii) evaluation software, (iv) documentation, and (v) previous publications.
Year(s) Of Engagement Activity 2011
 
Description Generation Challenges 2011 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Generation Challenges 2011 website
Year(s) Of Engagement Activity 2010
 
Description Generation Challenges 2011 Surface Realisation Shared Task 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Website of Surface Realisation Shared Task
Year(s) Of Engagement Activity 2011