Generation Challenges 2011: Towards a Surface Realisation Shared Task

Lead Research Organisation: University of Brighton
Department Name: Sch of Computing, Engineering & Maths

Abstract

Computers can now perform some writing tasks well (e.g. transcription and spell checking), but we still lack good computational solutions for writing tasks which involve the creation of new text (as opposed to the typing or checking of existing text). Natural language generation (NLG) is the branch of computer science that aims to address this lack, by developing methods and tools for the computational generation of spoken and written language. NLG technology has a vast range of potential applications, including increasing the efficiency of text-production processes (e.g. automated letter and report writing) and making information available in verbal form that would otherwise be inaccessible (e.g. to the blind) or more time-consuming to process (e.g. converting weather data to a textual summary). However, NLG is only just beginning to fulfil this potential. Among the reasons is the fact that NLG did not until recently employ comparative forms of evaluation, which are essential for effective comparison of alternative approaches, consolidation and collective scientific progress.

The NLG field's evaluation tradition lies in user-oriented and task-based evaluation of complete systems. This tradition is very different from the comparative evaluation paradigms that are predominant in other areas of Natural Language Processing (NLP), where shared data resources, intrinsic, automatically computed metrics, and human ratings of quality provide time-efficient and low-cost ways of comparing new systems and techniques against existing approaches. In contrast, in NLG, until a few years ago, there simply was no comparative evaluation of independently developed alternative approaches. Yet without comparative evaluation there can be no consolidation or collective progress in a field of research, and individual researchers and groups are left to progress more or less separately.
The GenChal initiative has firmly established comparative evaluation in NLG and produced data sets and software tools to support it. Past shared tasks have addressed the subfield of reference generation and specific applications, and the time is now right to tackle a more ambitious challenge. With the Surface Realisation Task that forms the core of the present proposal, we are aiming for something truly groundbreaking and of great potential use in practical applications: the development of a new generation of surface realisers that can be directly compared and, because they work from common input, can be substituted for each other. Ultimately, this will mean that data-to-text generation, MT, summarisation and dialogue systems (among other fields) will directly benefit from the availability of a range of reusable realisation components which system builders can test to determine which is best for their purpose, something that has not been possible before.

Planned Impact

Computers can now perform certain writing tasks well (e.g. transcription and spell checking), but we still lack computational solutions for writing tasks which involve the creation of new text (as opposed to the typing or checking of given text). Natural language generation (NLG) is the branch of computer science that aims to address this lack, by developing methods and tools for the computational generation of spoken and written language. NLG can increase the efficiency of text-production processes (e.g. automated letter and report writing) and make information available in verbal form that would otherwise be inaccessible (e.g. to the blind) or more time-consuming to process (e.g. converting weather data to a textual summary). NLG technology has relevance to the EPSRC's Digital Economy and Assisted Living/Lifelong Health themes, and the number of potential applications is vast. However, NLG has so far not lived up to its great potential. Among the reasons for this is that until recently, NLG lacked comparative evaluation, hence consolidation of research results, and was isolated from the rest of NLP, therefore not directly benefiting from advances there. It was shrinking as a field and lacked the kind of funding and participation that natural language analysis (NLA) fields have attracted. In order ultimately to fulfil its great potential (including commercial applications), NLG needs to achieve substantial technological progress. For this to be possible, NLG needs to (i) increase the availability of reusable and trainable NLG tools and data resources; (ii) establish comparative evaluation as standard in order to consolidate and incrementally improve research results; (iii) increase the field's critical mass (in terms of numbers of researchers, projects and events); and (iv) bridge to neighbouring disciplines where language is generated (to benefit from technological advances and available resources there).
These very considerable impacts are our overarching goals in the Generation Challenges (GenChal) initiative. With the Surface Realisation Task that forms the core of the present proposal, we are aiming for a truly groundbreaking impact that is of great potential use in practical applications: the development of a new generation of surface realisers that can be directly compared and, because they work from common input, can be substituted for each other. Ultimately, this will mean that data-to-text generation, MT, summarisation and dialogue systems (among other fields) will directly benefit from the availability of a range of reusable realisation components which system builders can test to determine which is best for their purpose, something that has not been possible before.
 
Description 1. We have demonstrated the feasibility of creating a common-ground input representation that can be shared by multiple research teams to encode inputs to their language generation methods (long thought infeasible, with research teams working from varying, incomparable inputs).

2. We have demonstrated that directly comparable results can be produced by different research teams using the common-ground representations and training data encoded on the basis of the same representations.

3. We developed a benchmark data set along the above lines, and ran a grand challenge research competition on the above data, producing directly comparable surface realisation results for the first time.
Exploitation Route 1. The legacy benchmark data set is freely available and can be, and has been, used to compare future surface realisation methods against previous ones.

2. Several software tools connected with the shared task and its evaluation are also freely available.

3. The main breakthrough is the development of a framework within which independently developed surface realisation methodologies can be directly compared, hence inform each other's development.
Sectors Digital/Communication/Information Technologies (including Software), Culture, Heritage, Museums and Collections

 
Title SR'11 Shared Task Dataset 
Description The SR'11 Shared Task data comprises training, development and test data for the 2011 Surface Realisation Shared Task. Training and development data contain paired inputs and 'model' outputs, whereas the test data set contains only inputs. Inputs are abstract representations of meaning and structure; outputs are fully realised English sentences. 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? No  
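For illustration only, a paired training instance of the kind described above might be structured as follows. The field names and structures below are hypothetical simplifications for exposition, not the actual SR'11 file format (which encodes inputs as CoNLL-style dependency and semantic representations):

```python
# Illustrative sketch (hypothetical, simplified) of a paired training
# instance: an abstract, unordered input representation plus its 'model'
# output sentence. This is NOT the actual SR'11 data format.

# Input: meaning/structure sketched as lemma nodes plus labelled
# head-dependent edges, with no word order given.
example_input = {
    "nodes": [
        {"id": 1, "lemma": "committee", "features": {"num": "sg", "def": "yes"}},
        {"id": 2, "lemma": "approve", "features": {"tense": "past"}},
        {"id": 3, "lemma": "proposal", "features": {"num": "sg", "def": "yes"}},
    ],
    "edges": [
        {"head": 2, "dep": 1, "label": "subject"},
        {"head": 2, "dep": 3, "label": "object"},
    ],
}

# Output ('model' realisation): the fully realised English sentence.
example_output = "The committee approved the proposal."


def realise_naively(inp):
    """Toy placeholder: orders subject-verb-object lemmas only. A real
    surface realiser must additionally decide inflection, function words
    and punctuation from the abstract input alone."""
    by_id = {n["id"]: n for n in inp["nodes"]}
    subj = next(e["dep"] for e in inp["edges"] if e["label"] == "subject")
    verb = next(e["head"] for e in inp["edges"] if e["label"] == "subject")
    obj = next(e["dep"] for e in inp["edges"] if e["label"] == "object")
    return " ".join(by_id[i]["lemma"] for i in (subj, verb, obj))


print(realise_naively(example_input))  # prints "committee approve proposal"
```

In the training and development portions both halves of each pair are released; in the test portion only the inputs are released, and participating systems must produce the output sentences themselves.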
 
Description Strategic Partnership with AT&T Labs 
Organisation AT&T Labs
Country United States 
Sector Private 
PI Contribution The collaborator is a member of the Surface Realisation Task Organising Committee.
Start Year 2010
 
Description Strategic Partnership with Dublin City University 
Organisation Dublin City University
Country Ireland 
Sector Academic/University 
PI Contribution The collaborators are members of the Surface Realisation Shared Task Organising Committee.
Start Year 2010
 
Description Strategic Partnership with Ohio State University 
Organisation Ohio State University
Country United States 
Sector Academic/University 
PI Contribution The collaborator is a member of the Surface Realisation Task Organising Committee, and also contributed to the development of the Shared Task data and the evaluation of system outputs.
Start Year 2010
 
Description Strategic Partnership with Pompeu Fabra University, Barcelona 
Organisation Pompeu Fabra University
Country Spain 
Sector Academic/University 
PI Contribution The collaborators are members of the Surface Realisation Shared Task Organising Committee.
Start Year 2011
 
Description Strategic Partnership with Stuttgart University 
Organisation University of Stuttgart
Country Germany 
Sector Academic/University 
PI Contribution The collaborator is a member of the Surface Realisation Shared Task Organising Committee.
Start Year 2011
 
Title LG-Eval 
Description LG-eval is a toolkit for designing and implementing language evaluation experiments. LG-eval is the result of our work on numerous language evaluation experiments both in the context of GenChal shared tasks and in other contexts. In addition to the code itself, a thorough walk-through introduction with many examples can be found online (http://www.nltg.brighton.ac.uk/research/lg-eval). We are making the tool freely available and hope it will contribute to increasing the number of human evaluations and in particular meta-evaluations of evaluation methodology that the NLG field would benefit from. 
Type Of Technology Software 
Year Produced 2011 
URL http://www.nltg.brighton.ac.uk/research/lg-eval
 
Description EMNLP 2011 Workshop on Language Generation and Evaluation 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Generation Challenges 2011 homepage
Year(s) Of Engagement Activity 2011
 
Description GenChal Repository 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. The GenChal online repository is a one-stop resource for obtaining NLG task materials, both from Generation Challenges tasks and from other sources, where the set of materials provided for each task consists of (i) task definition, (ii) input and output data, (iii) evaluation software, (iv) documentation, and (v) previous publications.
Year(s) Of Engagement Activity 2011
 
Description Generation Challenges 2011 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Generation Challenges 2011 website
Year(s) Of Engagement Activity 2010
 
Description Generation Challenges 2011 Surface Realisation Shared Task 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience
Results and Impact Website of Surface Realisation Shared Task
Year(s) Of Engagement Activity 2011