Generation Challenges 2011: Towards a Surface Realisation Shared Task
Lead Research Organisation:
University of Brighton
Department Name: Sch of Computing, Engineering & Maths
Abstract
Computers can now perform some writing tasks well (e.g. transcription and spell checking), but we still lack good computational solutions for writing tasks which involve the creation of new text (as opposed to the typing or checking of existing text). Natural language generation (NLG) is the branch of computer science that aims to address this lack, by developing methods and tools for the computational generation of spoken and written language. NLG technology has a vast range of potential applications, including increasing the efficiency of text-production processes (e.g. automated letter and report writing) and making information available in verbal form that would otherwise be inaccessible (e.g. to the blind) or more time-consuming to process (e.g. converting weather data to a textual summary). However, NLG is only just beginning to fulfil this potential. Among the reasons is the fact that NLG did not until recently employ comparative forms of evaluation, which are essential for effective comparison of alternative approaches, consolidation and collective scientific progress.
The NLG field's evaluation tradition lies in user-oriented and task-based evaluation of complete systems. This tradition is very different from the comparative evaluation paradigms that are predominant in other areas of Natural Language Processing (NLP), where shared data resources, intrinsic, automatically computed metrics, and human ratings of quality provide time-efficient and low-cost ways of comparing new systems and techniques against existing approaches. In contrast, in NLG, until a few years ago, there simply was no comparative evaluation of independently developed alternative approaches. Yet without comparative evaluation there can be no consolidation or collective progress in a field of research, and individual researchers and groups are left to progress more or less separately.
The GenChal initiative has firmly established comparative evaluation in NLG and produced data sets and software tools to support it. Past shared tasks have addressed the subfield of reference generation and specific applications, and the time is now right to tackle a more ambitious challenge. With the Surface Realisation Task that forms the core of the present proposal, we are aiming for something truly groundbreaking and of great potential use in practical applications: the development of a new generation of surface realisers that can be directly compared and, because they work from common input, can be substituted for each other. Ultimately, this will mean that data-to-text generation, MT, summarisation and dialogue systems (among other fields) will directly benefit from the availability of a range of reusable realisation components which system builders can test to determine which is best for their purpose, something that has not been possible before.
Planned Impact
Computers can now perform certain writing tasks well (e.g. transcription and spell checking), but we still lack computational solutions for writing tasks which involve the creation of new text (as opposed to the typing or checking of given text). Natural language generation (NLG) is the branch of computer science that aims to address this lack, by developing methods and tools for the computational generation of spoken and written language. NLG can increase the efficiency of text-production processes (e.g. automated letter and report writing) and make information available in verbal form that would otherwise be inaccessible (e.g. to the blind) or more time-consuming to process (e.g. converting weather data to a textual summary). NLG technology has relevance to the EPSRC's Digital Economy and Assisted Living/Lifelong Health themes, and the number of potential applications is vast. However, NLG has so far not lived up to its great potential. Among the reasons for this is that until recently, NLG lacked comparative evaluation, hence consolidation of research results, and was isolated from the rest of NLP, therefore not directly benefiting from advances there. It was shrinking as a field and lacked the kind of funding and participation that natural language analysis (NLA) fields have attracted. In order to ultimately fulfil its great potential (including commercial applications), NLG needs to achieve substantial technological progress. For this to be possible, NLG needs to (i) increase the availability of reusable and trainable NLG tools and data resources; (ii) establish comparative evaluation as standard in order to consolidate and incrementally improve research results; (iii) increase the field's critical mass (in terms of numbers of researchers, projects and events); and (iv) bridge to neighbouring disciplines where language is generated (to benefit from technological advances and available resources there).
These very considerable impacts are our overarching goals in the Generation Challenges (GenChal) initiative. With the Surface Realisation Task that forms the core of the present proposal, we are aiming for a truly groundbreaking impact that is of great potential use in practical applications: the development of a new generation of surface realisers that can be directly compared and, because they work from common input, can be substituted for each other. Ultimately, this will mean that data-to-text generation, MT, summarisation and dialogue systems (among other fields) will directly benefit from the availability of a range of reusable realisation components which system builders can test to determine which is best for their purpose, something that has not been possible before.
People | Anya Belz (Principal Investigator)
Publications
Anja Belz (2012) A Repository of Data and Evaluation Resources for Natural Language Generation
Anja Susanne Belz (2011) Discrete vs. Continuous Rating Scales for Language Evaluation in NLP
Anja Belz (2012) LG-Eval: A Toolkit for Creating Online Language Evaluation Experiments
Anja Belz (2011) The First Surface Realisation Shared Task: Overview and Evaluation Results
Anja Belz (2012) The Surface Realisation Task: Recent Developments and Future Plans
Description | 1. We have demonstrated the feasibility of creating a common-ground input representation that can be shared by multiple research teams to encode inputs to their language generation methods (long thought infeasible, with research teams working from varying, incomparable inputs). 2. We have demonstrated that directly comparable results can be produced by different research teams using the common-ground representations and training data encoded on the basis of the same representations. 3. We developed a benchmark data set along the above lines, and ran a grand challenge research competition on the above data, producing directly comparable surface realisation results for the first time. |
Exploitation Route | 1. The legacy benchmark data set is freely available and can be, and has been, used to compare future surface realisation methods against previous ones. 2. Several software tools connected with the shared task and its evaluation are also freely available. 3. The main breakthrough is the development of a framework within which independently developed surface realisation methodologies can be directly compared, hence inform each other's development. |
Sectors | Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections
Title | SR'11 Shared Task Dataset |
Description | The SR'11 Shared Task data comprises training, development and test data for the 2011 Surface Realisation Shared Task. Training and development data contain paired inputs and 'model' outputs, whereas the test data set contains only inputs. Inputs are abstract representations of meaning and structure; outputs are fully realised English sentences. |
Type Of Material | Database/Collection of data |
Year Produced | 2011 |
Provided To Others? | No |
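The SR'11 data pairs abstract input representations with realised English sentences. As a purely illustrative sketch (the real SR'11 inputs are dependency-style structures with a much richer schema; the dictionary format and the naive `linearise` function below are inventions for this example, not the task's actual format or a participating realiser), the input/output pairing might be pictured as:

```python
# Hypothetical, simplified stand-in for an SR'11-style input:
# a head word with dependents, each carrying a linear position index.
# A real input would mark lemmas, syntactic/semantic relations, etc.

def linearise(node):
    """Naively realise a sentence by ordering each head and its
    dependents according to their stored position indices."""
    words = sorted([node] + node.get("deps", []), key=lambda n: n["pos"])
    parts = []
    for w in words:
        if w is node:
            parts.append(w["form"])       # emit the head itself
        else:
            parts.append(linearise(w))    # recurse into each dependent subtree
    return " ".join(parts)

# Toy input encoding "the committee released the data"
toy_input = {
    "form": "released", "pos": 2,
    "deps": [
        {"form": "committee", "pos": 1,
         "deps": [{"form": "the", "pos": 0}]},
        {"form": "data", "pos": 4,
         "deps": [{"form": "the", "pos": 3}]},
    ],
}

print(linearise(toy_input))  # -> the committee released the data
```

In the training and development sets both sides of such a pair are given; test-set entries supply only the input side, and systems must produce the sentence.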
Description | Strategic Partnership with AT&T Labs |
Organisation | AT&T Labs |
Country | United States |
Sector | Private |
PI Contribution | The collaborator is a member of the Surface Realisation Task Organising Committee. |
Start Year | 2010 |
Description | Strategic Partnership with Dublin City University |
Organisation | Dublin City University |
Country | Ireland |
Sector | Academic/University |
PI Contribution | The collaborators are members of the Surface Realisation Shared Task Organising Committee.
Start Year | 2010 |
Description | Strategic Partnership with Ohio State University |
Organisation | Ohio State University |
Country | United States |
Sector | Academic/University |
PI Contribution | The collaborator is a member of the Surface Realisation Task Organising Committee, and also contributed to the development of the Shared Task data and the evaluation of system outputs. |
Start Year | 2010 |
Description | Strategic Partnership with Pompeu Fabra University, Barcelona |
Organisation | Pompeu Fabra University |
Country | Spain |
Sector | Academic/University |
PI Contribution | The collaborators are members of the Surface Realisation Shared Task Organising Committee.
Start Year | 2011 |
Description | Strategic Partnership with Stuttgart University |
Organisation | University of Stuttgart |
Country | Germany |
Sector | Academic/University |
PI Contribution | The collaborator is a member of the Surface Realisation Shared Task Organising Committee.
Start Year | 2011 |
Title | LG-Eval |
Description | LG-Eval is a toolkit for designing and implementing language evaluation experiments. It is the result of our work on numerous language evaluation experiments, both in the context of GenChal shared tasks and in other contexts. In addition to the code itself, a thorough walk-through introduction with many examples can be found online (http://www.nltg.brighton.ac.uk/research/lg-eval). We are making the tool freely available in the hope that it will increase the number of human evaluations, and in particular meta-evaluations of evaluation methodology, from which the NLG field would benefit.
Type Of Technology | Software |
Year Produced | 2011 |
URL | http://www.nltg.brighton.ac.uk/research/lg-eval |
Description | EMNLP 2011 Workshop on Language Generation and Evaluation |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | |
Results and Impact | Generation Challenges 2011 homepage |
Year(s) Of Engagement Activity | 2011 |
Description | GenChal Repository |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | |
Results and Impact | Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. The GenChal online repository is a one-stop resource for obtaining NLG task materials, both from Generation Challenges tasks and from other sources, where the set of materials provided for each task consists of (i) task definition, (ii) input and output data, (iii) evaluation software, (iv) documentation, and (v) previous publications. |
Year(s) Of Engagement Activity | 2011 |
Description | Generation Challenges 2011 |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | |
Results and Impact | Generation Challenges 2011 website |
Year(s) Of Engagement Activity | 2010 |
Description | Generation Challenges 2011 Surface Realisation Shared Task |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | |
Results and Impact | Website of Surface Realisation Shared Task |
Year(s) Of Engagement Activity | 2011 |