Lexico-syntactic text simplification for improving information access

Lead Research Organisation: University of Aberdeen
Department Name: Computing Science

Abstract

Text simplification is the process of reducing the grammatical and lexical complexity of a text, while
retaining its information content and meaning. The main goal of simplification is to make information
more accessible to the large numbers of people with reduced literacy. The National Literacy Trust
(www.literacytrust.org.uk) estimates that one in six adults in the UK have poor literacy skills. There is
therefore a need to make information available in simple English, as advocated by organisations such as
the Plain English Campaign (www.plainenglish.co.uk). This need for text simplification is likely to
become more acute for a variety of reasons: for instance, a growing aging population with language difficulties
arising from neurodegeneration and other causes, children accessing information on the internet, and lay
readers trying to access technical writing online (perhaps to research an illness or treatment).

One of the most popular information sources online is Wikipedia (www.wikipedia.org), a free-content
encyclopedia written collaboratively by internet volunteers. The Simple English Wikipedia
(simple.wikipedia.org) initiative to make information more accessible contains over 60,000 articles in
Simplified English. However, these are only a fraction of the 3.3 million articles in the main English
Wikipedia; furthermore, the simplified articles tend to be very short (often just the first paragraph). Our
goals in this proposal are twofold. From a theoretical perspective we want to gain an understanding of
the text revisions humans perform to simplify text, and learn rules for simplification from corpora. From
an applied perspective, we want to implement a system for automatic text simplification that can perform
the wide range of revisions that humans perform. We will make this system available to the Simple
English Wikipedia community as a tool to expand the content available in simplified form.

Planned Impact

We identified the expansion of content available in Simple English Wikipedia (SEW) as an objective. To achieve this we will engage with the SEW community in the last four months of the project and encourage them to edit and revise content that has been automatically created through simplification of existing English Wikipedia articles. We will seek structured user satisfaction feedback and free-text comments about the quality of the texts and level of simplification.

We are also in contact with teachers involved with deaf education. Current UK policy is to integrate deaf children in mainstream schools. This means that individual teachers with deaf students in their class have to prepare simplified material for these students, who can suffer from a range of linguistic deficits stemming from lack of early exposure to language. Text simplification is a demanding task, like translation, and school teachers typically do not have any training to do this. We are in contact with educationists involved in deaf education, from a teaching as well as a policy perspective. We will explore the possibility of running our final free-recall-based evaluation with deaf students (if this is not possible, we will use other participants with language difficulties, such as second language learners).

Society's increasing dependence on online information creates both a challenge (to make information accessible) and an opportunity (to tailor language to the requirements of users). This issue is likely to grow in importance for a variety of reasons: a growing aging population with language difficulties arising from neurodegeneration and other causes, children accessing information on the internet, and lay readers trying to access technical writing online (perhaps to research an illness or treatment). The National Literacy Trust (www.literacytrust.org.uk) estimates that one in six adults in the UK have poor literacy skills, and many organisations, such as the Plain English Campaign (www.plainenglish.co.uk), advocate the need to make information available in simple English. Thus, this research has the potential to benefit a large segment of the population.

Given the volume of evidence from cognitive science, psychology and literacy studies supporting the efficacy of text simplification, there has been surprisingly little research on automating the process. This is in stark contrast to other regeneration applications, such as summarisation and translation, that are well established. There has, however, been a recent spurt of interest in approaches to text simplification, and we believe that this project is timely in that it will spur further research in this field.

Publications

Siddharthan, A. (2014) Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014)

Angrosh, M. (2014) Lexico-syntactic text simplification and compression with typed dependencies in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

 
Description This project aimed to build a wide-coverage text simplification system that can simplify both the grammar and vocabulary of sentences. We developed novel algorithms and representations to learn simplification operations from corpora. We also developed a sentence compression system that reduces the length of sentences by deleting less informative words and phrases, also taking into account the difficulty of the removed constituents. The result is a hybrid system that combines manually developed and statistically trained components.
Exploitation Route The software produced for this project is available on request and has been shared with several other research groups.
Sectors Digital/Communication/Information Technologies (including Software), Education

 
Description Combining Sentence Compression and Text Simplification 
Organisation National Institute of Japanese Literature
Country Japan 
Sector Public 
PI Contribution We provided software for text simplification and conducted the evaluation of the various systems developed. We also took the lead in preparing the resulting conference paper. We have hosted Prof. Tadashi Nomoto twice at Aberdeen to coordinate the collaboration and to work together.
Collaborator Contribution Prof. Tadashi Nomoto developed a system for sentence compression that is optimised for text simplification tasks, and integrated it into a pipeline with our text simplification system. He has visited us at Aberdeen twice during the course of the project to coordinate the research and to work together.
Impact The collaboration resulted in a co-authored publication at COLING 2014, the 25th International Conference on Computational Linguistics, and the release of the software developed for sentence compression and text simplification.
Start Year 2013
 
Title RegenT Text Simplification 
Description The RegenT text simplification software offers functionality for a range of lexico-syntactic text simplification operations. 
Type Of Technology Software 
Year Produced 2014 
Impact None to date. 
 
Title Reluctant Trimmer 
Description This is software that implements "Reluctant sentence compression" for the purpose of text simplification 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact None to date. 
URL http://www.quantmedia.org/coling2014/
 
Description Invited Talk for International Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact I presented an Invited Talk on "Text Simplification: A challenge for all Computational Linguists?", as well as a tutorial on Text Summarisation at The International Conference on Computational Processing of Portuguese (PROPOR 2014)

Too early to judge, but some contacts were made that could lead to future collaboration.
Year(s) Of Engagement Activity 2014
URL http://www.nilc.icmc.usp.br/propor2014/
 
Description Invited Talk for International Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact I presented an invited talk at the COLING 2014 workshop on Automatic Text Simplification: Methods and Applications in the Multilingual Society

Too early to judge.
Year(s) Of Engagement Activity 2014
 
Description Predicting and improving text readability for target reader populations 
Form Of Engagement Activity Scientific meeting (conference/symposium etc.)
Part Of Official Scheme? No
Type Of Presentation workshop facilitator
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact I am a founding organiser of a Workshop series on "Predicting and Improving Text Readability (PITR)" that brings together researchers from linguistics, education and computer science who have an interest in text readability and simplification. The first Workshop was held in conjunction with the 2012 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2012), the second in conjunction with the 2013 Conference of the Association for Computational Linguistics (ACL 2013), and the third in conjunction with the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014).

Attendance has grown from around 20 participants in the first workshop to around 35 in the third, and further workshops are planned.

The workshops have resulted in published open access proceedings containing peer reviewed full length papers.
Year(s) Of Engagement Activity 2012, 2013, 2014
URL http://mcs.open.ac.uk/nlg/pitr2014