Detecting and correcting infelicitous phrases in non-native English writing

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

Education is one of the prominent fields where Natural Language Processing (NLP) could be applied to solve various problems. One of these problems is evaluating the quality of writing. The task poses various challenges, ranging from detecting grammar and spelling errors to judging style, coherence and choice of words. It can be further extended to validating the claims an essay makes, capturing the meaning it conveys and evaluating the novelty of the ideas it proposes. These challenges, among many others, are what educators face when evaluating an essay. The task is harder still because human assessors rarely reach a consensus about a piece of writing; getting a machine to produce results comparable to human ones is therefore even more difficult.
The main area of interest for this research is evaluating the semantics of essays. In particular, the research will focus on detecting and correcting the oversimplified or infelicitous expressions that non-native speakers might use in writing. The Cambridge Learner Corpus (CLC) will be used initially for the task, as it consists of essays written by international examinees taking Cambridge Assessment's English for Speakers of Other Languages (ESOL) examinations. In addition to the CLC, the project intends to use other comprehensive English corpora such as the British National Corpus (BNC) or ukWaC.
Capturing semantic similarity requires investigating different compositional semantics models and how the meaning of sentences relates to real-world models. The research proposes to use deep learning; Convolutional Neural Networks (CNNs) are a good choice for the implementation. By convolving over different textual units such as characters, words or sentences, various aspects of writing quality could be captured. With proper visualization techniques, feedback could be given to students, enabling them to improve their writing skills.
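To make the idea of convolving over textual units concrete, the following is a minimal sketch of a single convolutional filter sliding over a sequence of word vectors, followed by a ReLU and max-over-time pooling. It is an illustration only and not the project's model: the function name `conv1d_max_pool`, the toy embeddings and the filter weights are all hypothetical, and a real system would learn the weights and use many filters.

```python
from typing import List

Vector = List[float]


def conv1d_max_pool(embeddings: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Slide one filter over consecutive token vectors and max-pool the responses.

    `embeddings` holds one vector per textual unit (character, word or sentence);
    `kernel` holds one weight vector per position in the filter window.
    All names and values here are illustrative assumptions.
    """
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("text is shorter than the filter width")

    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias
        # Dot product between the window and the filter, position by position.
        for unit_vec, filter_vec in zip(window, kernel):
            total += sum(u * f for u, f in zip(unit_vec, filter_vec))
        responses.append(max(0.0, total))  # ReLU non-linearity

    # Max-over-time pooling keeps the strongest response anywhere in the text.
    return max(responses)


# Toy input: three "words" with 2-dimensional embeddings, one filter of width 2.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_kernel = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_max_pool(toy_embeddings, toy_kernel)
```

Each filter yields one feature per text; the position of the strongest response is what visualization-based feedback could in principle point back to.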

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509620/1      |              |              | 01/10/2016 | 30/09/2022 |
1778176           | Studentship  | EP/N509620/1 | 01/10/2016 | 31/03/2020 | Youmna Farag
 
Description: I have investigated discourse coherence, which is an important aspect of writing quality. I created deep learning models that assess the coherence of a text, which outperformed published state-of-the-art models. Furthermore, I integrated my coherence models with essay scoring models to enhance their ability to detect adversarial input in the form of grammatical but incoherent essays. Finally, I created an evaluation setup to examine the linguistic features that neural coherence models learn, in order to better understand these models and learn how to improve them.
Exploitation Route: My PhD thesis and published papers could be utilised by anyone who is interested in working on discourse coherence and deep learning, specifically multi-task learning.
Sectors: Digital/Communication/Information Technologies (including Software), Education

 
Description: My publications were used by others as a basis for their own research. Some of the citing papers focus on educational applications.
First Year Of Impact: 2018
Sector: Digital/Communication/Information Technologies (including Software), Education