Detecting and correcting infelicitous phrases in non-native English writing

Lead Research Organisation: University of Cambridge

Department Name: Computer Science and Technology

Abstract

Education is one of the most prominent fields in which Natural Language Processing (NLP) can be applied. One problem it can address is evaluating the quality of writing. The task poses various challenges, ranging from detecting grammar and spelling errors to judging style, coherence and word choice. It can be extended further to validating the claims an essay makes, capturing the meaning it conveys and evaluating the novelty of the ideas it proposes. These challenges, among many others, are what educators face when evaluating an essay. The task is harder still because it is almost impossible for humans to reach a consensus about a piece of writing; getting a machine to produce results comparable to human judgements can only be harder.
The main area of interest for this research is evaluating the semantics of essays. In particular, the research will focus on detecting and correcting the oversimplified or infelicitous expressions that non-native speakers might use in writing. The Cambridge Learner Corpus (CLC) will be used initially, as it consists of essays written by international examinees taking Cambridge Assessment's English for Speakers of Other Languages (ESOL) examinations. In addition to the CLC, the project intends to use other comprehensive English corpora such as the British National Corpus (BNC) or ukWaC.
Capturing semantic similarity between expressions requires investigating different compositional semantics models and how the meaning of sentences can be related to real-world models. The research proposes to use deep learning, for which Convolutional Neural Networks (CNNs) are a good choice of implementation. By convolving over different textual units such as characters, words or sentences, various aspects of writing quality can be captured. With proper visualization techniques, feedback can be given to students, enabling them to improve their writing skills.
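As an illustration of convolving over textual units, the sketch below slides a single filter over a sequence of token vectors and applies max-over-time pooling. It is a minimal sketch, not the project's model: the function names (embed, conv1d, max_over_time), the pseudo-random toy embeddings and the hand-set kernel are all assumptions made for the example, and a real system would learn both embeddings and filters from data.

```python
import random


def embed(tokens, dim, seed=0):
    """Map each distinct token to a fixed pseudo-random vector.

    Hypothetical stand-in for learned embeddings; the same token
    always receives the same vector within one call.
    """
    rng = random.Random(seed)
    table = {}
    vectors = []
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(table[tok])
    return vectors


def conv1d(vectors, kernel, bias=0.0):
    """Slide a (width x dim) kernel over the sequence and apply ReLU."""
    width = len(kernel)
    if len(vectors) < width:
        raise ValueError("sequence is shorter than the kernel width")
    feature_map = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        total = bias
        for kernel_row, vector in zip(kernel, window):
            total += sum(w * x for w, x in zip(kernel_row, vector))
        feature_map.append(max(0.0, total))
    return feature_map


def max_over_time(feature_map):
    """Return the strongest activation and the window index that produced it."""
    best = max(range(len(feature_map)), key=feature_map.__getitem__)
    return feature_map[best], best


# The same filter code works for words or characters; only the unit changes.
sentence = "the results was very much good"
for units in (sentence.split(), list(sentence)):
    vectors = embed(units, dim=4)
    kernel = [[0.5] * 4 for _ in range(3)]  # one hand-set filter of width 3
    activation, position = max_over_time(conv1d(vectors, kernel))
    print(len(units), "units; strongest window starts at", position)
```

The window index returned by max_over_time is a crude hook for the kind of feedback mentioned above: it identifies which span of the text triggered the filter most strongly.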

Student:

Youmna Farag

Period of Study:

Sep 16 - Mar 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1778176

Research Topic:

Unclassified

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
Youmna Farag (Student)

Publications

Farag Y (2020) Neural approaches to discourse coherence: modeling, evaluation and application

Farag Y (2019) Multi-Task Learning for Coherence Modeling

Farag Y (2018) Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input

Farag Y (2020) Analyzing Neural Discourse Coherence Models

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509620/1			30/09/2016	29/09/2022
1778176	Studentship	EP/N509620/1	30/09/2016	30/03/2020	Youmna Farag

Key Findings

Description	I have investigated discourse coherence, which is an important aspect of writing quality. I created deep learning models that assess the coherence of a text and that outperformed published state-of-the-art models. Furthermore, I integrated my coherence models with essay scoring models to enhance their ability to detect adversarial input in the form of grammatical but incoherent essays. Finally, I created an evaluation setup to examine the linguistic features that neural coherence models learn, in order to better understand these models and learn how to improve them.
Exploitation Route	My PhD thesis and published papers could be used by anyone interested in working on discourse coherence and deep learning, specifically multi-task learning.
Sectors	Digital/Communication/Information Technologies (including Software), Education

Impact Summary

Description	My publications were used by others as a basis for their research. Some of the citing papers focus on educational applications.
First Year Of Impact	2018
Sector	Digital/Communication/Information Technologies (including Software), Education
