Lexical Chunks and the Nature of Idiolects

Lead Research Organisation: University of Manchester
Department Name: Arts Languages and Cultures

Abstract

The key areas of my project are idiolect and authorship analysis. Character n-grams are strings of n consecutive characters. For example, ignoring the space, the character 2-grams in "a cat" are "ac", "ca" and "at".
Character n-grams appear to be a useful way of identifying authors, but the
explanations for this are disputed. It is not intuitive that the consistent use of small chunks
of letters or words could be distinctive to an individual, and yet it can be. This has been
exemplified by findings such as those of Grieve et al. (2019), who successfully
identified the author of a text using n-grams. I aim to take an evidence-based
approach to understanding why n-grams are useful. Computer scientists
have contributed most of the research on the topic, but there is now a
need for greater specialist input from linguistic theory, which I hope to provide.
My master's thesis will be a smaller-scale study of n-gram tracing and the extent to which
it accounts for topic.
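The definition of character n-grams, and the way they can point towards an author, can be illustrated with a short Python sketch. It assumes that spaces are ignored, as in the "a cat" example, and it implements a simplified reading of n-gram tracing: for each candidate author, it measures the share of the questioned text's distinct n-grams that also occur in that candidate's known writing, and the candidate with the highest share is the likeliest author. The function names `char_ngrams` and `ngram_tracing` and the sample texts are hypothetical illustrations, not the project's own method or data.

```python
from typing import Dict, Set


def char_ngrams(text: str, n: int, ignore_spaces: bool = True) -> Set[str]:
    """Return the set of distinct character n-grams (n-gram types) in text."""
    if ignore_spaces:
        # Assumption: spaces are dropped, matching the "a cat" -> ac, ca, at example.
        text = text.replace(" ", "")
    # Slide a window of width n across the string, one character at a time.
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_tracing(questioned: str, candidates: Dict[str, str], n: int) -> Dict[str, float]:
    """Simplified n-gram tracing: for each candidate, the proportion of the
    questioned text's n-gram types that also appear in the candidate's known text."""
    q = char_ngrams(questioned, n)
    if not q:
        # A text shorter than n has no n-grams, so no candidate can be scored.
        return {name: 0.0 for name in candidates}
    return {
        name: len(q & char_ngrams(known, n)) / len(q)
        for name, known in candidates.items()
    }


# Hypothetical usage: two invented "known" texts and one questioned text.
scores = ngram_tracing(
    "the cat sat",
    {"A": "a cat sat on a mat", "B": "dogs run fast"},
    n=2,
)
likeliest = max(scores, key=scores.get)
```

Here candidate A shares four of the questioned text's seven 2-gram types ("ca", "at", "ts", "sa") and candidate B shares none, so A is ranked first. Real studies would use much longer texts and would need to consider how far shared n-grams reflect topic rather than the author's idiolect.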

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
ES/P000665/1      |              |              | 01/10/2017 | 30/09/2027 |
2885513           | Studentship  | ES/P000665/1 | 01/10/2023 | 30/09/2027 | Sadie Barlow