Author Profiling using stylometry

Lead Research Organisation: University of Birmingham

Abstract

This project aims to advance the science of profiling through language so that it can be carried out quickly and effectively as soon as a written sample is retrieved. We focus on the task of profiling an author. The goal is to extract stylistic signals (patterns specific to a linguistic community) from the writing of that community's members, and to test whether these signals remain detectable when those members write in a context outside that community. The underlying theory is that we all take part in several linguistic communities. Our assumption is that each of these communities leaves a mark on our writing and speaking style, and that some of these marks may be detectable using stylometry, the quantitative study of writing style.
To achieve this, we will first review the stylometric profiling tasks that have already been carried out and how successful their methodologies were, in order to provide a nuanced summary of the tools profilers already have at their disposal and how to use them. This review will also help us understand the needs of the profiling community, so that we can draw up a list of priorities that will translate into the experiments we carry out.
Each profiling task we embark on will most likely require a new corpus with its own curation needs, since we must minimize confounding variables. If properly maintained and updated, the corpora we create can also allow other profilers to carry out their work with a corpus that is known (through cross-validation and our experimentation) to work for a particular profiling task.
To mitigate the risks of each profiling task, we will gather the corpora incrementally. This gives us regular checks for success and accuracy, which will allow us to report consistently on the project's progress and to decide which tasks are feasible.
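
As a rough illustration of the kind of stylistic signal that stylometry quantifies, the sketch below computes relative frequencies of a few function words in two short texts and compares the resulting profiles. It is not the project's method: the word list `FUNCTION_WORDS`, the helper names `function_word_profile` and `profile_distance`, and the sample sentences are all hypothetical, chosen only to keep the example small.

```python
import re
from collections import Counter

# Assumption: a tiny, illustrative list of English function words.
# A real study would use a much larger, carefully chosen feature set.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in FUNCTION_WORDS}
    counts = Counter(tokens)
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def profile_distance(p, q):
    """Mean absolute difference between two function-word profiles."""
    return sum(abs(p[w] - q[w]) for w in FUNCTION_WORDS) / len(FUNCTION_WORDS)


# Hypothetical samples standing in for texts by two different authors.
sample_a = "The cat sat on the mat and it looked at the door."
sample_b = "To be fair, a lot of that was in the plan."

profile_a = function_word_profile(sample_a)
profile_b = function_word_profile(sample_b)
print(profile_distance(profile_a, profile_b))
```

A smaller distance suggests more similar function-word usage, but with samples this short the number carries no evidential weight. It only shows the mechanics of turning text into comparable quantitative features.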

Student:

Alejandro Jawerbaum

Period of Study:

Oct 23 - Sep 27

Funder:

ESRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2881667

Research Topic:

Unclassified

People	ORCID iD
Jack Grieve (Primary Supervisor)
Alejandro Jawerbaum (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
ES/P000711/1			01/10/2017	30/09/2027
2881667	Studentship	ES/P000711/1	01/10/2023	30/09/2027	Alejandro Jawerbaum
