Validity of German regional dialect markers for authorship profiling

Lead Research Organisation: University of Birmingham
Department Name: Department of English Literature

Abstract

Police investigations and court cases across jurisdictions often turn on the question of who wrote a given text. The forensic linguistic discipline of authorship analysis (AA) tries to answer this question by inferring characteristics of authors from their writing (Juola 2007). This rests on the assumption that every use of language manifests the characteristics of its producer or their background (Coulthard & Johnson 2007). Language can therefore be used to profile an author by inferring information about their background, e.g. gender or age (Argamon et al. 2007). However, profiling the regional background of an author has received limited attention in the literature (Shuy 2001), especially in a forensic context (Chambers 1990). Additionally, research on AA more generally has focused predominantly on English (Shuy 2007). Thus, my focus will be on the degree to which regional dialect features are useful for profiling the regional background of authors of short online communications written in German. This will help advance best practice in authorship profiling and in forensic linguistics.

My research goals are:

- Analysing whether the textualisation of regional dialect features in German is sufficient to allow for regional profiling of online communication
- Identifying the best regional dialect features in German for pinpointing the regional background of authors
- Providing a procedure for supporting the regional profiling of German in AA

This project will build on prior location analyses that use dialectal variation to predict location (Scheffler et al. 2014). I will identify a set of linguistic features for regional classification, drawing on similar research in traditional German dialectology. These include broad lexical distinctions between east/west and north/south (Niebaum & Macha 2014) and gazetteer terms (Han et al. 2014), as well as the grammatical and phonological textualisation of regional varieties (Herrgen 2019). These linguistic markers will then be used to regionally classify a set of localised social media postings (Hovy & Purschke 2018). The results will enable me to develop a best-practice procedure for authorship profiling cases.
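The marker-based classification step described above can be illustrated with a minimal Python sketch. It is not the project's method: the two-region split, the tiny marker lexicon (a handful of widely cited north/south lexical variants), and the function names `tokenize`, `count_markers` and `profile_region` are all assumptions made for illustration. A real feature set would come from the dialectological sources cited above and would need to be validated against localised data.

```python
import re
from collections import Counter

# Illustrative lexicon only (assumption): a few well-known regional variants,
# not the feature set the project will derive.
REGION_MARKERS = {
    "north": {"moin", "sonnabend", "schnacken"},
    "south": {"servus", "semmel", "bisserl"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (incl. umlauts, ß)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def count_markers(text):
    """Count how many tokens in the text match each region's marker set."""
    tokens = tokenize(text)
    counts = Counter()
    for region, markers in REGION_MARKERS.items():
        counts[region] = sum(1 for token in tokens if token in markers)
    return counts


def profile_region(text):
    """Return the region with the most marker hits, or None if there are
    no hits or the top count is tied (no basis for a decision)."""
    counts = count_markers(text)
    ranked = counts.most_common()
    top_region, top_count = ranked[0]
    if top_count == 0:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_region


if __name__ == "__main__":
    for post in [
        "Moin, wir schnacken am Sonnabend!",
        "Servus, magst a bisserl Semmel?",
        "Hallo zusammen",
    ]:
        print(post, "->", profile_region(post))
```

Returning `None` when there is no evidence or a tie is a deliberate choice in this sketch: in a forensic setting, declining to classify is preferable to a guess, and reliability rates can then be reported only over the texts where markers were actually found.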

Ultimately, this project offers clear opportunities for societal impact. Identifying the author of an online communication is an everyday task for investigating agencies. Authorship profiling, especially for regional characteristics, can support and facilitate this task. A best-practice procedure with known reliability rates would greatly benefit these forensic investigations. My research findings can be applied directly to cases of German authorship analysis and authorship profiling, and can help find the authors of malicious online communications.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
ES/P000711/1                                   01/10/2017  30/09/2027
2401287            Studentship   ES/P000711/1  01/10/2020  26/12/2024  Dana Roemling