Exploring the Potential of Natural Language Processing Techniques in Criminal Justice Agencies

Lead Research Organisation: University of Leeds
Department Name: Sch of Geography

Abstract

Full title: Exploring the Potential of Natural Language Processing Techniques in Criminal Justice Agencies: An Investigation of Racial Disparities in Release Decisions from the Parole Board
The Lammy Review (2017) highlights significant difficulties in accessing statistics with which to explore ethnic disparities in the Criminal Justice (CJ) system in England and Wales, and urges CJ agencies to redress this issue. In some instances, the solution simply involves publishing anonymised versions of existing datasets. In many other cases, the required data do not exist. Adopting new data collection processes is often unrealistic since it requires additional investments from agencies that have seen their budgets depleted over the last decade. In response to these constraints, this interdisciplinary project will explore an alternative cost-effective route to generate new data capable of highlighting potential disparities in the CJ system - by applying natural language processing techniques to the large volumes of free-text data stored in existing administrative CJ records.
All CJ agencies generate large numbers of records documenting the characteristics of the cases they process and the decisions they adopt. There is a long tradition of manually processing such records using content analysis to code relevant text information into statistical datasets, thus enabling quantitative analyses (e.g. Hood, 1992; Myers and Talarico, 1987). The key problem with these techniques lies in their scalability. The cost and time of processing records manually are directly proportional to the number of cases to be processed, which renders samples either too small or too expensive (Pina-Sánchez et al., In Press). Following advances in the fields of Data Science and Artificial Intelligence, we propose the use of text-mining techniques to carry out this coding process automatically.
Recently, Pina-Sánchez et al. (2019) demonstrated the potential of such techniques for the processing of sentence records. Yet this proof of concept is still affected by important limitations regarding document validity and the sophistication of the algorithms presented. Building on this previous work, and partnering with The Parole Board - a key national CJ agency - the project will push the methodological frontier in this crucial area of research. The Parole Board carries out risk assessments to decide whether prisoners can be safely released into the community. In its latest annual exercise, The Parole Board (2018a) processed 16,436 'paper-hearings' for which short structured textual summaries (roughly two pages long) were routinely recorded. These 'hearing summaries' capture the main characteristics of the case, together with demographic factors of the prisoner including their ethnicity.
Working with a substantial sample of these 'hearing summaries', the project has three objectives: i) to develop text-mining algorithms capable of processing 'hearing summaries'; ii) to assess the reliability of the data these algorithms generate; and iii) to analyse the data they produce to explore potential racial disparities in Parole Board decisions.
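To make objectives i) and ii) more concrete, the following is a minimal sketch only, not the project's actual method. It assumes a simple keyword-based coder for the panel's decision and uses Cohen's kappa to compare automated codes against manual ones. The decision phrases, the example summaries and the function names (code_decision, cohen_kappa) are all hypothetical; real 'hearing summaries' may be worded quite differently and would likely need more sophisticated techniques.

```python
import re
from collections import Counter

# Hypothetical decision phrases (an assumption for illustration only).
# Refusals are checked first so that "no direction for release" is never
# mistaken for a release decision.
DECISION_PATTERNS = [
    ("refuse", re.compile(r"\b(no\s+direction\s+for\s+release|refus\w*\s+release)\b",
                          re.IGNORECASE)),
    ("release", re.compile(r"\b(directs?|directed)\s+(the\s+)?release\b",
                           re.IGNORECASE)),
]


def code_decision(summary: str) -> str:
    """Return a decision code for one summary, or 'unknown' if nothing matches."""
    for label, pattern in DECISION_PATTERNS:
        if pattern.search(summary):
            return label
    return "unknown"


def cohen_kappa(codes_a, codes_b) -> float:
    """Chance-corrected agreement between two equal-length lists of codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("code lists must be non-empty and of equal length")
    n = len(codes_a)
    # Proportion of cases on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented example texts, not real Parole Board records.
summaries = [
    "The panel directed the release of the prisoner on licence.",
    "The panel made no direction for release.",
    "The panel refused release and recommended a further review.",
    "The case was adjourned for further reports.",
]
automated = [code_decision(s) for s in summaries]
manual = ["release", "refuse", "refuse", "unknown"]  # hypothetical manual codes
print(automated, cohen_kappa(manual, automated))
```

In a sketch like this, the manually coded subsample plays the role of the traditional content analysis described above, and the agreement statistic gives one possible indication of how far the automated codes can be trusted before they are used in any analysis of disparities.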
In conclusion, the proposed interdisciplinary project and the strategic partnership it leverages will provide the PhD student with a unique opportunity to explore how data science methods can be applied for public good in the context of criminal justice outcomes. By disseminating the statistical data and methods generated, the project will also enable innovative new lines of CJ research and, it is hoped, facilitate the adoption of new cutting-edge methods by CJ analysts across a range of government agencies (e.g. for processing pre-sentence reports, sentence transcripts, and virtually any other CJ records). In doing so, the project has a very realistic potential of providing a direct response to David Lammy's request for datasets shedding new light on any potential disparities in the CJ system.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
ES/T002085/1      |              |              | 01/10/2020 | 30/09/2027 |
2443762           | Studentship  | ES/T002085/1 | 01/10/2020 | 30/09/2024 | Erica McGovern