Examining judicial sentencing using court transcripts and natural language processing techniques

Lead Research Organisation: University of Leeds
Department Name: Law

Abstract

The project uses natural language processing techniques to study sentence records available online at www.thelawpages.com. This is a private company specialising in information services for legal practitioners, including a vast repository of Crown Court sentence transcripts. These transcripts present certain details of the case in a systematic fashion (for example, the type of offence, the sentence outcome, or the court location), making them easy to retrieve. However, other relevant details of the case (such as the number of previous convictions, or the existence of personal mitigating factors such as remorse) are reported less systematically. When present, they are embedded in different parts of the judge's statement, hence the need to rely on natural language processing techniques to record them.
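To illustrate the kind of extraction problem described above, the following is a minimal sketch, not the project's actual method. It assumes a simple keyword approach; the pattern names and the cue phrases in FACTOR_PATTERNS are hypothetical and would need to be replaced by a validated lexicon or a trained model.

```python
import re
from typing import Dict

# Hypothetical cue patterns for two factors named in the abstract.
# These are illustrative assumptions, not a validated coding scheme.
FACTOR_PATTERNS = {
    "previous_convictions": re.compile(r"\bprevious convictions?\b", re.IGNORECASE),
    "remorse": re.compile(r"\bremorse(ful)?\b", re.IGNORECASE),
}


def flag_factor_mentions(statement: str) -> Dict[str, bool]:
    """Return, for each factor, whether the statement mentions it at all.

    A mention is not the same as the factor being present: a phrase such as
    "no previous convictions" is still flagged, because negation is not handled.
    """
    return {
        name: bool(pattern.search(statement))
        for name, pattern in FACTOR_PATTERNS.items()
    }


# Invented example sentence, not taken from any real transcript.
example_statement = (
    "You have shown genuine remorse, and you have no previous convictions."
)
example_flags = flag_factor_mentions(example_statement)
```

The sketch only detects that a factor is discussed somewhere in the judge's statement. Deciding whether the factor applies (for instance, handling negation, or counting the number of previous convictions) is the harder part that motivates fuller natural language processing techniques.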
The website captures a sample of 14,736 sentences passed by the Crown Court from 2000 to the present date. The sampling process employed has not been identified, which could call into question the external validity of analyses undertaken using this data. However, for the period from 2009 to 2015, the dataset captures 1,488 cases of homicide, about 40% of homicides sentenced in England and Wales during that period. This is a sampling fraction large enough to minimise criticism related to the representativeness of the data. Focusing on offences of homicide within the 2009-2015 period, we will seek to answer the following research questions:
RQ1: What are the aggravating and mitigating factors that judges take into account when sentencing cases of homicide?
RQ2: What effect do these factors have on the sentence outcome?
RQ3: How reliably can aggravating and mitigating factors in sentence transcripts be recorded using natural language processing compared to a supervised process of content analysis?
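One way to approach the comparison in RQ3 is to measure agreement between the automated coding and the manual coding of the same transcripts. The sketch below assumes Cohen's kappa as the agreement measure; the abstract does not specify a metric, so this choice, the function name cohen_kappa, and the toy labels are all illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)

    # Observed proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Agreement expected by chance, from each coder's label frequencies.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (invented): 1 = factor recorded as present, 0 = absent.
manual_codes = [1, 1, 0, 0, 1, 0, 1, 0]
nlp_codes = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(manual_codes, nlp_codes)
```

With these toy labels the coders agree on 6 of 8 items (0.75) against a chance agreement of 0.5, giving a kappa of 0.5. In practice the measure would be computed separately for each aggravating or mitigating factor.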
Part of the substantive originality of this project stems from the dearth of empirical analyses looking at cases of homicide, which is mainly due to the lack of adequate data. The importance of identifying the main factors and how they are weighted is reinforced by the absence of a sentencing guideline covering cases of homicide, which raises the question of how consistently these types of cases are sentenced.
The methodological contribution of the project is even clearer. Natural language processing techniques are rarely used as a research tool in the disciplines of Criminal Justice and Law. The only exception I am aware of is Evans et al. (2007), who used them to classify legal texts according to specific criteria such as the ideology of the author. The creation of a dataset covering the main factual elements describing criminal sentences could open important research avenues in the future. In particular, using the data captured in transcripts we could deepen the study of consistency in sentencing by going beyond the well-documented disparities between courts to explore disparities between judges. Additionally, the data makes it possible to investigate new types of discrimination in sentencing, such as those based on gender.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
ES/P000401/1       -             -             01/10/2017  30/09/2024  -
1949327            Studentship   ES/P000401/1  01/10/2017  30/09/2021  Hannah Wooller