Using BIG data to understand the BIG picture: Overcoming heterogeneity in speech for forensic applications

Lead Research Organisation: University of Huddersfield
Department Name: Sch of Music Humanities & Media

Abstract

Forensic speech science (FSS) - an applied sub-discipline of phonetics - has come to play a critical role in criminal cases involving voice evidence. Within FSS, forensic speaker comparison (FSC) involves the comparison of a criminal recording (e.g. a threatening phone call) and a known suspect sample (e.g. a police interview). It is the role of an expert forensic phonetician to advise the trier of fact (e.g. judge or jury) on the likelihood of the two samples coming from the same speaker. There are two important elements involved in making such a comparison. First, the expert will carry out an assessment of the similarity of the speech characteristics in the criminal recording and the suspect sample. Second, the expert will assess how typical the speech features in the criminal sample are of a given speaker group. The speaker group will typically be defined by age, sex and geographical region (or accent). This second element is critical in providing context for the first; the suspect could have speech very similar to that in the criminal recording, but this could be purely coincidental if they exhibit speech characteristics that are common to their speaker group. In contrast, if the criminal and suspect samples both show speech features that are atypical for their speaker group, then this would provide strong evidence that they come from the same speaker.
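The interplay of similarity and typicality described above is often formalised as a likelihood ratio, although the abstract itself does not name a specific formula. The following is a minimal sketch under strong simplifying assumptions: a single speech feature (for example, mean fundamental frequency in Hz), modelled with Gaussian distributions. All function names and numeric values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(offender_value, suspect_mean, suspect_sd,
                     population_mean, population_sd):
    """Ratio of similarity (fit to the suspect) to typicality (fit to the population)."""
    similarity = gaussian_pdf(offender_value, suspect_mean, suspect_sd)
    typicality = gaussian_pdf(offender_value, population_mean, population_sd)
    return similarity / typicality


# Hypothetical case 1: the offender value matches the suspect,
# but is also very common in the speaker group.
lr_typical = likelihood_ratio(100.0, 100.0, 5.0, 100.0, 15.0)

# Hypothetical case 2: the offender value matches the suspect equally well,
# but is unusual for the speaker group.
lr_atypical = likelihood_ratio(140.0, 140.0, 5.0, 100.0, 15.0)

print(f"typical feature value:  LR = {lr_typical:.2f}")
print(f"atypical feature value: LR = {lr_atypical:.2f}")
```

In both hypothetical cases the suspect matches the criminal sample equally well; only the typicality term changes, and the atypical value yields the larger ratio. Real casework uses multiple features and more elaborate models than this sketch.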

One complication associated with FSC is that the data needed to estimate whether a speech feature is typical or atypical for the given speaker group, commonly known as population data, are scarce. Population data are typically obtained by collecting a set of recordings containing the voices of a homogeneous group of speakers similar in age, sex, and geographical region (or accent). Unfortunately, the time and expense involved in the collection of population data mean that forensic phoneticians face a huge challenge in obtaining such data for casework. This problem is further complicated by the high degree of variation that exists in speech across different speaker groups. Methodological research in the field of FSS has demonstrated that identifying the correct population for an FSC is vital in accurately representing the strength of evidence. It is largely for these reasons that experts argue that the biggest problem facing the field is the limited availability of population data.

The primary aim of this research is to explore a novel set of proposed methods that seek to remedy the aforementioned problems. The current lack of a platform on which to exchange data means that population data for a specific speaker group might have already been collected, unbeknown to experts in need of such data. This project intends to bring an end to this type of scenario by developing an international platform on which to share data, and also encouraging fellow researchers and experts to participate in data sharing. In addition, the project will explore the extent to which population data are generalizable; specifically, this will entail identifying the geographical (or regional accent) level at which speaker groups can be defined. For example, an expert might define a population group as having a Leeds accent, when in actuality a population defined more generally as West Yorkshire would suffice. This would clearly have implications for the way in which population data would be collected.

In order to explore the issue of defining the population data, a West Yorkshire (WY) database of 200 male speakers will be collected (including 50 speakers from each of the four urban areas: Huddersfield, Leeds, Bradford, and Wakefield). The database will be used to test the sensitivity of the strength of evidence when FSC cases are simulated using varying definitions of accent for the population data. In addition to serving a methodological purpose, the WY database will also serve as a practical resource for casework and research in its own right.
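The sensitivity test described above can be pictured as computing the strength of evidence for the same simulated case twice, once against a narrowly defined reference population and once against a broader one. The sketch below assumes a single Gaussian-modelled feature and a log10 likelihood ratio as the measure of strength of evidence; the abstract does not specify either choice. The feature values are invented for illustration and are not project data.

```python
import math
import statistics


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log10_lr(offender_value, suspect_mean, suspect_sd, reference_values):
    """log10 likelihood ratio, with typicality estimated from a reference sample."""
    ref_mean = statistics.mean(reference_values)
    ref_sd = statistics.stdev(reference_values)
    similarity = gaussian_pdf(offender_value, suspect_mean, suspect_sd)
    typicality = gaussian_pdf(offender_value, ref_mean, ref_sd)
    return math.log10(similarity / typicality)


# Invented per-area feature values (e.g. mean f0 in Hz); NOT real data.
reference_by_area = {
    "Huddersfield": [98.0, 104.0, 110.0, 101.0, 107.0],
    "Leeds": [112.0, 118.0, 121.0, 115.0, 124.0],
    "Bradford": [105.0, 111.0, 108.0, 114.0, 102.0],
    "Wakefield": [119.0, 125.0, 116.0, 122.0, 128.0],
}

# Broad definition: pool all four areas into one "West Yorkshire" population.
pooled = [v for values in reference_by_area.values() for v in values]

# One simulated case, evaluated under two population definitions.
narrow = log10_lr(120.0, 119.0, 4.0, reference_by_area["Leeds"])
broad = log10_lr(120.0, 119.0, 4.0, pooled)

print(f"narrow (Leeds only) log10 LR:  {narrow:.3f}")
print(f"broad (pooled areas) log10 LR: {broad:.3f}")
print(f"difference:                    {broad - narrow:.3f}")
```

With these invented numbers the pooled population makes the offender value look less typical, so the evidence appears stronger than under the narrow definition. A difference of this kind is what the planned simulations would quantify on real recordings.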

Planned Impact

Who will benefit from this research?
The proposed project is expected to have direct and indirect beneficiaries of the research outputs. The first level consists of the direct users of the research: the forensic experts (government and non-government employed) and the policy makers (e.g. national judicial systems, professional bodies). The second level is comprised of members of the public and society in general, as they will indirectly benefit from the research through its use by the forensic experts in casework and policy makers in the justice system (or professional bodies/associations).

How will they benefit from this research?
At the first level of impact, forensic experts are anticipated to benefit from the proposed research by having access to more population data through the creation of a new West Yorkshire database and the establishment of an international platform on which to exchange population data. Forensic experts will also benefit by being able to heed government calls and demonstrate a higher level of transparency, reliability, and validity in casework through the inclusion of more population data, specifically the population statistics produced by the proposed project. Finally, forensic experts will potentially benefit from being able to generalise speaker groups, which will eliminate the need to collect population data at a narrowly defined level. As a result, forensic experts are expected to save a great deal of the time and money associated with data collection.

Policy makers, also identified at the first level of impact, will gain an understanding of the amount of population data available for forensic speaker comparison cases, and the limitations involved with accounting for population data in casework. Therefore, it is hoped that the anticipated research output can directly influence specific policies (e.g. national and international governments, standards for professional bodies) regarding the use of population data in forensic speaker comparisons. This would be best demonstrated through the implementation of a protocol or best practice standard regarding population data collection, and the consultation of (more) population data in casework.

Members of the public and society in general are identified as indirect beneficiaries, as they are anticipated to benefit from a decrease in miscarriages of justice. Misrepresented expert testimony has been identified as one of the leading causes of miscarriages of justice in court cases (Saks and Koehler, 2005). Therefore, it is vital that forensic experts are diligent in accurately representing the strength of evidence in a forensic speaker comparison case, which is facilitated foremost through the availability of population data. It is also projected that the public and society can benefit from experts conducting forensic speaker comparisons in a more time-effective manner. If population data are more readily available, and more generalised population data are sufficient for consultation, then experts will spend less time locating potential population data or collecting it themselves. Ultimately, society will also benefit from the research outputs, because experts will provide more transparency, validity, and reliability for evidence in the justice system.
 
Description We have now collected all of the data that we proposed. The UK Data Service now holds a copy of our database, which includes 180 speakers from West Yorkshire and over 1,000 audio files plus transcripts.

We also set out to see whether population data for geographically close areas could be combined to save costs and time in casework. We found that regional variation may actually exist on levels that researchers had not previously considered. We were able to confirm that certain parameters that experts regard as individual-specific, such as fundamental frequency (f0) and long-term formant distributions (LTFDs), do behave that way. However, although voice quality (VQ) and hesitation markers appear to be generally idiosyncratic, there is evidence that these parameters may show some micro-regional variation. This means that in forensic casework, population statistics must be carefully selected, as there may be unintentional regional identity embedded in unexpected parameters, which can cause misrepresentations of the evidence.

The biggest finding overall is that two of the most popular parameters used in casework are generally idiosyncratic, but for a handful of the population regional identity is marked in these parameters. Based on preliminary investigations, this may be predictable from the way in which a person identifies their 'origin'. Those who identify with a more national origin (e.g. British, English) take on less regionally marked realisations, while those who identify as more local (e.g. Bradford, Huddersfield, West Yorkshire) are more likely to realise voice quality and hesitation markers in a more regionalised manner.
------------------
So far, we have examined 3 of the 4 parameters outlined in the grant proposal. We have looked at hesitation markers, fundamental frequency, and long-term formant distributions. We have found that hesitation markers are the only phonetic parameter of the three that seems to be regionally marked. This means that across Bradford, Kirklees, and Wakefield, uhs and ums are produced slightly differently, which may indicate a marker of regional identity. The other two parameters are relatively stable across the three boroughs and don't seem to be markers of regional identity.

In forensics, this means that some parameters are stronger indicators of regional identity, while others are not. The positive take on this is that larger regional areas may be used for certain parameters when carrying out speaker comparison casework. However, practitioners will have to use more selective reference populations for known, regionally marked phonetic features.
Exploitation Route This information can be directly applied to current forensic casework nationally, and perhaps internationally.

Perceptual studies could be hugely beneficial - we would be able to confirm whether regional variation is indeed identifiable to those from the area. This would provide evidence that confirms the need to separate population data out for certain parameters on a more fine-grained scale.
Sectors Aerospace, Defence and Marine,Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Government, Democracy and Justice,Security and Diplomacy

 
Description The impact has been on two groups: (i) forensic speech scientists/expert witnesses and (ii) audio processing engineers. WYRED is the first database of its kind to include a large volume of high-quality audio speech from a carefully stratified population of speakers (the only previously available database is much smaller and consists only of the speech of Cambridge undergraduates). The significance of this is that WYRED has enabled forensic speech scientists to make more reliable claims about the general characteristics of speech. This in turn has improved the reliability of decisions made by forensic speech scientists in casework. The unprecedented size and representativeness of WYRED has also enabled audio processing engineers to improve automatic speaker recognition systems.

The project's reach encompasses the UK's only two commercial forensic speech science labs: Soundscape Voice Evidence and JP French Associates. Both of these labs are internationally renowned. Through its use by these companies, and their use of research generated from the project, WYRED has generated improved practices within the FSS community. The project's reach also extends to the audio processing sector, where Oxford Wave Research Ltd, a leading speech and audio processing company, have used WYRED as training data for developing improved speech recognition systems. Specific details of this impact are as follows:

(i) Enhancing the work of forensic speech scientists/expert witnesses

Prior to the existence of WYRED, forensic speech scientists lacked reliable population data (that is, representative samples of large groups of speakers with the same sociolinguistic profile) for determining the likelihood of a criminal recording (e.g. an incriminating voicemail) having been made by a particular suspect. WYRED has addressed this problem by providing the largest representative sample of population data currently available.

WYRED has been adopted as a resource by JP French Associates, the UK's longest established forensic speech and acoustics laboratory, and Soundscape Voice Evidence. These companies are the UK's major providers of forensic speech analysis in criminal casework. Soundscape Voice Evidence has adopted WYRED as a resource to support its consultancy work and has praised its value as a resource to the industry. Dr Christin Kirchhübel, Director of Soundscape Voice Evidence, has commented that "WYRED provides an invaluable resource from which population statistics can be derived". Dr Kirchhübel also notes additional benefits to her company as a result of adopting WYRED as a resource. She notes that "Voice quality training provided as part of the WYRED has benefitted Soundscape Voice Evidence in two ways. Firstly it enabled the professional development of staff and, secondly, it highlighted current challenges in the analysis of a speaker's voice quality."

(ii) Informing the development of automatic speaker recognition systems

Audio processing researchers working on automatic speaker recognition require large amounts of training data to improve their software. However, such data is difficult to acquire because it is expensive and time-consuming to collect. WYRED has consequently provided a major boost to research in this area. Oxford Wave Research Ltd, a research and development company focusing on audio acquisition, processing and pattern recognition, are using WYRED to assist in the development of new speech processing software for automatic speaker recognition. Data from WYRED will be included in the background reference populations that are built into Oxford Wave's Vocalise software. Oxford Wave's Research Director, Dr Anil Alexander, has said that "This database will be a lasting legacy to the field". Oxford Wave have also stated that they are eager to collaborate on new projects using WYRED data to enhance their systems.
First Year Of Impact 2019
Sector Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Government, Democracy and Justice,Security and Diplomacy
Impact Types Societal

 
Title WYRED - West Yorkshire Regional English Database 2016-2019 
Description The West Yorkshire Regional English Database (WYRED) consists of approximately 200 hours of high-quality audio recordings of 180 West Yorkshire (British English) speakers. All participants are male and aged 18-30, and are divided evenly (60 per borough) across three boroughs within West Yorkshire (Northern England): Bradford, Kirklees, and Wakefield. Speakers participated in four spontaneous speaking tasks. The first two tasks relate to a mock crime in which the participant speaks to a police officer (Research Assistant 1) and then to an accomplice (Research Assistant 2). Speakers returned a minimum of 6 days later, at which point they were paired with someone from their borough and recorded having a conversation on any topics they wished. The final task is an experimental task in which speakers are asked to leave a voicemail message related to the fictitious crime from the first recording session. In total, each speaker participated in approximately 1 hour of spontaneous speech recordings. The primary motivation for the construction of WYRED was to provide a collection of regionally stratified speech recordings (by borough) from within a single, politically defined region (a county). The corpus aims to facilitate research on methodological issues surrounding the delimitation of the reference population when considering the typicality of a speech sample for a given forensic speaker comparison case, while also providing valuable insight into the West Yorkshire accent(s). 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is the published UK Data Service version of the database funded by this project. 
URL http://reshare.ukdataservice.ac.uk/id/eprint/854354
 
Title West Yorkshire Regional English Database (WYRED) - Snapshot as of 06/03/18 
Description As of today, we have now recorded 150 speakers. We have 30 more speakers that need to be recorded in order to finish the database. The database consists of 60 speakers from each of three boroughs (Bradford, Kirklees, and Wakefield). We have completed all 60 speakers from Kirklees and only need around 8 more speakers from Wakefield. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? No  
Impact N/A 
 
Title West Yorkshire Regional English Database (WYRED) - Snapshot as of 12/03/19 
Description As of today, we have now recorded 186 speakers. We have 8 more speakers that need to be recorded in order to finish the database. The database consists of 60 speakers from each of three boroughs (Bradford, Kirklees, and Wakefield). We have completed all 60 speakers from Kirklees and Wakefield. We now only need to complete 8 more Bradford speakers. We have recorded the first session for 3 speakers, but need to recruit 5 more participants and complete the 3 speakers' second sessions. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact Academically, the database is being used in 3 PhDs. It has already been used in one MSc dissertation at the University of York. We have also given a subset of the data to be used by the ESRC-funded SPADE project. 
 
Title West Yorkshire Regional English Database (WYRED) - Snapshot as of 14/03/17 
Description Once complete, WYRED will consist of 180 male speakers (60 Bradford, 60 Wakefield, and 60 Kirklees). Each participant will be recorded over 4 different conditions and provide approximately 1.5 hours of speech each. Despite technical difficulties with equipment (mains hum buzz as a result of wiring in the walls of the room) in the beginning, and only starting recordings in October 2016, my team has already managed to record 70 speakers. We have also been able to transcribe half of the recordings already, as we have 6 undergraduate student assistants contributing to the transcription process. Everything going well, we hope to record our last speaker by Christmas 2017. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Not yet. 
URL https://wyredproject.co.uk/
 
Description Oxford Wave Research 
Organisation Oxford Wave Research Ltd
Country United Kingdom 
Sector Private 
PI Contribution We have engaged with the biometrics company and they are currently using our database in their research and development of automatic speaker recognition systems.
Collaborator Contribution They have provided feedback on the quality and robustness of the recordings provided.
Impact The database is aiding in improving performance metrics of the Oxford Wave biometrics software.
Start Year 2019
 
Description Research Partnership with J P French Associates 
Organisation J P French Associates
Country United Kingdom 
Sector Private 
PI Contribution We have not been able to contribute anything to this partnership yet, but once the database and analysis are complete we will be able to contribute to their practice.
Collaborator Contribution The firm has provided real-world expertise in relation to their casework. They have been able to provide input on the types of recordings to be made as well as the content. We have significantly changed our 4th task in the database as a result of their advice.
Impact No outcomes to date. Yes, it is multi-disciplinary, as J P French Associates is a private lab carrying out forensic casework on voice evidence. They can be situated between the fields of Linguistics, Phonetics, Speech Technology, Forensic Science, and the Law.
Start Year 2016
 
Description Research Partnership with Soundscape Voice Evidence 
Organisation Soundscape Voice Evidence
Country United Kingdom 
Sector Private 
PI Contribution Dr Christin Kirchhübel set up her own forensic practice, so we worked closely with her after her move. We had a publication accepted today (01.03.21) with minor revisions for English World-Wide on voice quality in British English. This uses the WYRED data. We supplied Soundscape with relevant West Yorkshire population data on hesitation markers, fundamental frequency, voice quality, and long-term formant distributions.
Collaborator Contribution Soundscape trained me and my team in auditory voice quality analysis using the VPA scheme, which is currently used in forensic practice. Through the training, we were able to carry out voice quality analysis and submit a journal paper on it as a group.
Impact Journal paper accepted pending minor revisions. Journal: English World-Wide Title: Regional Variation in British English Voice Quality
Start Year 2018
 
Description Data Sharing Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact We invited 4 guest speakers (academics) to give talks at a one-day workshop on data sharing. This followed on from the conference of the International Association of Forensic Phonetics and Acoustics that we held in August 2018. The event focussed on the sharing of data and the practicalities surrounding data sharing in forensic casework as well as research in academia. We have posted all of the notes taken from discussions on that day to our website: wyredproject.co.uk. The event was attended by academics as well as industry participants working in the field of voice recognition.
Year(s) Of Engagement Activity 2018
URL https://wyredproject.co.uk/data-sharing-satellite-event/