The automated coding of expressed emotion to enhance clinical and epidemiological mental health research in adolescence

Lead Research Organisation: King's College London
Department Name: Child and Adolescent Psychiatry

Abstract

As little as five minutes listening to a parent talk can reveal a significant amount of information about their child's future psychopathology. Analyses of the words and tone used in parents' speech provide a detailed picture of the parents, the child, and interactions within the family. An interview technique called the five-minute speech sample (FMSS) has operationalised this process. There is good evidence that the FMSS can provide an index of a child's home environment and help profile their risk of developing, and recovering from, adolescent-onset mental health disorders. FMSS are easy to collect - all you need is 5 minutes and a dictaphone/smartphone - yet they are rarely used in research or clinical settings because the coding of speech is laborious, bias-prone, and requires highly trained raters. If these issues could be overcome, the FMSS presents a tremendous opportunity to be used across research, mental health and social care settings to rapidly assess key modifiable drivers of mental health problems among adolescents.

This project will bring together an interdisciplinary team of developmental psychopathologists, creative writers, and computer, clinical and social scientists to automate the coding of the FMSS. Using recent developments in computational linguistics and affective computing, we will create a pipeline that combines automatic transcription, speech valence analysis and natural language analysis. We will use a unique collection of researcher-rated FMSS audio recordings of mothers from the UK E-Risk Longitudinal Twin Study, obtained for 2031 children at 10 years of age. The children in this cohort have been followed to age 18 years, undergoing multiple waves of comprehensive assessments. Building on an existing feasibility study, we will develop and train an automated approach to FMSS coding, exploiting both the size and the socio-economic representativeness of the sample. We will then examine whether the automated ratings generated from age-10 maternal speech samples predict mental health problems at 12 and 18 years as well as costly human ratings do.
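To make the shape of such a pipeline concrete, the sketch below is a deliberately minimal, purely illustrative stand-in: it applies a tiny keyword lexicon to an already-transcribed speech sample and counts candidate positive and negative comments per sentence. The lexicons, function name and thresholds are all hypothetical; the project's actual approach uses automatic transcription of audio and trained deep learning models rather than word lists.

```python
# Illustrative sketch only, not the project's method. A real pipeline would
# first transcribe the FMSS audio automatically, then apply trained models;
# here a crude lexicon pass over a transcript stands in for that stage.
# The lexicons below are hypothetical examples.
import re

POSITIVE_TERMS = {"proud", "kind", "loving", "wonderful", "helpful"}
NEGATIVE_TERMS = {"lazy", "difficult", "rude", "selfish", "annoying"}

def rate_speech_sample(transcript: str) -> dict:
    """Return crude per-sentence counts of positive and negative comments."""
    # Split the transcript into sentences on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?]+", transcript) if s.strip()]
    positive = negative = 0
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & POSITIVE_TERMS:
            positive += 1
        if words & NEGATIVE_TERMS:
            negative += 1
    return {
        "sentences": len(sentences),
        "positive_comments": positive,
        "negative_comments": negative,
    }

sample = "She is a wonderful girl. She can be difficult at times. I am proud of her."
print(rate_speech_sample(sample))
# {'sentences': 3, 'positive_comments': 2, 'negative_comments': 1}
```

Counting positive and negative comments mirrors two of the constructs that human FMSS raters code, but the sketch deliberately omits the tonal (speech valence) and contextual judgements that make expert coding, and the planned deep learning models, far more demanding than word matching.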

We will conduct creative workshops with key stakeholders (young people, parents, healthcare/social-work practitioners, research policymakers, governance leads, and educators) throughout the project to determine the main ethical, social and practical challenges of using and sharing parental speech data, and of developing and implementing the automated models in practice, in order to inform future work.

Crucially, all this work will help us understand how to share this methodology, and as a final output we will develop open-source materials and a clear blueprint of what is required to build a secure digital platform that enables other research groups to rapidly code expressed emotion from FMSS in an accurate and cost-effective manner.

Technical Summary

Our goal is to develop, validate and scale an automatic tool to measure caregiver expressed emotion (EE) in the adolescent age range. Caregiver EE acts as a transdiagnostic risk factor for adolescent-onset psychopathology and as a potential mediator of both pharmacological and psycho-social treatment. EE can be accurately captured using five minutes of audio of a mother speaking about her offspring. The ease of collecting these five-minute speech samples (FMSS) is countered by the time and skill needed to apply EE coding frameworks, significantly limiting their application in both adolescent development and clinical research. This project will integrate developmental science and computational linguistic approaches to develop an Artificial Intelligence (AI) system to automatically identify constructs linked to expressed emotion from FMSS. This project's technical objectives are to use the audio data and corresponding FMSS coding from 2021 adolescents within the E-Risk cohort to (i) test deep learning classification models for EE scores and compare them with summary ratings produced by highly trained humans; (ii) test the feasibility of these deep learning sequence labelling models for annotating positive and negative comments, negativity and warmth; (iii) examine AI model performance across socio-economic strata and geographical locations; and (iv) explore whether the human ratings of EE and the AI outputs are predictive of adolescent mental health outcomes. We will co-develop guidance with young people, caregivers, researchers and university governance on the main ethical-social-technical challenges in using and sharing parental speech data and how they may be surmounted. We will then use the ethical-social-technical learning we have acquired to develop a detailed product brief on what is required to provide an accessible, secure digital platform that enables external researchers to automatically code expressed emotion from FMSS audio data.