Social contagion? : using data science to characterise the distribution and dispersion of health behaviours in adolescence
Lead Research Organisation:
University College London
Department Name: Institute of Health Informatics
Abstract
Mental illness is the largest cause of disability in the UK with far-reaching consequences, spanning education, work, and health, across the life course. Analytical approaches for monitoring public mental health and evaluating the impact of policy and interventions - at scale - are urgently needed. In this work I will use data science methods to characterise health behaviours and their spread through social groups, with a particular focus on adolescent self-harm behaviours (injuries related to drug/alcohol-use or violence, and intentional self-injury) in schools. The work will also explore how linked health and education data can be used as a platform for randomised trials of health interventions in schools and hospitals.
In England, the average secondary school classroom includes three children who will ever self-harm, with at least one child who is admitted to hospital with self-inflicted or violent injuries aged 10-19 years. Self-harm in adolescence is predictive of premature death (particularly with suicide and drug/alcohol-use causes) and future A&E attendances and hospital admissions. The causes of self-harm are varied and highly complex, reflecting biological, social and personality factors in tandem with environmental triggers, which could be appropriate targets for intervention (e.g. exposure to others' self-injurious behaviour). We currently lack objective measures of the scale and clustering of self-harm behaviours in schools, because studies in this field have often used cross-sectional or panel (repeated survey) designs that do not capture the precise timing of events, or are not mapped to schools. Identification of predictors of peer-group and individual self-harm behaviours will inform targeted prevention strategies in schools and hospitals. The proposed work applies data science methods to a range of de-identified electronic health records and large-scale education datasets to characterise different self-harm presentations, and investigates the timing and sequence of these health behaviours within school peer-groups.
Health trajectories (e.g. future risk of death) will be characterised for different presentations of self-harm. The first step in this process is developing systematic approaches for identifying self-harm presentations ("clinical phenotypes") in a range of healthcare settings, including general practice, A&E and hospitals. Next, environmental influences (describing the "exposome") will be investigated as risk factors for clinical phenotypes of self-harm within social groups. This phase of work draws on linked health and education data for England and Scotland to create detailed information on the timing and sequence of health behaviours in school peer-groups, to investigate evidence of social contagion. Data science methods (e.g. natural language processing) will be applied to extract information from free text in medical records, and a combination of machine learning, statistical and epidemiological methods will be used to develop algorithms for detecting the most clinically important presentations of self-harm in school peer-groups.
Technical Summary
The overarching aim of this work is to appraise and develop approaches for examining the distribution and dispersion of health behaviours captured in electronic health records. Self-harm behaviours associated with high-volume healthcare usage (intentional self-injury, substance misuse and violence) in secondary schools will be used as an exemplar. Three phases of work are planned. Phase 1 focuses on developing clinical phenotypes using data science methods to decipher the complexity of electronic health records. Phase 2 will appraise and apply methods for characterising the distribution and dispersion of clinical phenotypes in relation to the wider exposome, using linked health and education data. Phase 3 explores the potential for linked data to be used as a platform for enhancing the efficiency of randomised controlled trials of individually randomised or cluster-randomised (at school or hospital level) interventions. Data science methods will support the proposed work. In particular, natural language processing will support free-text mining (e.g. free-text recording of reason for A&E attendance) to enhance the development of clinical phenotypes for self-harm. Machine learning methods (e.g. random forest) will be explored as a means of classifying clinical subtypes and for investigating new approaches to characterising the interplay between the clinical phenome and exposome. Linked health and education data will be investigated as a platform for randomised controlled trials, and for nesting data capture with passive or sensing technology. Collectively, the proposed work will establish tools for early-phase research using linked health and education data, and provide new knowledge and insight into the dispersion of self-harm health behaviours in schools.
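As an illustration only of what examining the "timing and sequence" of events within school groups could involve, the sketch below counts, for each school, pairs of events in different pupils that fall within a fixed time window of each other. It is not the method used in this project: the function name `count_close_pairs`, the 28-day window, the record layout and the data are all hypothetical and synthetic. A raw count like this is not evidence of contagion on its own; it would need to be compared against what is expected by chance (for example, by permuting event dates).

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def count_close_pairs(events, window_days=28):
    """Count, per school, pairs of events in different pupils
    that occur within `window_days` of each other.

    `events` is an iterable of (school_id, pupil_id, event_date) tuples.
    The record layout and the default window are assumptions for this sketch.
    """
    by_school = defaultdict(list)
    for school_id, pupil_id, event_date in events:
        by_school[school_id].append((pupil_id, event_date))

    counts = {}
    for school_id, records in by_school.items():
        n_pairs = 0
        for (pupil_a, date_a), (pupil_b, date_b) in combinations(records, 2):
            # Only pairs involving two different pupils are counted.
            if pupil_a != pupil_b and abs((date_a - date_b).days) <= window_days:
                n_pairs += 1
        counts[school_id] = n_pairs
    return counts


# Synthetic, de-identified example records (entirely made up).
events = [
    ("school_A", "p1", date(2020, 1, 6)),
    ("school_A", "p2", date(2020, 1, 20)),
    ("school_A", "p3", date(2020, 6, 1)),
    ("school_B", "p4", date(2020, 1, 6)),
    ("school_B", "p5", date(2020, 9, 1)),
]

close_pairs = count_close_pairs(events)
```

Widening `window_days` increases the number of pairs counted, so any real analysis would need to justify the window and adjust for school size and background event rates.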
Publications
Aldridge R
(2023)
Estimating disease burden using national linked electronic health records: a study using an English population-based cohort.
in Wellcome Open Research
Aldridge RW
(2019)
Causes of death among homeless people: a population-based cross-sectional study of linked hospitalisation and mortality data in England.
in Wellcome Open Research
Blackburn RM
(2022)
COVID-19-related school closures and patterns of hospital admissions with stress-related presentations in secondary school-aged adolescents: weekly time series.
in The British journal of psychiatry : the journal of mental science
Blackburn RM
(2018)
Trends in Hospital Admissions for Nonfatal Adversity-Related Injury Among Youths in England, 2002-2016.
in JAMA pediatrics
Blackburn RM
(2019)
Nosocomial transmission of influenza: A retrospective cross-sectional study using next generation sequencing at a hospital in England (2012-2014).
in Influenza and other respiratory viruses
Etoori D
(2022)
Reductions in hospital care among clinically vulnerable children aged 0-4 years during the COVID-19 pandemic.
in Archives of disease in childhood
Description | Ongoing dialogue with the UK Government Department of Health and Social Care and Department for Education about the use of linked administrative data for education and health policy evaluation. |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Title | The ECHILD database |
Description | The Education and Child Health Insights from Linked Data (ECHILD) Database was established to enable large-scale, longitudinal research that explores inter-relationships across the domains of health, education and social care in childhood and adolescence. It brings together administrative data from Hospital Episode Statistics (HES) and the National Pupil Database (NPD) for all children and young people aged 0-24 years in England who were born between 1 September 1995 and 31 August 2020. In total, it includes linked HES and NPD records for approximately 14.7 million individuals. The ECHILD Database will shortly be available to accredited researchers by applying to the data providers (Department for Education and NHS Digital). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | No |
Impact | The ECHILD database will be opened up to the wider research community in 2023 following agreement of Information Governance arrangements. |
Description | Adolescent mental health in schools collaboration with South London and Maudsley Mental Health Trust |
Organisation | South London and Maudsley (SLAM) NHS Foundation Trust |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I am collaborating with data scientists and clinicians at South London and Maudsley NHS Trust to examine child and adolescent mental health (measured using Child and Adolescent Mental Health Services and Hospital attendance data) in South London, within school groups. My contribution has been initiating the collaboration, devising the research question and tailoring it to the available data. This draws on my expertise in study design and analysis of longitudinal electronic health records. |
Collaborator Contribution | My collaborators have provided me with access to privileged data (with appropriate information governance processes in place) and specialist support to extract and analyse (e.g. natural language processing of free text in clinical documents) it. |
Impact | The collaboration brings together data scientists, epidemiologists, psychiatrists and education experts. We are drafting a grant application to evaluate the impact of Mental Health Support Teams on mental health outcomes. |
Start Year | 2019 |
Description | Mapping administrative data on perinatal and maternal mental health in England and Scotland |
Organisation | University of Glasgow |
Department | Mental Health and Wellbeing Glasgow |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This was a collaboration between three early-intermediate stage career researchers based at UCL (me), the University of Glasgow (project PI) and University of the West of Scotland - it is hard to isolate individual contributions. We jointly developed and were awarded a pump-priming grant from MatCHNet/UKPRP to map administrative data with whole national coverage of England and Scotland, and with the potential to capture maternal mental health in the perinatal period and beyond. My main contribution was supporting access to and expertise in data for England and Scotland held in the UCL safe haven and using this to develop indicators of perinatal mental health. I also co-led stakeholder workshops in Edinburgh. |
Collaborator Contribution | Two workshops were organised and administered by Rosie Seaman (Glasgow) and co-led by all of us. Rosie also obtained additional funding from Glasgow Knowledge Exchange to support participant travel to the workshops. |
Impact | - 2 multidisciplinary stakeholder workshops that brought together stakeholders from across maternity services, mental health charities and researchers, and data experts in the UK - 2 additional small grants (both funded) |
Start Year | 2022 |
Description | Mapping administrative data on perinatal and maternal mental health in England and Scotland |
Organisation | University of the West of Scotland |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This was a collaboration between three early-intermediate stage career researchers based at UCL (me), the University of Glasgow (project PI) and University of the West of Scotland - it is hard to isolate individual contributions. We jointly developed and were awarded a pump-priming grant from MatCHNet/UKPRP to map administrative data with whole national coverage of England and Scotland, and with the potential to capture maternal mental health in the perinatal period and beyond. My main contribution was supporting access to and expertise in data for England and Scotland held in the UCL safe haven and using this to develop indicators of perinatal mental health. I also co-led stakeholder workshops in Edinburgh. |
Collaborator Contribution | Two workshops were organised and administered by Rosie Seaman (Glasgow) and co-led by all of us. Rosie also obtained additional funding from Glasgow Knowledge Exchange to support participant travel to the workshops. |
Impact | - 2 multidisciplinary stakeholder workshops that brought together stakeholders from across maternity services, mental health charities and researchers, and data experts in the UK - 2 additional small grants (both funded) |
Start Year | 2022 |
Description | HDR UK Early Career Researcher Committee Member - monthly evaluation and dissemination of outputs from HDR UK |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | I am a member of the HDR UK Early Career Researcher Committee that evaluates outputs from the HDR UK community to promote excellence in: 1) science quality, 2) team science, 3) research scale, 4) open science, 5) public and patient involvement/impact, 6) equality, diversity and inclusion in research. We meet on a monthly basis to review HDR UK outputs and disseminate via Hive (HDR UK newsletter) a lay summary (written by a member of the committee) and a link to the full work for the publication that is ranked most highly across these domains by the committee. |
Year(s) Of Engagement Activity | 2019,2020 |
Description | Member of the Mental Health Committee of the Royal College of Emergency Medicine |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | This is an ongoing commitment (for up to 6 years) in which I will continue to work with the Royal College of Emergency Medicine as part of their Lay (defined as non-clinical) and Mental Health committees to enhance patient safety and the efficiency of services, and to promote education and training for medics at all stages of their careers. My particular remit and area of expertise is mental health, use of data and growing research capacity within emergency medicine. |
Year(s) Of Engagement Activity | 2020,2021,2022,2023 |