Unlocking the research potential of unstructured patient data to improve health and treatment outcomes

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

Electronic health records (EHRs) in hospitals contain a wealth of rich, routinely collected information that has the potential to drive improvements in patient care and research. The secondary care health data space is especially heterogeneous, as it includes well-structured variables (e.g. from diagnostic and laboratory tests, manually coded data) and semi-structured or unstructured data (e.g. images or clinical letters). While the former is often the core of healthcare data science, the latter is far less accessible for analysis, which leaves important gaps in our knowledge about aspects of secondary care.

Outpatient letters and inpatient clinical notes are semi-structured free-text documents that contain key clinical information about patients, their treatments and outcomes. Although they are recorded electronically for clinical purposes, their unstructured nature and sensitive content mean that this data source is often inaccessible for secondary use to support research or service improvement. For example, identifying certain patients who needed to shield during the current pandemic relied on hospital clinical teams manually reviewing thousands of patient letters, at a significant cost in time and resources. For conditions managed in hospital outpatient clinics, there is, surprisingly, no national system for recording diagnoses or prescribed medications: the necessary information is available only as free text in letters. Guided by relevant clinical questions, automated text-mining techniques can unlock pertinent information hidden in this large volume of text, which in turn can assist clinical decision making.

In this project, the candidate will work within an interdisciplinary team linking computer science, secondary care (NHS) and epidemiology. The candidate will help develop and validate a knowledge management framework to safely unlock information stored within EHRs, test and implement text-mining methods to obtain coded data for research, integrate these data with other health data, and demonstrate their impact and benefits through case studies. We will focus on hospital data from Salford Royal Foundation Trust, a Global Digital Exemplar (GDE) site, to develop and validate a shareable system that uses text mining to extract diagnoses, medications and other pertinent information from hospital outpatient letters and inpatient records. While we will initially focus on musculoskeletal (MSK) conditions, the project will also explore the transfer learning needed to tailor the system for other specialities and services.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
MR/W007428/1      |              |              | 01/10/2022 | 30/09/2028 |
2772802           | Studentship  | MR/W007428/1 | 01/10/2022 | 30/09/2026 | Arooj Hussain