Integrating hospital outpatient letters into the healthcare data space

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

The importance of analysing health data collected as part of clinical care and stored in electronic health records is well established. Such analysis has led to vital research about the occurrence and progression of disease, treatment effectiveness and safety, and health service delivery. The current Covid-19 pandemic has demonstrated the public health need to use data collected at the point of care efficiently to rapidly understand patterns, risk factors and outcomes of emerging diseases. Much of this work comes from primary care electronic health records, where general practitioners (GPs) enter and use structured, coded healthcare data. The picture in hospitals, however, is very different.

One in four people in the UK live with one or more long-term conditions like cardiovascular diseases, chronic respiratory diseases, type 2 diabetes, arthritis and cancer, which account for 70% of the NHS budget. Specialised opinion about management of long-term conditions (LTCs) is provided through hospital outpatient care. Data and insight from outpatient clinics, however, are almost entirely absent. There is, surprisingly, no national system for recording diagnoses in hospital outpatient clinics. Information about key clinical events is instead recorded in outpatient letters, which are primarily used to communicate with patients and GPs. The ways in which letters are written and their sensitive content mean that they are not available for larger-scale "secondary use", i.e. to support clinical practice, research or service improvement. For example, shielding for the current pandemic relied on hospital clinical teams going through patient letters manually to identify those who needed shielding based on free-text information about diagnoses and medications, with clear time constraints and risks of under- and over-shielding patients.

Natural language processing (NLP) and text mining develop computer algorithms to automatically extract relevant information from free-text documents. This project will establish a partnership between academia, secondary care and industry to develop a standards-based information management framework to safely unlock information stored in outpatient letters, link it with other health data and demonstrate its impact and benefits through two case studies. We will develop new methods to extract key clinical events from letters and represent their details (e.g. medication used, duration of symptoms) in a computerised form so that they can be easily accessed. In doing so, we will use the NHS-adopted standards so that the outpatient letters can be linked to other hospital databases and do not live in their own silo. The protection of sensitive data that potentially appear in outpatient data is a prime concern, so we will develop clear rules on who can access such data and how, in particular considering that third parties (e.g. industry) may need to access that data for developing their tools. These rules will be developed in close collaboration between patient representatives, clinicians and specialists to ensure safeguards, public trust and transparency of decision making.

We will demonstrate the potential impact of the proposed methods through two case studies with our clinical and business partners. Our first case study will demonstrate how the proposed models can assist in timely, efficient, dynamic and transparent identification of patients for shielding in a pandemic, or for vaccination prioritisation. In the second case study, we will illustrate how the same information can be used to address important gaps in our knowledge about health and care, including, for example, disease prevalence and drug utilisation patterns. All outputs will be developed in a way that can be scaled beyond the single clinical site and single speciality.

Publications

 
Description Configurable federated de-identification of clinical free-text data to unlock the research potential of unstructured patient data to improve health and treatment outcomes
Amount £13,000 (GBP)
Organisation University of Manchester 
Sector Academic/University
Country United Kingdom
Start 05/2022 
End 09/2022
 
Title drugprepr: Prepare Electronic Prescription Record Data to Estimate Drug Exposure 
Description Prepare prescription data (such as from the Clinical Practice Research Datalink) into an analysis-ready format, with start and stop dates for each patient's prescriptions. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact Used to prepare drug exposure data in the Centre for Epidemiology. 
URL https://cran.r-project.org/web/packages/drugprepr/index.html
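
As an illustration of what "start and stop dates for each patient's prescriptions" can mean in practice, the following is a minimal Python sketch. It is not the drugprepr algorithm or API (drugprepr is an R package). The record fields (quantity, daily_dose), the function name and the rule "duration = quantity / daily dose, rounded up" are assumptions made for this example only.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import ceil


@dataclass
class Prescription:
    # All field names are hypothetical and chosen for this sketch.
    patient_id: str
    start: date
    quantity: float    # units dispensed
    daily_dose: float  # units taken per day


def exposure_window(rx: Prescription) -> tuple[date, date]:
    """Return (start, stop), where stop is the last day covered by the supply.

    Assumed rule: the supply lasts ceil(quantity / daily_dose) days,
    counting the start date as day one.
    """
    if rx.quantity <= 0 or rx.daily_dose <= 0:
        raise ValueError("quantity and daily_dose must be positive")
    days = ceil(rx.quantity / rx.daily_dose)
    return rx.start, rx.start + timedelta(days=days - 1)
```

Real prescription data often have missing or implausible quantities and doses, so any such rule would need explicit decisions about how to handle those cases before the dates are used in an analysis.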
 
Title MASK - de-identification of clinical narrative 
Description Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advance research on treatments, drugs and public health. However, most of this information is not shared because it contains private information about patients, their families, or the medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada and GDPR regulate the protection, processing and distribution of this information. If this information is de-identified, with personal information replaced or redacted, it could be distributed to the research community. In this paper, we present MASK, a software package that is designed to perform the de-identification task. The software is able to perform named entity recognition using some of the state-of-the-art techniques and then mask or redact recognized entities. The user is able to select a named entity recognition algorithm (with pre-trained models, including BERT, GloVe and ELMo embeddings) and a masking algorithm (e.g. shift dates, replace names/locations, totally redact entity). 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Used as part of HIPS and Jigsaw projects. 
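
The two-stage design described above (recognise entities, then apply a masking strategy chosen per entity type) can be sketched in Python as follows. This is an illustrative sketch only and does not use or reproduce the MASK code or its interface: the entity spans are supplied by hand in place of a real named entity recognition model, and the function names, labels and the dd/mm/yyyy date format are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Callable

# An entity is (start offset, end offset, label), as an NER step might return.
Entity = tuple[int, int, str]


def shift_date(value: str, days: int = 3) -> str:
    """Shift a dd/mm/yyyy date by a fixed number of days (assumed format)."""
    try:
        parsed = datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return "[DATE]"  # fall back to redaction if the date cannot be parsed
    return (parsed + timedelta(days=days)).strftime("%d/%m/%Y")


def mask(text: str, entities: list[Entity],
         strategies: dict[str, Callable[[str], str]]) -> str:
    """Replace each entity span using the strategy registered for its label."""
    # Work from the rightmost span to the leftmost so that earlier
    # offsets stay valid after each replacement.
    for start, end, label in sorted(entities, reverse=True):
        handler = strategies.get(label, lambda _: "[REDACTED]")
        text = text[:start] + handler(text[start:end]) + text[end:]
    return text


strategies = {
    "NAME": lambda _: "[NAME]",  # replace names with a placeholder
    "DATE": shift_date,          # shift dates rather than remove them
}
```

For example, with the text "John Smith was seen on 28/02/2022." and spans covering the name (label NAME) and the date (label DATE), mask returns "[NAME] was seen on 03/03/2022.". Labels without a registered strategy are fully redacted.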
 
Description Clinical NLP workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Panel discussion with clinical NLP colleagues from Oxford and Sheffield on pre-trained clinical language models, fusion with ontologies and knowledge graphs. Talks by Aline Villavicencio and Hang Dong (29/30 November 2022).
Year(s) Of Engagement Activity 2022
 
Description Exploring foundation models 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Participation in an event organised by the Alan Turing Institute: "Exploring foundation models" 22.02.2023
Year(s) Of Engagement Activity 2023
URL https://www.turing.ac.uk/events/exploring-foundation-models
 
Description HealTAC 2022 conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact HealTAC 2022 was the fifth UK healthcare text analytics conference organised by Healtex. It was again a huge success, with over 100 attendees gathered this time for a 3-day online event. It brought the academic, clinical, industrial and patient communities together to discuss the current state of the art in processing healthcare free text and share experience, results and challenges. The conference featured two keynotes from leading experts in healthcare text analytics: Dr Ozlem Uzuner (George Mason University): "Building semantic representations of clinical notes: opportunities, challenges, and progress in natural language processing on electronic health records" and Prof James Teo (King's College Hospital): "Embedding text analytics into real-world clinical systems". There were also several research paper presentations, 20 posters, two panels ('How does PPIE add value in text analytics research?' and 'Text mining in veterinary medicine'), and an industry forum ('How can NLP enable personalised medicine?') with several demo sessions for various software solutions from industry and the NHS. Two tutorials ('Patient and Public Involvement and Engagement (PPIE): Hands on Guidance for Clinical Text Analytics' and 'De-identification of clinical and medical texts using MASK and MedCAT') were organised. We also had a PhD and Early career forum where five early career researchers presented their projects and received feedback from an expert panel and the audience. HealTAC is now an annual community event.
Year(s) Of Engagement Activity 2022
URL https://healtac2022.github.io/
 
Description HealTAC conference poster 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The accurate identification of diagnoses in free clinical narratives is decisive for characterizing the patients in a medical cohort. Therefore, the knowledge extraction and information retrieval tasks must be addressed carefully. Clinical notes might present multiple qualifiers that could change the meaning of a statement: negation, speculation, temporal information, family history and so on. It is not unusual for caregivers to preserve uncertainty using broad and ambiguous terms when they do not have full evidence of the disease status of a patient.
Year(s) Of Engagement Activity 2022
URL https://www.researchgate.net/publication/364051372_Diagnosis_Certainty_and_Progression_A_Natural_Lan...
 
Description Healthcare NLP in industry 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Discussion with NLP companies on how to engage with academia and the NHS. DeepCognito and RecourseAI gave talks. 6 December 2022.
Year(s) Of Engagement Activity 2022
 
Description Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date overview (LREC tutorial) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Since the 1950s, machine translation (MT) has been tackled with different scientific approaches, from rule-based methods, example-based and statistical models (SMT) to hybrid models and, in very recent years, neural models (NMT). While NMT has achieved a huge quality improvement over conventional methodologies, by taking advantage of the huge amount of parallel corpora available on the internet and of recently developed computational power at an acceptable cost, it still struggles to achieve real human parity in many domains and in most language pairs, if not all of them. Along the long road of MT research and development, quality evaluation metrics have played very important roles in MT advancement and evolution. In this tutorial, we give an overview of the traditional human judgement criteria, automatic evaluation metrics and unsupervised quality estimation models, as well as the meta-evaluation of the evaluation methods. Among these, we also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models to customise automatic metrics towards the specific language pairs and domains deployed. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in real-practice simulation.
Year(s) Of Engagement Activity 2022
 
Description NLP for Mental Health 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A meeting to discuss how clinical NLP applications in Mental Health could be shared, co-designed and co-developed. Participants from King's College, Cambridge, Manchester and Oxford.
Year(s) Of Engagement Activity 2022
 
Description PPIE Introductory Workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact An introductory PPIE session with the project's PPIE advisory group, to define and discuss terms of reference, research questions, etc. November 15, 2022.
Year(s) Of Engagement Activity 2022
 
Description PPIE Workshop 1 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact First in a series of PPIE workshops discussing outpatient letters, their role and challenges. 30 November 2022
Year(s) Of Engagement Activity 2022
 
Description VetText working group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact VetText working group meeting to discuss the opportunities and challenges of veterinary and clinical NLP. Participants from Manchester and Liverpool. 28 November 2022.
Year(s) Of Engagement Activity 2022