Integrating hospital outpatient letters into the healthcare data space

Lead Research Organisation: University of Manchester

Department Name: Computer Science

Abstract

The importance of analysing health data collected as part of clinical care and stored in electronic health records is well-established. This has led to vital research about the occurrence and progression of disease, treatment effectiveness and safety, and health service delivery. The current Covid-19 pandemic has demonstrated the public health need to efficiently use data collected at the point of care to rapidly understand patterns, risk factors and outcomes of emerging diseases. Much of this work comes from primary care electronic health records, where general practitioners (GPs) enter and use structured, coded healthcare data. The picture in hospitals, however, is very different.

One in four people in the UK live with one or more long-term conditions like cardiovascular diseases, chronic respiratory diseases, type 2 diabetes, arthritis and cancer, which account for 70% of the NHS budget. Specialised opinion about management of long-term conditions (LTCs) is provided through hospital outpatient care. Data and insight from outpatient clinics, however, are almost entirely absent. There is, surprisingly, no national system for recording diagnoses in hospital outpatient clinics. Information about key clinical events is instead recorded in outpatient letters, which are primarily used to communicate with patients and GPs. The ways in which letters are written and their sensitive content mean that they are not available for larger-scale "secondary use", i.e. to support clinical practice, research or service improvement. For example, shielding for the current pandemic relied on hospital clinical teams going through patient letters manually to identify those who needed shielding based on free-text information about diagnoses and medications, with clear time constraints and risks of under- and over-shielding patients.

Natural language processing (NLP) and text mining develop computer algorithms to automatically extract relevant information from free-text documents. This project will establish a partnership between academia, secondary care and industry to develop a standards-based information management framework to safely unlock information stored in outpatient letters, link it with other health data and demonstrate its impact and benefits through two case studies. We will develop new methods to extract key clinical events from letters and represent their details (e.g. medication used, duration of symptoms) in a computerised form so that they can be easily accessed. In doing so, we will use the NHS-adopted standards so that the outpatient letters can be linked to other hospital databases and do not live in their own silo. The protection of sensitive data that potentially appear in outpatient data is a prime concern, so we will develop clear rules on who can access such data and how, in particular considering that third parties (e.g. industry) may need to access that data for developing their tools. These rules will be developed in close collaboration between patient representatives, clinicians and specialists to ensure safeguards, public trust and transparency of decision making.

We will demonstrate the potential impact of the proposed methods through two case studies with our clinical and business partners. Our first case study will demonstrate how the proposed models can assist in timely, efficient, dynamic and transparent identification of patients for shielding in a pandemic, or for vaccination prioritisation. In the second case study, we will illustrate how the same information can be used to address important gaps in our knowledge about health and care, including, for example, disease prevalence and drug utilisation patterns. All outputs will be developed in a way that can be scaled beyond the single clinical site and single speciality.

Funded Value:

£767,578

Funded Period:

Oct 21 - Sep 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/V047949/1

Principal Investigator:

Goran Nenadic

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (20%)

Computer Graphics & Visual. (40%)

Human Communication in ICT (20%)

Information & Knowledge Mgmt (20%)

Organisations

People	ORCID iD
Goran Nenadic (Principal Investigator)
William Dixon (Co-Investigator)	http://orcid.org/0000-0001-5881-4857
Meghna Jani (Co-Investigator)

Publications

Alfattni G (2021) Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries. in Journal of biomedical informatics

Belkadi S (2023) Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition

Cui Y (2023) MedTem2.0: Prompt-based Temporal Classification of Treatment Events from Discharge Summaries

Fitzpatrick N (2023) Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research: Focus Group Interviews With Stakeholders (Preprint)

Gladkoff S (2023) Predictive Data Analytics with AI: assessing the need for post-editing of MT output by fine-tuning OpenAI LLMs

Griciute B (2023) Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

Han L (2024) Neural machine translation of clinical text: an empirical investigation into multilingual pre-trained language models and transfer-learning in Frontiers in Digital Health

Han L (2022) Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

Han L (2023) Investigating Massive Multilingual Pre-Trained Machine Translation Models for Clinical Domain via Transfer Learning

Hassan L (2022) Text mining tweets on e-cigarette risks and benefits using machine learning following a vaping related lung injury outbreak in the USA. in Healthcare analytics (New York, N.Y.)

Jani M (2023) POS0371 DEVELOPMENT AND EVALUATION OF A TEXT-ANALYTICS ALGORITHM FOR AUTOMATED APPLICATION OF NATIONAL COVID-19 SHIELDING CRITERIA IN RHEUMATOLOGY PATIENTS

Jani M (2023) "Take up to eight tablets per day": Incorporating free-text medication instructions into a transparent and reproducible process for preparing drug exposure data for pharmacoepidemiology. in Pharmacoepidemiology and drug safety

Karystianis G (2022) An Analysis of PubMed Abstracts From 1946 to 2021 to Identify Organizational Affiliations in Epidemiological Criminology: Descriptive Study. in Interactive journal of medical research

Karystianis G (2022) Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study. in JMIR formative research

Li H (2023) Team:PULSAR at ProbSum 2023:PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models

Jani M (2022) Pandemic Planning using Text Analytics on Hospital Outpatient Letters: a Case Study on Covid-19 Shielding for Rheumatology Patients.

Rana H (2021) Perceptions of opioid use and impact on quality of life in patients with musculoskeletal conditions within online health community forums. in Rheumatology advances in practice

Tu H (2023) Extraction of Medication and Temporal Relation from Clinical Text using Neural Language Models

Yang X (2021) Mining a stroke knowledge graph from literature. in BMC bioinformatics

Further Funding
Research Tools and Methods
Software and Technical Products
Engagement Activities


Description	Configurable federated de-identification of clinical free-text data to unlock the research potential of unstructured patient data to improve health and treatment outcomes
Amount	£13,000 (GBP)
Organisation	University of Manchester
Sector	Academic/University
Country	United Kingdom
Start	05/2022
End	09/2022


Title	drugprepr: Prepare Electronic Prescription Record Data to Estimate Drug Exposure
Description	Prepare prescription data (such as from the Clinical Practice Research Datalink) into an analysis-ready format, with start and stop dates for each patient's prescriptions.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	Used to prepare drug exposure data in the Centre for Epidemiology.
URL	https://cran.r-project.org/web/packages/drugprepr/index.html


Title	MASK - de-identification of clinical narrative
Description	Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advance research on treatments, drugs and public health. However, most of this information is not shared because it contains private information about patients, their families, or the medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada and GDPR govern the protection, processing and distribution of this information. If this information is de-identified, with personal information replaced or redacted, it could be distributed to the research community. In this paper, we present MASK, a software package designed to perform the de-identification task. The software performs named entity recognition using state-of-the-art techniques and then masks or redacts the recognised entities. The user can select the named entity recognition algorithm (with pre-trained models, including BERT, GloVe and ELMo embeddings) and the masking algorithm (e.g. shift dates, replace names/locations, totally redact entity).
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	Used as part of HIPS and Jigsaw projects.
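
The two-stage design described above (recognise entities, then apply a per-entity masking strategy) can be illustrated with a minimal Python sketch. This is not MASK's code or API: the regular expressions stand in for a trained named entity recognition model, and the names `PATTERNS`, `find_entities`, `shift_date` and `mask`, the strategy labels and the 30-day shift are all hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Toy recogniser (assumption): regex patterns stand in for a trained NER model.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{2}/\d{2}/\d{4}\b")),
    ("NAME", re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.? [A-Z][a-z]+\b")),
]


def find_entities(text):
    """Return (start, end, label) spans sorted by position.

    Assumes the toy patterns never produce overlapping matches.
    """
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def shift_date(value, days):
    """Shift a DD/MM/YYYY date by a fixed number of days."""
    parsed = datetime.strptime(value, "%d/%m/%Y")
    return (parsed + timedelta(days=days)).strftime("%d/%m/%Y")


def mask(text, strategies, date_shift_days=30):
    """Apply a masking strategy per entity label; unknown labels are redacted."""
    pieces = []
    cursor = 0
    for start, end, label in find_entities(text):
        pieces.append(text[cursor:start])  # keep text between entities
        original = text[start:end]
        strategy = strategies.get(label, "redact")
        if strategy == "shift" and label == "DATE":
            pieces.append(shift_date(original, date_shift_days))
        elif strategy == "replace":
            pieces.append(f"[{label}]")  # placeholder surrogate
        else:
            pieces.append("XXX")  # full redaction
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


letter = "Seen by Dr Smith on 03/02/2022 in clinic."
masked = mask(letter, {"DATE": "shift", "NAME": "replace"})
redacted = mask(letter, {})
```

Keeping recognition and masking as separate steps is what lets a user swap either one independently, which is the configurability the description refers to.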


Description	Clinical NLP workshop
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Panel discussion with clinical NLP colleagues from Oxford and Sheffield on pre-trained clinical language models, fusion with ontologies and knowledge graphs. Talks by Aline Villavicencio and Hang Dong (29/30 November 2022).
Year(s) Of Engagement Activity	2022


Description	Exploring foundation models
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Participation in an event organised by the Alan Turing Institute: "Exploring foundation models" 22.02.2023
Year(s) Of Engagement Activity	2023
URL	https://www.turing.ac.uk/events/exploring-foundation-models


Description	HealTAC 2022 conference
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	HealTAC 2022 was the fifth UK healthcare text analytics conference organised by Healtex. It was again a huge success, with over 100 attendees gathered this time for a 3-day online event. It brought the academic, clinical, industrial and patient communities together to discuss the current state of the art in processing healthcare free text and share experience, results and challenges. The conference featured two keynotes from leading experts in healthcare text analytics: Dr Ozlem Uzuner (George Mason University): "Building semantic representations of clinical notes: opportunities, challenges, and progress in natural language processing on electronic health records" and Prof James Teo (King's College Hospital): "Embedding text analytics into real-world clinical systems". There were also several research paper presentations, 20 posters, two panels ('How does PPIE add value in text analytics research?' and 'Text mining in veterinary medicine'), and an industry forum ('How can NLP enable personalised medicine?') with several demo sessions for various software solutions from industry and the NHS. Two tutorials ('Patient and Public Involvement and Engagement (PPIE): Hands on Guidance for Clinical Text Analytics' and 'De-identification of clinical and medical texts using MASK and MedCAT') were organised. We also had a PhD and Early career forum where five early career researchers presented their projects and received feedback from an expert panel and the audience. HealTAC is now an annual community event.
Year(s) Of Engagement Activity	2022
URL	https://healtac2022.github.io/


Description	HealTAC conference poster
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The accurate identification of diagnoses in free clinical narratives is decisive for characterizing the patients in a medical cohort. Therefore, the knowledge extraction and information retrieval tasks must be addressed carefully. Clinical notes might present multiple qualifiers that could change the meaning of a statement: negation, speculation, temporal information, family history and so on. It is not unusual for caregivers to preserve uncertainty using broad and ambiguous terms when they do not have full evidence of the disease status of a patient.
Year(s) Of Engagement Activity	2022
URL	https://www.researchgate.net/publication/364051372_Diagnosis_Certainty_and_Progression_A_Natural_Lan...


Description	Healthcare NLP in industry
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	Discussion with NLP companies on how to engage with academia and the NHS. DeepCognito and RecourseAI gave talks. 6 December 2022.
Year(s) Of Engagement Activity	2022


Description	Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date overview (LREC tutorial)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Since the 1950s, Machine Translation (MT) has been approached through a range of scientific solutions, from rule-based methods, example-based and statistical models (SMT) to hybrid models and, in very recent years, neural models (NMT). While NMT has achieved a huge quality improvement over conventional methodologies, by taking advantage of the large parallel corpora available from the internet and of recently developed computational power at an acceptable cost, it struggles to achieve real human parity in many domains and most language pairs, if not all of them. Along the long road of MT research and development, quality evaluation metrics have played very important roles in MT advancement and evolution. In this tutorial, we overview the traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, as well as the meta-evaluation of the evaluation methods. Among these, we also cover very recent work in the MT evaluation (MTE) field that takes advantage of large pre-trained language models to customise automatic metrics for the specific language pairs and domains deployed. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation in practice.
Year(s) Of Engagement Activity	2022


Description	NLP for Mental Health
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	A meeting to discuss how clinical NLP applications in Mental Health could be shared, co-designed and co-developed. Participants from King's College, Cambridge, Manchester and Oxford.
Year(s) Of Engagement Activity	2022


Description	PPIE Introductory Workshop
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Patients, carers and/or patient groups
Results and Impact	An introductory PPIE session with the project's PPIE advisory group, to define and discuss terms of reference, research questions, etc. November 15, 2022.
Year(s) Of Engagement Activity	2022


Description	PPIE Workshop 1
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Patients, carers and/or patient groups
Results and Impact	First in a series of PPIE workshops discussing outpatient letters, their role and challenges. 30 November 2022
Year(s) Of Engagement Activity	2022


Description	VetText working group
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	VetText working group meeting to discuss the opportunities and challenges of veterinary and clinical NLP. Participants from Manchester and Liverpool. 28 November 2022.
Year(s) Of Engagement Activity	2022
