Integrating hospital outpatient letters into the healthcare data space
Lead Research Organisation:
University of Manchester
Department Name: Computer Science
Abstract
The importance of analysing health data collected as part of clinical care and stored in electronic health records is well-established. This has led to vital research about the occurrence and progression of disease, treatment effectiveness and safety, and health service delivery. The current Covid-19 pandemic has demonstrated the public health need to efficiently use data collected at the point of care to rapidly understand patterns, risk factors and outcomes of emerging diseases. Much of this work comes from primary care electronic health records, where general practitioners (GPs) enter and use structured, coded healthcare data. The picture in hospitals, however, is very different.
One in four people in the UK live with one or more long-term conditions (LTCs), such as cardiovascular diseases, chronic respiratory diseases, type 2 diabetes, arthritis and cancer; these conditions account for 70% of the NHS budget. Specialist opinion on the management of LTCs is provided through hospital outpatient care. Data and insight from outpatient clinics, however, are almost entirely absent. Surprisingly, there is no national system for recording diagnoses in hospital outpatient clinics. Information about key clinical events is instead recorded in outpatient letters, which are primarily used to communicate with patients and GPs. The way these letters are written and their sensitive content mean that they are not available for larger-scale "secondary use", i.e. to support clinical practice, research or service improvement. For example, shielding during the current pandemic relied on hospital clinical teams manually reviewing patient letters to identify those who needed shielding based on free-text information about diagnoses and medications, under clear time constraints and with risks of under- and over-shielding patients.
Natural language processing (NLP) and text mining develop computer algorithms that automatically extract relevant information from free-text documents. This project will establish a partnership between academia, secondary care and industry to develop a standards-based information management framework that safely unlocks information stored in outpatient letters, links it with other health data and demonstrates its impact and benefits through two case studies. We will develop new methods to extract key clinical events from letters and represent their details (e.g. medication used, duration of symptoms) in a computerised form so that they can be easily accessed. In doing so, we will use NHS-adopted standards so that information from outpatient letters can be linked to other hospital databases rather than living in its own silo. Protecting the sensitive data that may appear in outpatient letters is a prime concern, so we will develop clear rules on who can access such data and how, particularly as third parties (e.g. industry) may need access to develop their tools. These rules will be developed in close collaboration with patient representatives, clinicians and specialists to ensure safeguards, public trust and transparent decision making.
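To make "represent their details in a computerised form" concrete, here is a minimal sketch, assuming a toy regular-expression rule rather than the project's actual methods. The `ClinicalEvent` class, `DURATION_PATTERN` and `extract_symptom_durations` are hypothetical names used only for illustration. The `code` field is where a concept from an NHS-adopted terminology such as SNOMED CT would go; it is deliberately left empty rather than filled with a guessed code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical structured record for one clinical event found in a letter.
@dataclass
class ClinicalEvent:
    mention: str                      # text of the symptom as written
    event_type: str                   # e.g. "symptom"
    duration_value: Optional[int] = None
    duration_unit: Optional[str] = None
    code: Optional[str] = None        # placeholder for a terminology concept ID (not guessed here)


# Toy pattern for phrases such as "a 3-week history of cough" or
# "2 months of joint pain"; real letters need far richer methods.
DURATION_PATTERN = re.compile(
    r"(?P<value>\d+)[\s-]*(?P<unit>day|week|month|year)s?\s+"
    r"(?:history\s+)?of\s+(?P<symptom>[a-z ]+?)(?=[.,;]|$)",
    re.IGNORECASE,
)


def extract_symptom_durations(text: str) -> List[ClinicalEvent]:
    """Return one ClinicalEvent per duration-of-symptom phrase matched in text."""
    events = []
    for m in DURATION_PATTERN.finditer(text):
        events.append(
            ClinicalEvent(
                mention=m.group("symptom").strip(),
                event_type="symptom",
                duration_value=int(m.group("value")),
                duration_unit=m.group("unit").lower(),
            )
        )
    return events
```

For example, "She reports a 3-week history of cough." yields an event with mention `cough`, value `3` and unit `week`. Keeping extraction output in a typed record like this is what makes later linkage to other coded hospital data possible.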
We will demonstrate the potential impact of the proposed methods through two case studies with our clinical and business partners. Our first case study will demonstrate how the proposed models can assist in the timely, efficient, dynamic and transparent identification of patients for shielding in a pandemic, or for vaccination prioritisation. In the second case study, we will illustrate how the same information can be used to address important gaps in our knowledge about health and care, including, for example, disease prevalence and drug utilisation patterns. All outputs will be developed in a way that can be scaled beyond a single clinical site and a single speciality.
Publications
Alfattni G
(2021)
Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries.
in Journal of biomedical informatics
Han L
(2024)
Neural machine translation of clinical text: an empirical investigation into multilingual pre-trained language models and transfer-learning
in Frontiers in Digital Health
Hassan L
(2022)
Text mining tweets on e-cigarette risks and benefits using machine learning following a vaping related lung injury outbreak in the USA.
in Healthcare analytics (New York, N.Y.)
Jani M
(2023)
"Take up to eight tablets per day": Incorporating free-text medication instructions into a transparent and reproducible process for preparing drug exposure data for pharmacoepidemiology.
in Pharmacoepidemiology and drug safety
Karystianis G
(2022)
An Analysis of PubMed Abstracts From 1946 to 2021 to Identify Organizational Affiliations in Epidemiological Criminology: Descriptive Study.
in Interactive journal of medical research
Karystianis G
(2022)
Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study.
in JMIR formative research
Rana H
(2021)
Perceptions of opioid use and impact on quality of life in patients with musculoskeletal conditions within online health community forums.
in Rheumatology advances in practice
Yang X
(2021)
Mining a stroke knowledge graph from literature.
in BMC bioinformatics
Description | Configurable federated de-identification of clinical free-text data to unlock the research potential of unstructured patient data to improve health and treatment outcomes |
Amount | £13,000 (GBP) |
Organisation | University of Manchester |
Sector | Academic/University |
Country | United Kingdom |
Start | 05/2022 |
End | 09/2022 |
Title | drugprepr: Prepare Electronic Prescription Record Data to Estimate Drug Exposure |
Description | Prepare prescription data (such as from the Clinical Practice Research Datalink) into an analysis-ready format, with start and stop dates for each patient's prescriptions. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Used to prepare drug exposure data in the Centre for Epidemiology. |
URL | https://cran.r-project.org/web/packages/drugprepr/index.html |
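The core idea of turning prescription records into exposure periods with start and stop dates can be sketched as follows. drugprepr itself is an R package, and the Python functions below (`stop_date`, `merge_periods`) are hypothetical and do not reproduce its API or its full set of decision rules. The sketch assumes the common heuristic that duration in days equals quantity prescribed divided by the number of daily doses, and that overlapping or back-to-back prescriptions form one continuous exposure episode.

```python
import math
from datetime import date, timedelta
from typing import List, Tuple

Period = Tuple[date, date]


def stop_date(start: date, quantity: float, daily_dose: float) -> date:
    """Estimate when one prescription runs out (assumed heuristic: quantity / daily dose)."""
    if daily_dose <= 0:
        raise ValueError("daily_dose must be positive")
    # Round up so that a partial day of supply still counts as a day of exposure.
    duration_days = math.ceil(quantity / daily_dose)
    return start + timedelta(days=duration_days)


def merge_periods(periods: List[Period]) -> List[Period]:
    """Merge overlapping or contiguous (start, stop) periods into exposure episodes."""
    merged: List[Period] = []
    for start, stop in sorted(periods):
        if merged and start <= merged[-1][1]:
            # This prescription starts before the previous supply ended: extend the episode.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged
```

For instance, 56 tablets at 2 per day starting on 1 January 2021 gives a stop date of 29 January 2021. Free-text instructions such as "take up to eight tablets per day" make the daily dose itself uncertain, which is why real pipelines need explicit, documented rules for choosing it.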
Title | MASK - de-identification of clinical narrative |
Description | Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advance research on treatments, drugs and public health. However, most of this information is not shared because it contains private information about patients, their families, or the medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada and GDPR govern the protection, processing and distribution of this information. If this information is de-identified, with personal information replaced or redacted, it could be distributed to the research community. In this paper, we present MASK, a software package designed to perform the de-identification task. The software performs named entity recognition using state-of-the-art techniques and then masks or redacts the recognised entities. The user can select the named entity recognition algorithm (with pre-trained models, including BERT, GloVe and ELMo embeddings) and the masking algorithm (e.g. shift dates, replace names/locations, fully redact an entity). |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Used as part of HIPS and Jigsaw projects. |
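The second stage described for MASK, applying a chosen masking strategy to entities that an NER model has already recognised, can be sketched as below. This is not MASK's interface: the `Entity` class and `mask_text` function are hypothetical, entity spans are assumed to be supplied by some upstream NER model, and the specific strategies (placeholder replacement for names and locations, a fixed day offset for dates in an assumed `dd/mm/yyyy` format, full redaction otherwise) are illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Entity:
    start: int   # character offset of the entity (inclusive)
    end: int     # character offset of the entity (exclusive)
    label: str   # hypothetical labels: "NAME", "LOCATION", "DATE", ...


def mask_text(text: str, entities: List[Entity], date_shift_days: int = 0,
              date_format: str = "%d/%m/%Y") -> str:
    """Rebuild text with each (non-overlapping) entity masked according to its label."""
    out = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e.start):
        out.append(text[cursor:ent.start])          # untouched text before the entity
        span = text[ent.start:ent.end]
        if ent.label == "DATE":
            try:
                # Shift dates by a fixed offset so intervals between dates are preserved.
                shifted = datetime.strptime(span, date_format) + timedelta(days=date_shift_days)
                out.append(shifted.strftime(date_format))
            except ValueError:
                out.append("[DATE]")                # unparseable date: redact instead
        elif ent.label in ("NAME", "LOCATION"):
            out.append(f"[{ent.label}]")            # replace with a typed placeholder
        else:
            out.append("[REDACTED]")                # any other identifier: fully redact
        cursor = ent.end
    out.append(text[cursor:])
    return "".join(out)
```

With a 10-day shift, "Seen John Smith in Salford on 03/02/2021." becomes "Seen [NAME] in [LOCATION] on 13/02/2021.". Shifting all dates in a record by the same offset keeps intervals such as time between visits intact, which matters for downstream research use.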
Description | Clinical NLP workshop |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Panel discussion with clinical NLP colleagues from Oxford and Sheffield on pre-trained clinical language models and their fusion with ontologies and knowledge graphs. Talks by Aline Villavicencio and Hang Dong (29/30 November 2022). |
Year(s) Of Engagement Activity | 2022 |
Description | Exploring foundation models |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Participation in an event organised by the Alan Turing Institute: "Exploring foundation models" 22.02.2023 |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.turing.ac.uk/events/exploring-foundation-models |
Description | HealTAC 2022 conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | HealTAC 2022 was the fifth UK healthcare text analytics conference organised by Healtex. It was again a huge success: over 100 attendees gathered this time for a 3-day online event. It brought the academic, clinical, industrial and patient communities together to discuss the current state of the art in processing healthcare free text and to share experience, results and challenges. The conference featured two keynotes from leading experts in healthcare text analytics: Dr Ozlem Uzuner (George Mason University): "Building semantic representations of clinical notes: opportunities, challenges, and progress in natural language processing on electronic health records" and Prof James Teo (King's College Hospital): "Embedding text analytics into real-world clinical systems". There were also several research paper presentations, 20 posters, two panels ('How does PPIE add value in text analytics research?' and 'Text mining in veterinary medicine'), and an industry forum ('How can NLP enable personalised medicine?') with several demo sessions of software solutions from industry and the NHS. Two tutorials were organised ('Patient and Public Involvement and Engagement (PPIE): Hands on Guidance for Clinical Text Analytics' and 'De-identification of clinical and medical texts using MASK and MedCAT'). We also had a PhD and Early Career forum in which five early career researchers presented their projects and received feedback from an expert panel and the audience. HealTAC is now an annual community event. |
Year(s) Of Engagement Activity | 2022 |
URL | https://healtac2022.github.io/ |
Description | HealTAC conference poster |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | The accurate identification of diagnoses in free-text clinical narratives is decisive for characterising the patients in a medical cohort. Therefore, the knowledge extraction and information retrieval tasks must be addressed carefully. Clinical notes may contain multiple qualifiers that change the meaning of a statement: negation, speculation, temporal information, family history and so on. It is not unusual for caregivers to preserve uncertainty by using broad and ambiguous terms when they do not have full evidence of a patient's disease status. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.researchgate.net/publication/364051372_Diagnosis_Certainty_and_Progression_A_Natural_Lan... |
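One common way to detect the qualifiers mentioned above is a NegEx-style rule: look for trigger phrases in a short window of words before the diagnosis mention. The sketch below is an illustrative swap-in, not the method behind the poster; the trigger lists, window size and `qualify_mention` function are hypothetical, and real systems use curated lexicons, scope terminators and post-mention triggers that this toy version omits.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, deliberately tiny trigger lists.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without", "ruled out"]
SPECULATION_TRIGGERS = ["possible", "probable", "suspected", "query", "likely"]
FAMILY_TRIGGERS = ["family history of", "mother has", "father has"]


def _has_trigger(context: str, triggers: List[str]) -> bool:
    """True if any trigger phrase occurs as whole words in the context string."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", context) for t in triggers)


def qualify_mention(sentence: str, diagnosis: str, window: int = 5) -> Optional[Dict[str, bool]]:
    """Flag negation, speculation and family history for the first diagnosis mention."""
    s = sentence.lower()
    idx = s.find(diagnosis.lower())
    if idx == -1:
        return None  # diagnosis not mentioned in this sentence
    # Only the last `window` words before the mention are searched for triggers.
    preceding = " ".join(s[:idx].split()[-window:])
    return {
        "negated": _has_trigger(preceding, NEGATION_TRIGGERS),
        "speculative": _has_trigger(preceding, SPECULATION_TRIGGERS),
        "family_history": _has_trigger(preceding, FAMILY_TRIGGERS),
    }
```

For example, "There is no evidence of rheumatoid arthritis." is flagged as negated, while "Possible psoriatic arthritis." is flagged as speculative. Without such qualifiers, both sentences would wrongly count as confirmed diagnoses in a cohort.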
Description | Healthcare NLP in industry |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Discussion with NLP companies on how to engage with academia and the NHS. DeepCognito and RecourseAI gave talks. 6 December 2022. |
Year(s) Of Engagement Activity | 2022 |
Description | Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date overview (LREC tutorial) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Since the 1950s, Machine Translation (MT) has been approached with different scientific solutions, ranging from rule-based methods, through example-based and statistical models (SMT) and hybrid models, to neural models (NMT) in recent years. While NMT has achieved a large quality improvement over conventional methodologies by taking advantage of the huge amount of parallel corpora available on the internet and of recently developed computational power at an acceptable cost, it still struggles to achieve real human parity in many domains and most, if not all, language pairs. Throughout the long history of MT research and development, quality evaluation metrics have played a very important role in MT advancement and evolution. In this tutorial, we overview traditional human judgement criteria, automatic evaluation metrics, unsupervised quality estimation models, and the meta-evaluation of these evaluation methods. We also cover very recent work in MT evaluation (MTE) that takes advantage of large pre-trained language models to customise automatic metrics towards the specific language pairs and domains being deployed. In addition, we introduce statistical confidence estimation of the sample size needed for human evaluation, using a simulation of real practice. |
Year(s) Of Engagement Activity | 2022 |
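As an illustration of the kind of automatic evaluation metric the tutorial surveys, the sketch below computes sentence-level BLEU, one widely used metric, chosen here as an example rather than taken from the tutorial: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes whitespace tokenisation, a single reference and no smoothing; `sentence_bleu` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU with whitespace tokenisation."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        total = sum(cand_ngrams.values())
        if total == 0:
            return 0.0  # candidate too short to contain any n-gram of this order
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it appears in the reference.
        overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        if overlap == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0. "the patient was discharged home" against "the patient was discharged home today" has perfect n-gram precisions but is one token short, so it scores exp(-0.2), about 0.82. Meta-evaluation asks how well scores like this agree with human judgements.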
Description | NLP for Mental Health |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | A meeting to discuss how clinical NLP applications in Mental Health could be shared, co-designed and co-developed. Participants from King's College, Cambridge, Manchester and Oxford. |
Year(s) Of Engagement Activity | 2022 |
Description | PPIE Introductory Workshop |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | An introductory PPIE session with the project's PPIE advisory group, to define and discuss terms of reference, research questions, etc. November 15, 2022. |
Year(s) Of Engagement Activity | 2022 |
Description | PPIE Workshop 1 |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | First in a series of PPIE workshops discussing outpatient letters, their role and challenges. 30 November 2022 |
Year(s) Of Engagement Activity | 2022 |
Description | VetText working group |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | VetText working group meeting to discuss the opportunities and challenges of veterinary and clinical NLP. Participants from Manchester and Liverpool. 28 November 2022. |
Year(s) Of Engagement Activity | 2022 |