SIEPH: Safe Information Extraction from Patient Histories

Lead Research Organisation: University of Glasgow
Department Name: School of Computing Science

Abstract

Medical records are an important resource for making discoveries about human health. Patterns of symptoms, prescriptions and other clinical events can help researchers understand why patients respond differently to drugs, and can lead to new understanding of disease. Substantial information about a patient's medical history is recorded in written clinical notes. Unfortunately, medical researchers frequently cannot use these notes because they may contain sensitive personal information. This directly limits the use of natural language processing (NLP) methods, in which computers automatically read the notes and extract the information they contain. This project proposes to build new methods that safely extract from clinical notes the information that medical researchers need to answer complex medical questions, which could lead to new discoveries. To achieve this, the project will develop a novel method based on synthetic records: artificially generated medical records that resemble real records in structure and content but do not contain any sensitive information. These synthetic records can be provided to medical researchers, who can annotate the exact type of information that they want to pull from real medical records. The annotations can then be used to build a machine learning system that extracts that type of information from real medical records. The extracted data will be further scanned to ensure that no sensitive information is leaked to researchers, providing them with the data they need to make medical discoveries without endangering patient privacy. The resulting technologies will enable medical researchers to ask new, complex questions of medical records where the information they need is locked in written clinical notes. We will work with the NHS Safe Havens team to evaluate this approach so that it may aid medical researchers and the NHS in the future.
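
The workflow described in the abstract (annotate synthetic records, build an extractor from the annotations, run it on real notes, then scan the output for sensitive information) can be illustrated with a minimal Python sketch. This is not the project's method: the "learning" step is replaced by a simple phrase lookup so the example stays self-contained, and every name here (Annotation, annotate, learn_terms, extract, is_safe, safe_extract) and both regular expressions are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for sensitive values. A real leak scan would need
# far broader coverage (names, addresses, other identifiers).
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b"),  # 10-digit identifier
    re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),      # date such as 01/02/1960
]


@dataclass
class Annotation:
    """A researcher's label on a span of a synthetic note."""
    text: str
    start: int
    end: int
    label: str


def annotate(text, phrase, label):
    """Build an Annotation for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Annotation(text, start, start + len(phrase), label)


def learn_terms(annotations):
    """Stand-in for model training: collect annotated phrases per label."""
    terms = {}
    for a in annotations:
        terms.setdefault(a.label, set()).add(a.text[a.start:a.end].lower())
    return terms


def extract(note, terms):
    """Return (label, phrase) pairs whose phrase occurs in the note."""
    lowered = note.lower()
    return [
        (label, phrase)
        for label, phrases in sorted(terms.items())
        for phrase in sorted(phrases)
        if phrase in lowered
    ]


def is_safe(value):
    """True if the value matches none of the sensitive patterns."""
    return not any(p.search(value) for p in SENSITIVE_PATTERNS)


def safe_extract(note, terms):
    """Extract, then drop any value that fails the leak scan."""
    return [(label, value) for label, value in extract(note, terms) if is_safe(value)]
```

In this sketch only the synthetic note is ever shown to the annotator; the real note is touched solely by safe_extract, and any extracted value that matches a sensitive pattern is discarded before results are returned.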
