Lead Research Organisation: University of Oxford

Abstract

The main aim of this project is to leverage recent advances in Natural Language Processing (NLP) to develop end-to-end clinical support systems that can utilise longitudinal free-text documents within Electronic Health Records (EHRs). EHRs often contain historic records pertaining to all interactions between a patient and the healthcare service, including free-text documents such as referral letters and discharge notes. A notable challenge is adequately capturing longitudinal representations of clinical texts. Common state-of-the-art models such as Bidirectional Encoder Representations from Transformers (BERT) can only process sequences of up to 512 tokens (Devlin et al., 2018), but a year's worth of clinical text for a single patient can consist of more than 10,000 tokens. Another, more general problem relates to the transparency, interpretability and algorithmic fairness of large language models, so this project also aims to develop methods and protocols to enhance these aspects.
One proposed approach to representing sequential free text builds upon the signature of a path, a non-parametric approach to extracting features from data in the form of tensors (Chevyrev and Kormilitzin, 2016). Loosely speaking, a signature is a collection of statistics about a stream of data that is invariant to time reparametrisation and has the universal non-linearity property: linear combinations of its terms are sufficient to approximate possibly nonlinear functions of the original data. This offers a distinctive approach to representing complex sequential data. Signature techniques will be combined with strategies that address the limited sequence length handled by attention mechanisms in common transformer-based models, such as sparse-attention mechanisms (Zaheer et al., 2020). This hybrid approach should allow efficient computation and representation of patients' clinical text histories, usable in a number of relevant downstream tasks.
Another approach will embrace a recent paradigm shift in NLP research known as prompt-learning. Traditional approaches to many downstream tasks take a model such as BERT, pre-trained on masked language modelling (MLM) and next sentence prediction (NSP), and fine-tune it on the downstream task. Prompt-learning instead reformulates the downstream task in the form of the pre-training objective, encouraging the model to implicitly learn the desired task. The use of prompt-learning in a clinical domain has not yet been documented and therefore presents a clear opportunity.
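As a rough illustration of the prompt-learning idea described above, the sketch below recasts a classification task as a cloze (fill-in-the-mask) problem. It is not the project's method: the template wording, the label words, the class names and the example note are all invented for illustration, and the scores at the masked position are supplied by hand rather than produced by a real masked language model.

```python
# Minimal sketch of prompt-learning as a cloze task.
# ASSUMPTIONS: TEMPLATE, VERBALISER and all example values are hypothetical.

TEMPLATE = "{note} This patient should be referred to the [MASK] team."

# Verbaliser: maps a label word predicted at [MASK] to a task class.
VERBALISER = {"crisis": "urgent", "community": "routine"}


def build_prompt(note, template=TEMPLATE):
    """Wrap a clinical note in a cloze template containing one [MASK] slot."""
    return template.format(note=note.strip())


def classify(mask_scores, verbaliser=VERBALISER):
    """Return the class whose label word scores highest at the [MASK] slot.

    mask_scores stands in for the scores a masked language model would
    assign to candidate words; label words missing from it are ignored.
    """
    best_word = max(verbaliser, key=lambda w: mask_scores.get(w, float("-inf")))
    return verbaliser[best_word]


prompt = build_prompt("  Patient reports low mood and poor sleep. ")
label = classify({"crisis": 0.7, "community": 0.2})
```

In a real system the hand-written score dictionary would be replaced by the output of a pre-trained masked language model at the `[MASK]` position, so the downstream task reuses the pre-training objective instead of adding a new classification head.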
The proposed new methodologies will be developed and implemented in consultation with clinicians and will address real clinical use-cases. Specifically, the language models will be trained on a large collection of free-text notes from the secondary care UKCRIS database to help triage patients to specialist teams. Other strands will explore the feasibility of identifying patients for clinical trials and of identifying self-harm. The feasibility of translating the developed methodology and models beyond mental health will be tested with the support of the EPSRC CDT in Health Data Science.
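To make the path signature mentioned earlier in the abstract more concrete, the sketch below computes the first two signature levels of a piecewise-linear path in plain Python. It is an illustrative toy under stated assumptions, not the project's implementation: the function name `signature_depth2` and the example paths are hypothetical, and a real pipeline would first need to turn clinical text into a numeric stream (for example, a sequence of embeddings).

```python
# Minimal sketch: depth-2 signature of a piecewise-linear path.
# ASSUMPTION: the path is a list of equal-length numeric points.


def signature_depth2(path):
    """Return (level1, level2) signature terms of a piecewise-linear path.

    level1[i]    is the total increment of coordinate i.
    level2[i][j] is the iterated integral of dX^i then dX^j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one straight segment:
        # the cross term uses the increment accumulated so far,
        # the half term is the segment's own second-level signature.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


# The same geometric path sampled at different "times": the second
# version has an extra point on the first segment.
coarse = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
fine = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0)]

sig_coarse = signature_depth2(coarse)
sig_fine = signature_depth2(fine)
```

Both samplings give the same signature terms, which illustrates the invariance to how the stream is sampled in time. The off-diagonal second-level terms differ from each other, so the signature also records the order in which the coordinates changed.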

This project falls within the EPSRC healthcare technologies research area.

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly-qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/S02428X/1                                      01/04/2019   30/09/2027
2432020             Studentship    EP/S02428X/1   01/10/2020   30/09/2024   Niall Taylor