Medical Knowledge Graphs: Artificial Intelligence for Cancer Pathways

Lead Research Organisation: University of Oxford

Abstract

Routinely collected healthcare data provides a rich source of information on the real-life workings of healthcare systems, and retrospective analyses of this data can provide real insights. However, such analysis is often difficult, as the data needed to generate useful knowledge is frequently stored heterogeneously and requires significant amounts of time to curate and integrate.

Knowledge Graphs (KGs), graph-structured knowledge bases that link entities using semantic relationships, are increasingly used in the health data literature to overcome these difficulties and construct large-scale knowledge bases from a variety of sources. The idea of KGs was initially popularised by Google through the Google Knowledge Graph, introduced in 2012, and many KGs have since been adopted in academia and industry. Today, most large companies use a KG in some form, and artificial intelligence research around KGs is receiving increased attention.

KGs are appealing in our setting for several reasons. Graph databases often rely on less rigid schemas than traditional databases, facilitating the integration of heterogeneous data. Additionally, the graph-based structure of the data is often an intuitive way to represent complex contextual information, such as a sequence of clinical events or interactions between comorbidities. KGs allow the rich domain knowledge present in a setting to be represented and reasoned over. Despite these advantages, relatively little research so far has addressed their potential for modelling and better understanding patient pathways.

This area presents a number of key challenges. In particular, pathways are often defined at a high level in terms of broad recommendations, so formalising them into a machine-readable form can be difficult. The process of preparing and interpreting these pathways is also currently human-intensive, and likely to be inaccurate without clinician input.

Importantly, existing state-of-the-art solutions for pathway modelling:

- lose relevant structural information on the links between elements of a pathway, often choosing to model a pathway as text, images, or another data representation
- can be difficult to explain and audit, for both patients and clinicians, even though explainability and auditability are key to building public trust in such systems
- do not consider multi-granular representations of pathways, which affects transparency, since pre-processing steps that translate between levels of granularity are often invisible to downstream analyses
- rely on raw data without incorporating domain knowledge, making it harder for reasoning methods to take advantage of the rich contextual information often available.

This work attempts to take the first step in overcoming some of these challenges. We build a novel Knowledge Graph on pathways in a way that:

- preserves structural information between its elements
- provides a multi-granular representation that is explainable and auditable, thus allowing clinicians to build trust in such a system
- takes the first step towards more advanced AI reasoning methods that combine data-based and knowledge-based AI methods
- allows all available contextual information to be included, including raw data and expert domain knowledge, so that decisions are based on more complete information.

This project falls within the EPSRC Artificial Intelligence and Robotics, Healthcare Technologies, and Information and Communication Technology research areas. This project is in collaboration with Elsevier.
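The idea in the abstract of a pathway knowledge graph that keeps structural links and supports more than one level of granularity can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the event names, the phase names, the relation labels `followed_by` and `part_of`, and the helper functions are hypothetical and are not taken from the project or from any real clinical pathway.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples. All names are illustrative
# assumptions, not drawn from the project or from a real clinical guideline.
TRIPLES = [
    ("gp_referral", "followed_by", "diagnostic_imaging"),
    ("diagnostic_imaging", "followed_by", "biopsy"),
    ("biopsy", "followed_by", "mdt_review"),
    ("gp_referral", "part_of", "referral_phase"),
    ("diagnostic_imaging", "part_of", "diagnosis_phase"),
    ("biopsy", "part_of", "diagnosis_phase"),
    ("mdt_review", "part_of", "treatment_planning_phase"),
]


def build_index(triples):
    """Index triples as subject -> relation -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in triples:
        index[subject][relation].append(obj)
    return index


def event_sequence(index, start):
    """Follow 'followed_by' edges from `start` and return the ordered events.

    Assumes each event has at most one successor; stops on a repeat to
    avoid looping forever if the graph contains a cycle.
    """
    sequence, seen, current = [start], {start}, start
    while True:
        successors = index.get(current, {}).get("followed_by", [])
        if not successors or successors[0] in seen:
            return sequence
        current = successors[0]
        sequence.append(current)
        seen.add(current)


def coarse_sequence(index, events):
    """Map each event to its 'part_of' phase and merge consecutive repeats."""
    phases = []
    for event in events:
        parents = index.get(event, {}).get("part_of", [])
        phase = parents[0] if parents else event  # keep event if no phase known
        if not phases or phases[-1] != phase:
            phases.append(phase)
    return phases


index = build_index(TRIPLES)
fine = event_sequence(index, "gp_referral")
coarse = coarse_sequence(index, fine)
print(fine)
print(coarse)
```

Because the coarse view is derived from explicit `part_of` edges held in the same graph, the translation between levels of granularity stays visible and can be inspected, rather than being hidden in a pre-processing step.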

Planned Impact

In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.

Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.

Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly-qualified graduates with skills in big health data analytics.

Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.

Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S02428X/1                                   01/04/2019  30/09/2027
2432658            Studentship   EP/S02428X/1  01/10/2020  30/09/2024  Owen Dwyer