AI-based diagnosis for improving classification of bone and soft tissue tumours across the UK

Lead Research Organisation: University College London
Department Name: Medical Physics and Biomedical Eng

Abstract

Delivery of pathology tissue diagnoses, most of which are cancer, in the current format is unsustainable. Advances in genomic medicine and immuno-oncology have shown that classifying tumours into subtypes not only allows selection of patients for specific treatments but also spares patients unnecessary toxic and expensive therapies. However, making such diagnoses has become more time-consuming: it involves selecting and interpreting ancillary tests, which requires ever-growing specialist knowledge for each cancer type. Whilst the need for diagnostic expertise is increasing, there is already a 25% shortfall of pathologists who are able to report results, and their number is set to decline.

We propose that the use of artificial intelligence (AI) can ensure that the delivery of tissue diagnoses by pathologists is sustainable and supports delivery of personalised treatments. The benefits of AI in pathology are beginning to be seen: for example, identification of high-grade areas of prostate cancer shows a reduction in errors and in pathologists' time. The development of AI for diagnoses is timely, as full adoption of digitised histological images, allowing them to be interrogated by both humans and AI, is expected in the UK by 2025.

AI is a data-hungry process; it is unrealistic to provide the hundreds of thousands of images that are required to train a model. Even the most common cancers (e.g. breast) have multiple subtypes; identification of these is required for selection of patients for personalised treatments. To address this challenge, we propose to develop a novel AI strategy using a relatively small sample size (~1000 images per class). Such a model could be adapted to any cancer type. A multiple-instance learning framework will be developed, using transformers for feature extraction and classification. A tool that flags samples that cannot be confidently classified will be applied, thereby alerting the pathologist to potentially unseen diseases. The deep learning model will be strengthened by the injection of pathologists' domain knowledge.
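The multiple-instance learning idea and the confidence flag described above can be illustrated with a minimal sketch. It is not the project's model: it assumes that each whole-slide image has already been cut into patches and that each patch has a feature vector (in the proposal these would come from a transformer). It then uses simple dot-product attention pooling, a linear classifier, and a threshold on the highest class probability as a stand-in for the flagging tool. All function names, weights, and the 0.7 threshold are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_vector):
    """Aggregate patch feature vectors (the 'bag') into one slide-level vector.

    Each patch gets a score (dot product with a learned attention vector);
    the softmax of the scores weights the patches in the pooled average.
    """
    scores = [sum(w * x for w, x in zip(attention_vector, f))
              for f in patch_features]
    alphas = softmax(scores)
    dim = len(patch_features[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, patch_features))
              for d in range(dim)]
    return pooled, alphas


def classify_slide(pooled, class_weights, threshold=0.7):
    """Linear classifier on the pooled vector, plus a low-confidence flag.

    The slide is flagged for pathologist review when the highest class
    probability is below `threshold` (a hypothetical value).
    """
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    flagged = probs[best] < threshold
    return best, probs, flagged


# Toy usage: two patches with 2-dimensional features, two classes.
patches = [[1.0, 0.0], [0.0, 1.0]]
pooled, alphas = attention_pool(patches, attention_vector=[0.0, 0.0])
label, probs, flagged = classify_slide(pooled, class_weights=[[10.0, 0.0], [0.0, 0.0]])
```

In this sketch a slide is labelled only at the slide level, which is why multiple-instance learning suits settings where patch-level annotations are unavailable; the attention weights also indicate which patches drove the prediction.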

Soft tissue and bone tumours
We will develop the AI model on tumours of soft tissue (muscle, fat, blood vessels, etc.) and bone, an area considered to be one of the most challenging diagnostically. These tumours comprise approximately 100 different subtypes, and represent some of the most common cancers in children and young adults.

We will build on our existing deep learning model of 15 different subtypes trained on 2122 images, which predicts the correct diagnosis in 87% of cases. The algorithm then prompts the selection of confirmatory ancillary tests, streamlining the diagnostic pathway.

17,000 images that have already been scanned will be added to the library, allowing the rapid development and extension of the classification model. The image library will be linked to clinical outcomes and expanded to 35,000 images during the project. Added to this is the commitment of the established Sarcoma Network of at least 20 pathologists, from across all countries in the UK, to provide the additional 20,000 images mentioned above.

Additional benefits
The study and infrastructure will serve as the framework for the continued development of the model which can rapidly be expanded prospectively with the introduction of digital pathology in the NHS and globally. The model can be developed over time in response to new advances.

The image library will be available for training future pathologists, for research, and for validation of other AI algorithms, and will contribute to the Sarcoma Genomics England Clinical Interpretation Partnership (GeCIP), offering a valuable resource for future multi-modal, multi-omic research.

Working closely with sarcoma charities and partners, we will involve and engage patients, their families, and the public to build trust in the use of AI in health care.

Development of AI models for digitised pathology images can avert the crisis facing this medical specialty.
 
Description We are currently in discussion to bring a commercial aspect to the research done via this award, in line with the grant's requirements. We are also in discussion with UCL Business to protect the IP generated.
First Year Of Impact 2024
Sector Healthcare
Impact Types Economic

 
Title Web Application to easily deploy our AI model
Description We launched a web application that enables direct use of our AI model for digital pathology without any knowledge of programming. This platform is aimed at making it easy for clinicians and researchers to access and work with our AI tools for analysing pathology slides. In line with our commitment to open science, we have also open-sourced the AI model and published detailed documentation on its development and performance. This approach is designed to encourage collaboration and facilitate the model's adoption and improvement by the global scientific and medical communities. We hope to drive engagement from clinicians and that the model can be used in a clinical trial scenario to provide evidence for certification.
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact We are already collaborating with engaged clinicians at Sheffield Hospital, the Royal National Orthopaedic Hospital, and the School of Medicine, Medical Sciences & Nutrition at the University of Aberdeen, who are testing our models for sarcoma classification, lymphocyte detection and mitosis detection.
URL http://www.sarcoma-ai.com
 
Title OctoPath: Large Scale Pathology Single-Cell Database from Immuno-Fluorescence cells 
Description Over the course of the UKRI AI for Health Award, we generated millions of single-cell labels for AI training covering various cell types (mitotic cells, epithelial, lymphocytes CD4/CD8, stromal, muscular, endothelial, macrophages). This database of single cells contains the H&E representation, the mask around the nuclei, and the label for each cell. It will be of great use for training future AI algorithms on these data. The database is currently being compiled and will soon be published.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact Our current publication, submitted to Nature: Machine Learning, comes directly from the generation of this database. As the database is yet to be made fully public (last validation), no external impacts have yet been recorded. This section will be filled
 
Title UK-Based Sarcoma WSI Database 
Description Before 2023: database of 10k scanned sarcoma whole-slide images (WSI), covering 22 different diagnoses. This database is not yet public, as we are waiting for our own publications to come out and for ethical regulatory approval. Since 2023, with the award of the UKRI AI for Health, we have increased the number of diagnoses to 29 and are currently collecting 35,000 WSI to train our AI model.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? No  
Impact The large-scale database has allowed us to significantly improve the results of our AI algorithms and to produce state-of-the-art results that are incorporated into our latest model. This model will be tested against pathologists and, pending positive results, incorporated into clinical practice through industrial partners.