Context Aware Augmented Reality for Endonasal Endoscopic Surgery

Lead Research Organisation: UNIVERSITY COLLEGE LONDON
Department Name: Medical Physics and Biomedical Eng

Abstract

This project aims to develop tools to guide a surgeon during surgery to remove cancers of the pituitary gland.

Access to the pituitary gland is difficult; one current approach is the endonasal approach, through the nose. While this approach is minimally invasive, which is better for the patient, it is technically challenging for the surgeon: it is difficult not only to manoeuvre the tools, but also to maintain contextual awareness and to remember the location of, and identify, critical structures.

One proposed solution is to combine pre-operative scan data, such as information from Magnetic Resonance Imaging (MRI) or Computed Tomography (CT) scans, with the endoscopic video. Typically, engineers have proposed "Augmented Reality" (AR), where the information from MRI/CT scans is simply overlaid on top of the endoscopic video. This approach has not found favour with clinical teams, however: the result is often confusing and difficult to use.
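
To make the objection concrete, here is a minimal sketch of the naive overlay approach, assuming the pre-operative anatomy has already been rendered from the current camera pose into an image aligned with the video frame; the file names are hypothetical:

```python
# Minimal sketch of a naive AR overlay: alpha-blend a rendering of
# pre-operative (MRI/CT-derived) anatomy onto an endoscopic video frame.
# Assumes the anatomy is already rendered from the current camera pose;
# file names are illustrative, not from the project.
import cv2

frame = cv2.imread("endoscope_frame.png")    # live endoscopic image
overlay = cv2.imread("preop_render.png")     # rendered MRI/CT anatomy
overlay = cv2.resize(overlay, (frame.shape[1], frame.shape[0]))

alpha = 0.35  # overlay opacity; higher values obscure more of the video
blended = cv2.addWeighted(overlay, alpha, frame, 1.0 - alpha, 0.0)

cv2.imwrite("augmented_frame.png", blended)
```

Even in this toy form, the problem is apparent: a static, always-on overlay obscures the live view regardless of what the surgeon actually needs at that moment.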

In this project we have assembled a team of surgeons and engineers to re-think the Augmented Reality paradigm from the ground up. First, the aim is to identify the most relevant information to display on-screen at each stage of the operation. Then machine learning will be used to analyse the endoscopic video and automatically identify which stage of the procedure the surgeon is working on. The guidance system will then automatically switch modes, providing the most useful information for each stage of the procedure. Finally, we will automate the alignment of pre-operative data to the endoscopic video, again using machine learning techniques.
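
As a rough illustration of the workflow-recognition step, the sketch below classifies individual video frames into surgical phases using an off-the-shelf image backbone. The phase labels, model choice, and training details are illustrative assumptions, not the project's actual method:

```python
# Minimal sketch of surgical phase recognition from endoscopic video frames.
# A per-frame classifier is the simplest baseline; real systems typically add
# temporal modelling. Phase names and the backbone are assumptions.
import torch
import torch.nn as nn
from torchvision import models

PHASES = ["nasal", "sphenoid", "sellar", "closure"]  # hypothetical labels

class PhaseClassifier(nn.Module):
    def __init__(self, num_phases: int = len(PHASES)):
        super().__init__()
        self.backbone = models.resnet18(weights=None)  # pretrained in practice
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_phases)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) -> per-frame phase logits
        return self.backbone(frames)

model = PhaseClassifier().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # dummy frame
    phase = PHASES[int(logits.argmax(dim=1))]
print(f"Predicted phase: {phase}")  # a guidance system would switch modes here
```

In practice such per-frame predictions would be smoothed over time, and the recognised phase would drive which pre-operative information is displayed.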

The end result should be more accurate and more clinically relevant than current state-of-the-art methods, representing a genuine step change in performance for image guidance during skull-base procedures.

Publications

Xu M (2024) Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery in IEEE Transactions on Medical Imaging

 
Title Endoscopic Pituitary Surgery on a High-fidelity Bench-top Phantom 
Description The first public dataset containing both instrument and surgical skill assessment annotations in a high-fidelity bench-top phantom (www.store.upsurgeon.com/products/tnsbox/) of the nasal phase of the endoscopic TransSphenoidal Approach (eTSA). The dataset includes 15 videos ({video_number}.mp4), the corresponding mOSATS with level of surgical expertise (mOSATS.csv), and instrument segmentation annotations (annotations.csv). The companion paper with baseline results is titled: "Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom" (Adrito Das et al., in press). Please cite this paper if you have used this dataset. (A minimal loading sketch follows this record.) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://rdr.ucl.ac.uk/articles/dataset/Endoscopic_Pituitary_Surgery_on_a_High-fidelity_Bench-top_Pha...
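
The record above describes a small set of files; a minimal loading sketch is shown below, assuming only the file names given in the description (column contents are not verified against the actual data):

```python
# Minimal sketch of loading this dataset's files; file names follow the
# description above, and the specific video name is an assumed instance
# of the {video_number}.mp4 pattern.
import pandas as pd
import cv2

scores = pd.read_csv("mOSATS.csv")         # mOSATS scores with expertise level
segments = pd.read_csv("annotations.csv")  # instrument segmentation annotations

cap = cv2.VideoCapture("01.mp4")           # assumed instance of {video_number}.mp4
ok, first_frame = cap.read()               # grab the first phantom-video frame
cap.release()

print(scores.head())
print(segments.head())
```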
 
Title PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery 
Description Visual Question Answering (VQA) within the surgical domain, utilising Large Language Models (LLMs), offers a distinct opportunity to improve intra-operative decision-making and facilitate intuitive surgeon-AI interaction. However, the development of LLMs for surgical VQA is hindered by the scarcity of diverse and extensive datasets with complex reasoning tasks. Moreover, contextual fusion of the image and text modalities remains an open research challenge due to the inherent differences between these two types of information and the complexity involved in aligning them. This paper introduces PitVQA, a novel dataset specifically designed for VQA in endonasal pituitary surgery, and PitVQA-Net, an adaptation of GPT-2 with a novel image-grounded text embedding for surgical VQA. PitVQA comprises 25 procedural videos and a rich collection of question-answer pairs spanning crucial surgical aspects such as phase and step recognition, context understanding, tool detection and localisation, and tool-tissue interactions. PitVQA-Net consists of a novel image-grounded text embedding that projects image and text features into a shared embedding space, and a GPT-2 backbone with an excitation-block classification head to generate contextually relevant answers within the complex domain of endonasal pituitary surgery. Our image-grounded text embedding leverages joint embedding, cross-attention and contextual representation to understand the contextual relationship between questions and surgical images. We demonstrate the effectiveness of PitVQA-Net on both the PitVQA and the publicly available EndoVis18-VQA datasets, achieving improvements in balanced accuracy of 8% and 9% over the most recent baselines, respectively.

The PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital for Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) at 720p resolution and stored as MP4 files. All videos were annotated for surgical phases, steps, instruments present and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon.

We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, obtaining a total of 109,173 frames; the shortest and longest videos yielded 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation: 884,242 question-answer pairs across the 109,173 frames, around 8 pairs per frame. There are 59 classes overall, including 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes. Question length ranges from 7 to 12 words. (A sketch of the 1 fps frame-extraction step follows this record.) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://rdr.ucl.ac.uk/articles/dataset/PitVQA_A_Dataset_of_Visual_Question_Answering_in_Pituitary_Su...
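
As a small illustration of the 1 fps frame-extraction step described in the record above, here is a minimal sketch; the input file name and output layout are assumptions for illustration:

```python
# Minimal sketch of extracting roughly one frame per second from a surgical
# video, as in the PitVQA preprocessing described above. The input file name
# is hypothetical; blurred/occluded frames would be filtered in a later pass.
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("surgery_video.mp4")   # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if metadata missing
step = max(1, int(round(fps)))                # keep ~one frame per second

index = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % step == 0:
        cv2.imwrite(f"frames/{saved:06d}.png", frame)
        saved += 1
    index += 1
cap.release()
print(f"Extracted {saved} frames at ~1 fps")
```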
 
Title PitVis-2023 Challenge: Endoscopic Pituitary Surgery videos 
Description The first public dataset containing both step and instrument annotations of the endoscopic TransSphenoidal Approach (eTSA). The dataset includes 25 videos (video_{video_number}.mp4) and the corresponding step and instrument annotations (annotations_{video_number}.csv). Annotation metadata mapping each numerical value to its formal description is provided (map_steps.csv and map_instrument.csv), as well as video metadata (video_encoder_details.txt). Helpful scripts and baseline models can be found at: https://github.com/dreets/pitvis. This dataset is released as part of the PitVis-2023 Challenge, a sub-challenge of the EndoVis Challenge hosted at the annual MICCAI conference (Vancouver, Canada, October 2023). More details about the challenge can be found on the challenge website: https://www.synapse.org/Synapse:syn51232283/wiki/621581. The companion paper with comparative models is titled: "PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery" (Adrito Das et al.). Please cite this paper if you have used this dataset: https://arxiv.org/abs/2409.01184. (A minimal annotation-loading sketch follows this record.) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://rdr.ucl.ac.uk/articles/dataset/PitVis_Challenge_Endoscopic_Pituitary_Surgery_videos/26531686
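
A minimal sketch of reading one video's annotations and joining the metadata maps described above; the join column names are assumptions and should be checked against the actual CSV headers:

```python
# Minimal sketch of loading PitVis-style annotations for one video and mapping
# numeric codes to their formal descriptions. "annotations_01.csv" is an
# assumed instance of annotations_{video_number}.csv.
import pandas as pd

ann = pd.read_csv("annotations_01.csv")
steps = pd.read_csv("map_steps.csv")             # step code -> description
instruments = pd.read_csv("map_instrument.csv")  # instrument code -> description

# Join keys ("step", "instrument") are assumptions; check the CSV headers.
ann = ann.merge(steps, how="left", on="step")
ann = ann.merge(instruments, how="left", on="instrument")
print(ann.head())
```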
 
Description Science of Surgery Open Days, 2022-2024 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Various teams of researchers presented display stands illustrating novel, interesting, or fun ideas based around surgery and science. The aim was to spark discussions with the general public and, for school children, to inspire an interest in science.
Year(s) Of Engagement Activity 2022, 2023, 2024
URL https://www.ucl.ac.uk/interventional-surgical-sciences/science-surgery