Context Aware Augmented Reality for Endonasal Endoscopic Surgery
Lead Research Organisation:
UNIVERSITY COLLEGE LONDON
Department Name: Medical Physics and Biomedical Engineering
Abstract
This project aims to develop tools to guide the surgeon during operations to remove tumours of the pituitary gland.
Access to the pituitary gland is difficult, and one current route is the endonasal approach, through the nose. While this approach is minimally invasive, which is better for the patient, it is technically challenging for the surgeon: it is hard to manoeuvre the tools in the confined space, and equally hard to maintain contextual awareness, remembering where critical structures lie and identifying them.
One proposed solution is to combine pre-operative scan data, such as Magnetic Resonance Imaging (MRI) or Computed Tomography (CT) scans, with the endoscopic video. Typically, engineers have proposed "Augmented Reality", in which information from the MRI/CT scans is simply overlaid on top of the endoscopic video. This approach has not found favour with clinical teams, however, as the result is often confusing and difficult to use.
In this project we have assembled a team of surgeons and engineers to re-think the Augmented Reality paradigm from the ground up. First, we will identify the most relevant information to display on screen at each stage of the operation. Machine learning will then be used to analyse the endoscopic video and automatically recognise which stage of the procedure the surgeon is working on, so that the guidance system can switch modes automatically and present the most useful information for that stage (a minimal sketch of this idea is given below). Finally, we will use machine learning techniques to automate the alignment of the pre-operative data to the endoscopic video.
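As a purely illustrative example of the mode-switching idea, the sketch below classifies each endoscopic frame into a workflow phase and looks up a display mode for it. The phase labels, the ResNet-18 backbone, and the phase-to-overlay mapping are all assumptions made for illustration, not the project's actual design; in practice such a recogniser would be trained on annotated surgical videos such as the PitVis dataset listed below.

```python
# Minimal sketch of phase recognition driving guidance-mode switching.
# Phase labels, backbone, and the mode mapping are illustrative assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T

PHASES = ["nasal", "sphenoid", "sellar", "closure"]   # assumed phase labels
GUIDANCE_MODE = {                                     # assumed mode mapping
    "nasal": "anatomy_labels",
    "sphenoid": "carotid_warning_overlay",
    "sellar": "tumour_boundary_overlay",
    "closure": "no_overlay",
}

# A standard image classifier repurposed as a per-frame phase recogniser.
# weights=None gives an untrained network; a real system would load weights
# learned from annotated surgical videos.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, len(PHASES))
model.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
])

def guidance_mode_for_frame(frame_bgr) -> str:
    """Classify one endoscopic frame (numpy BGR) and return the overlay mode."""
    x = preprocess(frame_bgr[:, :, ::-1].copy()).unsqueeze(0)  # BGR -> RGB
    with torch.no_grad():
        phase = PHASES[model(x).argmax(dim=1).item()]
    return GUIDANCE_MODE[phase]
```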
The end result should be more accurate and more clinically relevant than current state-of-the-art methods, representing a genuine step change in performance for image guidance during skull-base procedures.
Publications
Enkaoua A (2023) Image-guidance in endoscopic pituitary surgery: an in-silico study of errors involved in tracker-based techniques. Frontiers in Surgery.
Valetopoulou A (2024) Can artificial intelligence improve medicine's uncomfortable relationship with Maths? npj Digital Medicine.
Xu M (2024) Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery. IEEE Transactions on Medical Imaging.
Wijekoon A (2024) PitRSDNet: Predicting intra-operative remaining surgery duration in endoscopic pituitary surgery. Healthcare Technology Letters.
| Title | Endoscopic Pituitary Surgery on a High-fidelity Bench-top Phantom |
| Description | The first public dataset containing both instrument and surgical skill assessment annotations in a high-fidelity bench-top phantom (www.store.upsurgeon.com/products/tnsbox/) of the nasal phase of the endoscopic TransSphenoidal Approach (eTSA). The dataset includes 15 videos ({video_number}.mp4), the corresponding mOSATS with level of surgical expertise (mOSATS.csv), and instrument segmentation annotations (annotations.csv). The companion paper with baseline results is titled: "Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom" (Adrito Das et al., in press). Please cite this paper if you have used this dataset. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://rdr.ucl.ac.uk/articles/dataset/Endoscopic_Pituitary_Surgery_on_a_High-fidelity_Bench-top_Pha... |
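As an illustrative aid, the sketch below shows how a local download of this dataset might be loaded. Only the file names ({video_number}.mp4, mOSATS.csv, annotations.csv) come from the description above; the folder name and the printed column inspection are assumptions, since the CSV schemas are not given here.

```python
# Hypothetical loading sketch for the bench-top phantom dataset.
from pathlib import Path

import cv2            # pip install opencv-python
import pandas as pd

DATA_DIR = Path("phantom_dataset")   # assumed local download folder

# Skill scores and segmentation annotations ship as plain CSV files.
mosats = pd.read_csv(DATA_DIR / "mOSATS.csv")
annotations = pd.read_csv(DATA_DIR / "annotations.csv")
print(mosats.columns.tolist(), annotations.columns.tolist())

# Iterate over the 15 videos and report their lengths with OpenCV.
for video_path in sorted(DATA_DIR.glob("*.mp4")):
    cap = cv2.VideoCapture(str(video_path))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print(f"{video_path.name}: {n_frames} frames")
    cap.release()
```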
| Title | PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery |
| Description | Visual Question Answering (VQA) within the surgical domain, utilising Large Language Models (LLMs), offers a distinct opportunity to improve intra-operative decision-making and facilitate intuitive surgeon-AI interaction. However, the development of LLMs for surgical VQA is hindered by the scarcity of diverse and extensive datasets with complex reasoning tasks. Moreover, contextual fusion of the image and text modalities remains an open research challenge due to the inherent differences between these two types of information and the complexity involved in aligning them. This paper introduces PitVQA, a novel dataset specifically designed for VQA in endonasal pituitary surgery, and PitVQA-Net, an adaptation of GPT-2 with a novel image-grounded text embedding for surgical VQA. PitVQA comprises 25 procedural videos and a rich collection of question-answer pairs spanning crucial surgical aspects such as phase and step recognition, context understanding, tool detection and localisation, and tool-tissue interactions. PitVQA-Net consists of a novel image-grounded text embedding that projects image and text features into a shared embedding space, and a GPT-2 backbone with an excitation-block classification head to generate contextually relevant answers within the complex domain of endonasal pituitary surgery. Our image-grounded text embedding leverages joint embedding, cross-attention and contextual representation to understand the contextual relationship between questions and surgical images. We demonstrate the effectiveness of PitVQA-Net on both the PitVQA and the publicly available EndoVis18-VQA datasets, achieving improvements in balanced accuracy of 8% and 9% over the most recent baselines, respectively. The PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital for Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) at 720p resolution and stored as MP4 files. All videos were annotated for surgical phases, steps, instruments present and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon. We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, obtaining a total of 109,173 frames, with the shortest and longest videos yielding 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation: 884,242 question-answer pairs in total, around 8 per frame. There are 59 classes overall, including 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes. Question length ranges from 7 to 12 words. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://rdr.ucl.ac.uk/articles/dataset/PitVQA_A_Dataset_of_Visual_Question_Answering_in_Pituitary_Su... |
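To make the preprocessing step concrete, here is a minimal sketch of the 1 fps frame extraction the description mentions. The 1 fps rate comes from the dataset description; the Laplacian-variance blur filter is a common heuristic used here only as an assumed stand-in for the authors' blur/occlusion screening.

```python
# Sketch of 1 fps frame extraction with a simple blur filter.
import cv2

def extract_frames(video_path: str, blur_threshold: float = 100.0):
    """Yield one frame per second of video, skipping visibly blurred frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = int(round(fps))                      # one frame per second
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            # Variance of the Laplacian: low values indicate a blurred frame.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
            if sharpness >= blur_threshold:
                yield index, frame
        index += 1
    cap.release()
```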
| Title | PitVis-2023 Challenge: Endoscopic Pituitary Surgery videos |
| Description | The first public dataset containing both step and instrument annotations of the endoscopic TransSphenoidal Approach (eTSA). The dataset includes 25 videos (video_{video_number}.mp4) and the corresponding step and instrument annotations (annotations_{video_number}.csv). Annotation metadata mapping each numerical value to its formal description is provided (map_steps.csv and map_instrument.csv), as well as video metadata (video_encoder_details.txt). Helpful scripts and baseline models can be found at: https://github.com/dreets/pitvis. This dataset is released as part of the PitVis Challenge, a sub-challenge of the EndoVis Challenge hosted at the annual MICCAI conference (Vancouver, Canada, October 2023). More details about the challenge can be found on the challenge website: https://www.synapse.org/Synapse:syn51232283/wiki/621581. The companion paper with comparative models is titled: "PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery" (Adrito Das et al.). Please cite this paper if you have used this dataset: https://arxiv.org/abs/2409.01184. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://rdr.ucl.ac.uk/articles/dataset/PitVis_Challenge_Endoscopic_Pituitary_Surgery_videos/26531686 |
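The sketch below illustrates how the numeric step and instrument codes might be joined with the metadata maps named in the description. The file names come from the dataset description; the column names ("step", "instrument", "description") are assumptions made for illustration, as the CSV schemas are not given here.

```python
# Hypothetical join of per-video annotations with the metadata maps.
import pandas as pd

steps_map = pd.read_csv("map_steps.csv")            # numeric value -> text
instruments_map = pd.read_csv("map_instrument.csv")

annotations = pd.read_csv("annotations_01.csv")     # e.g. video 01

# Replace numeric codes with their formal descriptions for readability.
annotations = (
    annotations
    .merge(steps_map, on="step", how="left")
    .merge(instruments_map, on="instrument", how="left",
           suffixes=("_step", "_instrument"))
)
print(annotations.head())
```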
| Description | Science of Surgery Open Day 2022, 2023 |
| Form Of Engagement Activity | Participation in an open day or visit at my research institution |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Public/other audiences |
| Results and Impact | Various teams of researchers presented display stands illustrating novel, interesting, or fun ideas based around surgery and science. The aim was to spark discussions with the general public and, for school children, to inspire them to consider science as an interesting subject. |
| Year(s) Of Engagement Activity | 2022,2023,2024 |
| URL | https://www.ucl.ac.uk/interventional-surgical-sciences/science-surgery |
