Context Aware Augmented Reality for Endonasal Endoscopic Surgery

Lead Research Organisation: UNIVERSITY COLLEGE LONDON
Department Name: Medical Physics and Biomedical Eng

Abstract

This project aims to develop tools to guide a surgeon during surgery to remove cancers of the pituitary gland.

Access to the pituitary gland is difficult; one current approach is the endonasal approach, through the nose. While this approach is minimally invasive, which is better for the patient, it is technically challenging for the surgeon: it is difficult not only to manoeuvre the tools, but also to maintain contextual awareness and to remember the location of, and identify, critical structures.

One proposed solution is to combine pre-operative scan data, such as information from Magnetic Resonance Imaging (MRI) or Computed Tomography (CT) scans, with the endoscopic video. Typically, engineers have proposed "Augmented Reality" (AR), where the information from MRI/CT scans is simply overlaid on top of the endoscopic video. This approach has not found favour with clinical teams, however: the result is often confusing and difficult to use.
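
To make the objection concrete, here is a minimal sketch of the naive overlay approach, assuming the pre-operative anatomy has already been rendered from the current camera pose into an image aligned with the video frame; the file names are hypothetical:

```python
# Minimal sketch of a naive AR overlay: alpha-blend a rendering of
# pre-operative (MRI/CT-derived) anatomy onto an endoscopic video frame.
# Assumes the anatomy is already rendered from the current camera pose;
# file names are illustrative, not from the project.
import cv2

frame = cv2.imread("endoscope_frame.png")    # live endoscopic image
overlay = cv2.imread("preop_render.png")     # rendered MRI/CT anatomy
overlay = cv2.resize(overlay, (frame.shape[1], frame.shape[0]))

alpha = 0.35  # overlay opacity; higher values obscure more of the video
blended = cv2.addWeighted(overlay, alpha, frame, 1.0 - alpha, 0.0)

cv2.imwrite("augmented_frame.png", blended)
```

Even in this toy form, the problem is apparent: a static, always-on overlay obscures the live view regardless of what the surgeon actually needs at that moment.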

In this project we have assembled a team of surgeons and engineers to re-think the Augmented Reality paradigm from the ground up. First, the aim is to identify the most relevant information to display on-screen at each stage of the operation. Then machine learning will be used to analyse the endoscopic video and automatically identify which stage of the procedure the surgeon is working on. The guidance system will then automatically switch modes, providing the most useful information for each stage of the procedure. Finally, we will automate the alignment of pre-operative data to the endoscopic video, again using machine learning techniques.
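
As a rough illustration of the workflow-recognition step, the sketch below classifies individual video frames into surgical phases using an off-the-shelf image backbone. The phase labels, model choice, and training details are illustrative assumptions, not the project's actual method:

```python
# Minimal sketch of surgical phase recognition from endoscopic video frames.
# A per-frame classifier is the simplest baseline; real systems typically add
# temporal modelling. Phase names and the backbone are assumptions.
import torch
import torch.nn as nn
from torchvision import models

PHASES = ["nasal", "sphenoid", "sellar", "closure"]  # hypothetical labels

class PhaseClassifier(nn.Module):
    def __init__(self, num_phases: int = len(PHASES)):
        super().__init__()
        self.backbone = models.resnet18(weights=None)  # pretrained in practice
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_phases)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) -> per-frame phase logits
        return self.backbone(frames)

model = PhaseClassifier().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # dummy frame
    phase = PHASES[int(logits.argmax(dim=1))]
print(f"Predicted phase: {phase}")  # a guidance system would switch modes here
```

In practice such per-frame predictions would be smoothed over time, and the recognised phase would drive which pre-operative information is displayed.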

The end result should be more accurate and more clinically relevant than current state-of-the-art methods, representing a genuine step change in performance for image guidance during skull-base procedures.

Publications

Xu M (2024) Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery in IEEE Transactions on Medical Imaging

 
Title Endoscopic Pituitary Surgery on a High-fidelity Bench-top Phantom 
Description The first public dataset containing both instrument and surgical skill assessment annotations in a high-fidelity bench-top phantom (www.store.upsurgeon.com/products/tnsbox/) of the nasal phase of the endoscopic TransSphenoidal Approach (eTSA). The dataset includes 15 videos ({video_number}.mp4), the corresponding mOSATS with level of surgical expertise (mOSATS.csv), and instrument segmentation annotations (annotations.csv). The companion paper with baseline results is titled: "Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom" (Adrito Das et al., in press). Please cite this paper if you have used this dataset. (A minimal loading sketch follows this record.) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://rdr.ucl.ac.uk/articles/dataset/Endoscopic_Pituitary_Surgery_on_a_High-fidelity_Bench-top_Pha...
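
The record above describes a small set of files; a minimal loading sketch is shown below, assuming only the file names given in the description (column contents are not verified against the actual data):

```python
# Minimal sketch of loading this dataset's files; file names follow the
# description above, and the specific video name is an assumed instance
# of the {video_number}.mp4 pattern.
import pandas as pd
import cv2

scores = pd.read_csv("mOSATS.csv")         # mOSATS scores with expertise level
segments = pd.read_csv("annotations.csv")  # instrument segmentation annotations

cap = cv2.VideoCapture("01.mp4")           # assumed instance of {video_number}.mp4
ok, first_frame = cap.read()               # grab the first phantom-video frame
cap.release()

print(scores.head())
print(segments.head())
```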
 
Title PitVQA: A Dataset of Visual Question Answering in Pituitary Surgery 
Description Visual Question Answering (VQA) within the surgical domain, utilising Large Language Models (LLMs), offers a distinct opportunity to improve intra-operative decision-making and facilitate intuitive surgeon-AI interaction. However, the development of LLMs for surgical VQA is hindered by the scarcity of diverse and extensive datasets with complex reasoning tasks. Moreover, contextual fusion of the image and text modalities remains an open research challenge due to the inherent differences between these two types of information and the complexity involved in aligning them. This paper introduces PitVQA, a novel dataset specifically designed for VQA in endonasal pituitary surgery, and PitVQA-Net, an adaptation of GPT-2 with a novel image-grounded text embedding for surgical VQA. PitVQA comprises 25 procedural videos and a rich collection of question-answer pairs spanning crucial surgical aspects such as phase and step recognition, context understanding, tool detection and localisation, and tool-tissue interactions. PitVQA-Net consists of a novel image-grounded text embedding that projects image and text features into a shared embedding space, and a GPT-2 backbone with an excitation-block classification head to generate contextually relevant answers within the complex domain of endonasal pituitary surgery. Our image-grounded text embedding leverages joint embedding, cross-attention and contextual representation to understand the contextual relationship between questions and surgical images. We demonstrate the effectiveness of PitVQA-Net on both the PitVQA and the publicly available EndoVis18-VQA datasets, achieving improvements in balanced accuracy of 8% and 9% over the most recent baselines, respectively.

The PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from the National Hospital for Neurology and Neurosurgery in London, United Kingdom, similar to the dataset used in the MICCAI PitVis challenge. All patients provided informed consent, and the study was registered with the local governance committee. The surgeries were recorded using a high-definition endoscope (Karl Storz Endoscopy) at 720p resolution and stored as MP4 files. All videos were annotated for surgical phases, steps, instruments present and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon.

We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, obtaining a total of 109,173 frames; the shortest and longest videos yielded 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation: 884,242 question-answer pairs across the 109,173 frames, around 8 pairs per frame. There are 59 classes overall, including 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes. Question length ranges from 7 to 12 words. (A sketch of the 1 fps frame-extraction step follows this record.) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://rdr.ucl.ac.uk/articles/dataset/PitVQA_A_Dataset_of_Visual_Question_Answering_in_Pituitary_Su...
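
As a small illustration of the 1 fps frame-extraction step described in the record above, here is a minimal sketch; the input file name and output layout are assumptions for illustration:

```python
# Minimal sketch of extracting roughly one frame per second from a surgical
# video, as in the PitVQA preprocessing described above. The input file name
# is hypothetical; blurred/occluded frames would be filtered in a later pass.
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("surgery_video.mp4")   # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if metadata missing
step = max(1, int(round(fps)))                # keep ~one frame per second

index = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % step == 0:
        cv2.imwrite(f"frames/{saved:06d}.png", frame)
        saved += 1
    index += 1
cap.release()
print(f"Extracted {saved} frames at ~1 fps")
```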
 
Title PitVis-2023 Challenge: Endoscopic Pituitary Surgery videos 
Description The first public dataset containing both step and instrument annotations of the endoscopic TransSphenoidal Approach (eTSA). The dataset includes 25 videos (video_{video_number}.mp4) and the corresponding step and instrument annotations (annotations_{video_number}.csv). Annotation metadata mapping each numerical value to its formal description is provided (map_steps.csv and map_instrument.csv), as well as video metadata (video_encoder_details.txt). Helpful scripts and baseline models can be found at: https://github.com/dreets/pitvis. This dataset is released as part of the PitVis-2023 Challenge, a sub-challenge of the EndoVis Challenge hosted at the annual MICCAI conference (Vancouver, Canada, October 2023). More details about the challenge can be found on the challenge website: https://www.synapse.org/Synapse:syn51232283/wiki/621581. The companion paper with comparative models is titled: "PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery" (Adrito Das et al.). Please cite this paper if you have used this dataset: https://arxiv.org/abs/2409.01184. (A minimal annotation-loading sketch follows this record.) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://rdr.ucl.ac.uk/articles/dataset/PitVis_Challenge_Endoscopic_Pituitary_Surgery_videos/26531686
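
A minimal sketch of reading one video's annotations and joining the metadata maps described above; the join column names are assumptions and should be checked against the actual CSV headers:

```python
# Minimal sketch of loading PitVis-style annotations for one video and mapping
# numeric codes to their formal descriptions. "annotations_01.csv" is an
# assumed instance of annotations_{video_number}.csv.
import pandas as pd

ann = pd.read_csv("annotations_01.csv")
steps = pd.read_csv("map_steps.csv")             # step code -> description
instruments = pd.read_csv("map_instrument.csv")  # instrument code -> description

# Join keys ("step", "instrument") are assumptions; check the CSV headers.
ann = ann.merge(steps, how="left", on="step")
ann = ann.merge(instruments, how="left", on="instrument")
print(ann.head())
```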
 
Description Science of Surgery Open Days, 2022-2024 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Various teams of researchers presented display stands illustrating novel, interesting, or fun ideas based around surgery and science. The aim was to spark discussions with the general public and, for school children, to inspire an interest in science.
Year(s) Of Engagement Activity 2022, 2023, 2024
URL https://www.ucl.ac.uk/interventional-surgical-sciences/science-surgery