3 posts tagged with "benchmark for endovascular intervention understanding"

View All Tags

CathAction - A Benchmark for Endovascular Intervention Understanding (Part 3)

1. Tasks and Benchmarks

In this section, we benchmark five tasks, including anticipation, recognition, segmentation, collision detection, and domain adaptation, to demonstrate the usefulness of the CathAction dataset. We then discuss the challenges and opportunities for improvement in each task.

A. Catheterization Anticipation

The anticipation task aims to predict the next catheterization action based on a sequence of frames. We adapt the conventional anticipation task framework in computer vision, introducing two timing parameters: anticipation time (τa\tau_a) and observation time (τo\tau_o). The anticipation time denotes the required duration to recognize an action, while the observation time indicates the length of the video footage to analyze before making a prediction. The objective is to predict the action class ( c_a ) for the frames within the anticipation time τa\tau_a, given the frames during the observation time τo\tau_o.

Network and Training. We leverage state-of-the-art action anticipation methods as baselines: CNN&RNN, RU-LSTM, TempAggRe-Fusion, AFFT, and Trans-SVNet. The future action predictions are supervised using cross-entropy loss with labeled future actions. Following prior works, we set τa=1s\tau_a = 1s and τo=1s\tau_o = 1s. Training was performed on a single Nvidia A100 GPU with a batch size of 64 for 80 epochs, starting with a learning rate of 0.001, reduced by a factor of 10 after epochs 30 to 60. We split approximately 80% of the dataset for training and 20% for testing. Performance metrics include top-1 accuracy, precision, and recall.

BaselineVenuesAccuracyPrecisionRecall
CNNCVPR 201828.9830.1429.76
RNNCVPR 201829.6430.3830.44
RU-LSTMCVPR 201935.0834.2934.77
TempAggReECCV 202034.6435.5634.71
Trans-SVNetIJCARS 202229.0619.6720.28
AFFTWACV 202337.9136.8737.63

Table 1: Catheterization anticipation results on the CathAction dataset. All values are reported in percentages (\%).

Figure 1

Figure 1: Qualitative catheterization prediction results. The predicted and ground truth of the next action are displayed on the right of each sample. The green color shows the correct prediction, and the red color shows the incorrect prediction.

Results. Table 1 shows the catheterization anticipation results of different baselines. This table shows that transformer-based methods show superior performance advantages over CNN or LSTM-based models. Qualitative results are illustrated in Fig 1. We can see that transformer-based models can make more accurate predictions in challenging scenarios, especially when the catheter is moving quickly or when the occlusion is partially obscured.

Discussion. Despite the advancements, existing methods for catheterization anticipation still struggle to achieve high accuracy, revealing areas for future research. The rapid motion of the catheter and guidewire poses significant challenges for this task, and real-time performance is crucial as surgeons require immediate feedback during procedures.

B. Catheterization Recognition

Following the traditional action recognition task in computer vision, in catheterization recognition, given an input video segment, our goal is to predict the action class for that video segment.

Network and Training. We explore state-of-the-art methods in action recognition to benchmark the catheterization recognition task, including TDN, Video Swin Transformer, and BERT Pretraining of Video Transformers (BEVT). Each model is trained using two Nvidia A100 GPUs for 80 epochs, with a mini-batch size of 512. The initial learning rates are set to 0.01 for the spatial stream and 0.001 for the temporal stream, reduced by a factor of 10 at the 20 and 40 epochs. All other parameters are re-used from the baseline methods.

Results. Table 2 show the catheterization recognition results of three baseline methods: TDN, Video Swin Transformer, and BEVT, on the CathAction dataset are summarized in the table below. TDN with ResNet101 achieves the best top-1 accuracy of 62.5% on five classes. Action recognition in endovascular intervention remains challenging due to the similarity in the appearance of catheters and guidewires across different environments, while actions depend on the visual characteristics of the catheters and guidewires.

BaselineVenuesAccuracyPrecisionRecall
TDN-ResNet50CVPR 202158.3459.1257.22
TDN-ResNet101CVPR 202162.5061.8962.77
Video Swin TransformerCVPR 202251.6752.1451.24
BEVTCVPR 202249.2850.2749.92

Table 2: Catheterization recognition results on the CathAction dataset. All values are reported in percentages (\%).

Discussion. Compared to the anticipation task (Table 1), catheterization recognition methods (Table 2) show higher accuracy. However, the overall performance is not yet significant enough for real-world applications. Further research can utilize advanced techniques such as multi-modality learning, combining pre-operative data or synthetic data with transfer learning to improve the results. Additionally, exploring the capabilities of large-scale medical foundation models is an interesting research direction.

C. Catheter and Guidewire Segmentation

Catheter and guidewire segmentation is a well-known task in endovascular interventions. In this task, we aim to segment the catheter and guidewire from the background. Unlike catheterization recognition or anticipation, which take a video as input, this segmentation task only uses the X-ray image as input.

BaselineDice ScoreJaccard IndexmIoUAccuracy
UNet51.6957.5131.1763.26
TransUNet56.5255.9334.1355.61
SwinUNet61.2659.5439.5376.60
SSL56.9556.8740.8772.24
SegViT63.4754.1242.4868.73

Table 3: Segmentation results on the CathAction dataset.

Network and Training. We benchmark U-Net, Trans-UNet, SwinUNet, and SegViT. We follow the default training and testing configurations provided in the published papers. We use the Dice Score, Jaccard Index, mIoU, and Accuracy as the evaluation metrics in the segmentation task.

Results. Table 3 shows the catheter and guidewire segmentation results. This table shows that the transformer-based networks such as TransUNet or SegViT achieve higher accuracy than the traditional UNet. The SegViT that utilizes the vision transformer backbone shows the best performance, however, the increase compared with other methods is not a large margin.

Discussion. In contrast to traditional segmentation tasks in computer vision, which typically involve objects occupying substantial portions of an image, the segmentation of catheters and guidewires presents a considerably greater challenge. These elongated instruments have extremely slender bodies, making their spatial presence in the image less pronounced. Additionally, the unique characteristics of X-ray images can lead to misidentification of catheters or guidewires as blood vessels. Addressing these challenges in future research is imperative to enhance the accuracy of segmentation outcomes.

D. Collision Detection

Detecting the collision of the tip of the catheter or guidewire to the blood vessel wall is an important task in endovascular intervention. We define the collision detection task as an object detection problem. In particular, the tip of the catheter or guidewire of all frames in our dataset is annotated with a bounding box. Each bounding box shares the class of either collision when the tip collides with the blood vessel, or normal when there is no collision with the blood vessel.

Network and Training. We use YOWO, YOWO-Plus, STEP, and HIT. Since the bounding boxes in our ground truth have relatively small sizes, we also explore tiny object detection methods such as Yolov and EFF. The training process starts with a learning rate of 0.0003, which is then decreased by a factor of 10 after 20 epochs, concluding at 80 epochs. We train all methods with a mini-batch size of 4 on an Nvidia A100 GPU. The average precision (AP) and mean average precision (mAP) are used to evaluate the detection results.

BaselineCollisionNormalMeanCollisionNormalMean
APmAP
STEP7.7911.2110.986.9211.299.08
YOWO8.3212.1811.737.4612.289.92
YOWO-Plus8.9212.2311.777.8612.4810.28
HIT9.3712.7412.148.1812.7210.81
Yolov*12.3021.0815.8911.8820.0414.11
EFF*13.7022.1016.9112.1420.7814.88

Table 4: Collision detection results on the CathAction dataset. The symbol (*) denotes tiny object detectors.

Figre 2

Figure 2: Qualitative results for the collision detection task. The first two columns visualize the collision results, the third column visualizes no collision cases, and the last column visualizes a failure case where the tip was not detected.

Results. Table 4 shows the collision detection results. This table indicates that tiny object detectors such as Yolov and EFF achieve higher accuracy compared to other normal object detectors. Furthermore, we observe that the performance of all methods remains relatively low. This highlights the challenges that lie ahead for collision detection in endovascular intervention. Figure 2 shows detection examples where EFF has difficulty detecting collisions between the catheter and the blood vessel.

Discussion. Compared to traditional object detection results on vision datasets, the collision detection results on our dataset are significantly lower, with the top mean AP being only 16.91. The challenges of this task come from two factors. First, the tip of the catheter or guidewire is relatively small in X-ray images. Second, the imbalance between the collision and normal class makes the problem more difficult. Therefore, there is a need to develop special methods to address these difficulties. Future works may rely on attention mechanisms, transformers, or foundation models to develop more sufficient endovascular collision detectors.

E. Domain Adaptation

Our dataset is sourced from two distinct environments: vascular phantom data and animal data. To assess the capacity for learning from phantom data and applying it to real data, we benchmark endovascular interventions under domain adaptation setups. For each task, we train the model on the phantom data and then test it on real animal data. In practice, animal data is similar to human data we capture from humans, and it is much more challenging to perform tasks on real animal or human data.

BaselineVenuesAccuracyPrecisionRecall
RU-LSTMCVPR 201922.9323.9122.57
TempAggReECCV 202017.1618.4118.23
Trans-SVNetIJCARS 202219.0617.6719.58
AFFTWACV 202325.6726.2926.33

Table 5: Catheterization anticipation results under domain adaptation setup. All methods are trained on phantom data and tested on animal data.

Anticipation Adaptation. We use the same methods RU-LSTM, TempAggRe, Trans-SVNet, and AFFT for anticipation adaptation experiments. Table 5 shows the results. Compared with the setup in Table 1, we can see that there is a significant accuracy drop. This highlights the challenges of applying baseline methods in practical real-world scenarios, particularly when dealing with unforeseen situations in catheterization procedures.

BaselineVenuesAccuracyPrecisionRecall
TDN-ResNet50CVPR 202124.1923.1724.56
TDN-ResNet101CVPR 202125.6224.5225.68
Video Swin TransformerCVPR 202228.7927.9828.12
BEVTCVPR 202231.2230.4831.79

Table 6: Catheterization recognition results under domain adaptation setup.

Recognition Adaptation. We repeat the catheterization recognition task under the domain adaptation setup. Table 6 shows the results when all baselines are trained on phantom data and tested on animal data. This table also demonstrates that training under domain adaption setup is very challenging, as compared to Table 2 under normal setting, the accuracy drops approximately 30%30\%.

APmAP
BaselineCollisionNormalMeanCollisionNormalMean
STEP1.532.121.871.091.981.62
YOWO2.124.113.091.973.682.92
YOWO-Plus1.181.431.211.071.261.09
HIT1.311.191.241.061.181.11
Yolov7.318.928.096.287.497.21
EFF8.279.168.197.618.297.88

Table 7: Collision detection results under domain adaptation setup. All methods are trained on phantom data and tested on animal data. The symbol (*) denotes tiny object detectors.

Collision Detection Adaptation. Table 7 shows the results collision detection results under domain adaptation. We can see that under domain adaptation setup, most object detection methods achieve very low accuracy. Therefore, there is an immediate need to improve or design new methods that can detect the collision in real-time for endovascular catheterization procedures.

BaselineDice ScoreJaccard IndexmIoUAccuracy
UNet26.5831.3812.1346.07
TransUNet16.1624.1917.2333.61
SwinUNet17.4138.147.5240.79
SSL26.9132.0418.7242.44
SegViT30.7432.2211.4650.00

Table 8: Domain adaptation segmentation results.

Segmentation Adaptation. Table 8 shows the catheter and guidewire segmentation results when the networks are trained on phantom data and tested on animal data. Similar to other tasks under the domain adaptation setting, we observe a significant accuracy drop in all methods. Overall, SegViT still outperforms other segmentation methods. This shows that the vision transformer backbone may be potentially a good solution for this task.

2. Discussion

We introduce CathAction, a large-scale dataset for endovascular intervention tasks, encompassing annotated ground truth for segmentation, action understanding, and collision detection. While CathAction marks a significant advancement in endovascular interventions, it is important to acknowledge certain limitations. First, despite its comprehensiveness, the dataset may not encompass every possible clinical scenario and could potentially lack representation for rare or outlier cases. Second, our work currently benchmarks vision-based methods, which exhibit insufficient accuracy, and persisting challenges in generalizability and adaptability to real-world scenarios are evident. This is highlighted by the results presented in Section 1 for all catheterization anticipation, recognition, segmentation, and collision detection tasks. Thirdly, we mostly utilize metrics from the vision community to evaluate the results. These metrics may not fully reflect the clinical needs, and the continuous refinement of evaluation metrics and exploration of potential interdependencies among tasks would demand further research.

From our intensive experiments, we see several research directions that benefit from our large-scale datasets: 1. There is an immediate need to develop more advanced methods for catheterization anticipation, recognition, collision detection, and action understanding, especially under domain adaptation setup. Future work can explore the potential of graph neural networks, temporal information, and multimodal or transfer learning to improve the accuracy and reliability of the methods. 2. Currently, we address endovascular intervention tasks independently; future work can combine those tasks and tackle them simultaneously (e.g., the anticipation and collision detection tasks can be jointly trained). This would make the research outputs more useful in clinical practice. 3. Given the fact that CathAction is a large-scale dataset, it can be used to train a foundation model for endovascular interventions or related medical tasks.

3. Conclusion

We introduce CathAction as a large-scale dataset for endovascular intervention research, offering the largest and most comprehensive benchmark to date. With intensive annotated data, CathAction addresses crucial limitations in existing datasets and helps connect computer vision with healthcare tasks. By providing a standardized dataset with public code and public metrics, CathAction promotes transparency, reproducibility, and the collective exploration of different tasks in the field. Our code and dataset are publicly available to encourage further study.

CathAction - A Benchmark for Endovascular Intervention Understanding (Part 2)

1. The CathAction Dataset

This section introduces the CathAction dataset. Specifically, we describe the data collection process and annotation pipeline. We then present statistics regarding different aspects of our large-scale dataset.

Data Collection

Given that endovascular intervention constitutes a medical procedure, acquiring extensive human data is often impractical and time-consuming due to privacy constraints. To address this challenge, we suggest an alternative approach involving the collection of data from two distinct sources:

  1. Utilizing vascular soft silicone phantoms modeled after the human body.
  2. Employing animal subjects, specifically pigs. The selection of pigs is justified by their vascular anatomy, which is widely acknowledged as highly analogous to that of humans.

Ethics
Since our data collection involves experiments with radiation sources (X-ray radiology fluoroscopic systems) and live animals, all relevant ethical approvals were obtained in advance of the collection process. The human subjects who collect the data are well-trained and professional endovascular surgeons, wearing protective suits as part of daily practice in the hospital.

(a) Silicon phantom(b) Data collection setup
Figure-1aFigure-1b

Figure 1:: The human silicon phantom model (a), and data collection setup in the operating room (b).

Phantom Setup
To ensure that data is collected from various models, we use five adult human aortic arch phantoms made of soft silicone, manufactured by Elastrat Ltd., Switzerland. To enhance realism in the interaction between surgical tools and tissues, the phantoms are connected to a pulsatile pump to simulate the flow of normal human blood. All phantoms are placed beneath an X-ray imaging system to mimic a patient lying on an angiography table, preparing for an endovascular procedure.

Animal Setup
We use five live pigs as subjects for data collection. The animal setup is identical to that of a human procedure. During the endovascular intervention, professional surgeons use an iodine-based contrast agent to enhance visibility of specific structures or fluids within the body. Iodine contrast agents are radiopaque, meaning they absorb X-rays, resulting in improved visibility of blood vessels, organs, and other structures such as the catheter and guidewire during imaging.

Figure 2

Figure 2: Example data collected with phantom models (top row) and animals (bottom row). Animal data are more challenging with less visible catheters or guidewires.

Data Collection
Ten skilled professional surgeons are tasked with cannulating three arteries, namely the left subclavian (LSA), left common carotid (LCCA), and right common carotid (RCCA), using a commercial catheter and guidewire. Throughout each catheterization process, the surgeon operator activates the X-ray fluoroscopy using a pedal in the operating room. We developed a real-time image grabber to transmit the video feed of the surgical scene to a workstation. The experiments are conducted under two interventional radiology fluoroscopic systems: Innova 4100 IQ GE Healthcare and EMD Technologies Epsilon X-ray Generator. Fig 1 shows the data collection setup with human silicon phantoms and Fig 2 visualizes the collected data with phantom models and real animals. From 3, we can see that there is a huge domain gap between data collected using phantom models and live animals.

Data Annotation

Actions
Based on advice from expert endovascular surgeons, we define five classes to annotate catheterization actions. These classes fall into three groups: catheter (\texttt{advance catheter} and \texttt{retract catheter}), guidewire (\texttt{advance guidewire} and \texttt{retract guidewire}), and one action involving both the catheter and guidewire (\texttt{rotate}). Surgeons typically rotate both the catheter and guidewire simultaneously, so we use one rotation class. We utilize a free, open-source video editor to annotate the start and end times of each narrated action. All fluoroscopy videos are processed at a 500 x 500 resolution and 24 frames per second (FPS). To ensure annotation quality, all ground-truth actions are manually checked and modified by an experienced endovascular surgeon.

Collision Annotation
In practice, the collision between the catheter (or guidewire) and the blood vessel wall mainly occurs at the instrument's tip. Therefore, for each frame of the fluoroscopy video, we annotate the catheter (or guidewire) tip with a bounding box. There are two classes for the bounding boxes: \texttt{collision} (when the instrument collides with the blood vessel) and \texttt{normal} (when there is no collision). We used an open-source labeling tool to annotate bounding boxes in each video, with all videos encoded at 24 FPS to ensure dataset coherence.

Segmentation
The combination of guidewire and catheter is common in endovascular interventions, where precise navigation through blood vessels is essential for procedure success. Unlike most previous datasets that consider both catheter and guidewire as one class, we manually label catheter and guidewire classes separately in our dataset. Our segmentation ground truth thus provides a more detailed understanding of endovascular interventions.

Dataset Statistics

Overview
As summarized in Table 1 in the previous part 1, CathAction is a large-scale benchmark for endovascular interventions. Our dataset consists of approximately 500,000 annotated frames for action understanding and collision detection, and around 25,000 ground-truth masks for catheter and guidewire segmentation. There are a total of 569 videos in our dataset. Some collected video samples are illustrated in Fig 2. We believe CathAction is currently the largest, most challenging, and most comprehensive dataset of endovascular interventions.

Statistics
The CathAction dataset is annotated with a primary focus on catheters and guidewires. Fig. 3 provides an overview of the distribution of action classes in both animal and phantom data, while Fig. 4 portrays the distribution of action segment lengths, illustrating the substantial variability in segment duration. Additionally, Fig. 5 visually compares the number of bounding boxes between phantom and animal data, revealing a significant disparity between counts of normal and collision boxes, as expected due to the infrequency of collisions in real-world scenarios.

Figure 3

Figure 3: Distribution of the number of action classes in the CathAction dataset. Left-side: Distribution on real animal data. Right-side: Distribution on phantom data.

Figure 4

Figure 4: Duration distribution of segments' actions in the CathAction dataset, on real animal data and phantom data.

Figure 5

Figure 5: Comparison of the number of bounding box objects in real animal data and phantom data.

Adaptation Property
Since data is collected from two sources—phantoms and real animals—a domain gap exists between the two data types. Fig. 2 and Fig. 5 also demonstrate the adaptation property shared between phantom and animal data. This distinctive domain gap renders CathAction a formidable benchmark for evaluating domain adaptation, a critical problem in medical domains where collecting real human data is often infeasible. Using CathAction, we can develop domain adaptation techniques, learning from synthetic or phantom data and effectively applying that knowledge to genuine animal or human data, bridging the gap between controlled simulation and real-world scenarios.

Next

In the next post, we will benchmark our new dataset CathAction on various tasks.

CathAction - A Benchmark for Endovascular Intervention Understanding (Part 1)

Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale dataset for catheterization understanding. Our CathAction dataset encompasses approximately 500,000 annotated frames for catheterization action understanding and collision detection, and 25,000 ground truth masks for catheter and guidewire segmentation. For each task, we benchmark recent related works in the field. We further discuss the challenges of endovascular intentions compared to traditional computer vision tasks and point out open research questions. We hope that CathAction will facilitate the development of endovascular intervention understanding methods that can be applied to real-world applications. Intro

1. Introduction

DatasetCollectionType#FramesSourceAnnotationPublicTask
Barbu et al.X-rayVideo535RealManualNoSegmentation
Wu et al.3D EchoVideo800RealManualNoSegmentation
Ambrosini et al.X-rayImage948RealManualNoSegmentation
Mastmeyer et al.3D MRIImage101RealManualNoSegmentation
Yi et al.X-rayImage2,540SynthesisAutomaticNoSegmentation
Nguyen et al.X-rayImage25,271PhantomSemi-AutoNoSegmentation
Danilov et al.3D UltrasoundVideo225SyntheticManualNoSegmentation
Delmas et al.X-rayImage2,357SimulatedAutomaticNoReconstruction
Brost et al.X-rayImage938ClinicalSemi-AutoNoTracking
Ma et al.X-ray, CTImage1,048ClinicalManualNoReconstruction
CathAction (ours)X-rayVideo500,000+Phantom & AnimalManualYesSegmentation, Action Understanding, Collision Detection

Table 1: Endovascular intervention datasets comparison.

Cardiovascular diseases are one of the leading causes of death worldwide. Endovascular intervention has become the gold standard treatment for these diseases, preferred for its advantages over traditional open surgery, including smaller incisions, reduced trauma, and lower risks of comorbidities for patients. Endovascular interventions involve maneuvering small and long medical instruments, such as catheters and guidewires, within the vasculature through small incisions to reach targeted areas for treatment delivery, such as artery stenting, tissue ablation, and drug delivery. However, such tasks require high technical skill, with the primary challenge being to avoid collisions with the vessel wall, which could result in severe consequences, including perforation, hemorrhage, and organ failure. In practice, surgeons rely on 2D X-ray fluoroscopy images to perform these tasks within the 3D human body, which adds a significant challenge in safely controlling the catheter and guidewire.

Recently, learning-based methods for computer-assisted intervention systems have emerged for diverse tasks. Numerous methodologies have been developed to address the challenges of endovascular interventions, including catheter and guidewire segmentation, vision-based force sensing, learning from demonstration, and skill training assistance. Additionally, various deep learning approaches have been proposed for specific tasks in endovascular interventions, such as instrument motion recognition in X-ray sequences, interventionalist hand motion recognition, and collision detection. However, due to challenges in acquiring medical data, most of these methods rely on synthetic data or small, private datasets. Consequently, despite the critical nature of interventions, current methods have not fully capitalized on recent advancements in deep learning, which typically require large-scale training data.

Over the years, several datasets for endovascular intervention have been introduced. Table 1 shows a detailed comparison between current endovascular intervention datasets. However, these datasets share common limitations. First, they are relatively small in terms of the number of images, as collecting real-world medical data is costly. Second, due to privacy challenges in the medical domain, most existing datasets are kept private. Finally, these datasets are often created for a single task, such as segmentation, and do not support other important tasks in endovascular interventions, such as collision detection or action understanding.

Intro

To address these issues, we present CathAction, a large-scale dataset encompassing several endovascular intervention tasks, including segmentation, collision detection, and action understanding. To our knowledge, CathAction represents the largest and most realistic dataset specifically tailored for catheter and guidewire tasks.

In summary, we make the following contributions:

  • We introduce CathAction, a large-scale dataset for endovascular interventions, providing manually labeled ground truth for segmentation, action understanding, and collision detection.
  • We benchmark key tasks in endovascular interventions, including catheterization anticipation, recognition, segmentation, and collision detection.
  • We discuss the challenges and open questions in endovascular intervention. Our code and dataset are publicly available.

2. Related Work

Endovascular Intervention Dataset
Several endovascular intervention datasets have been introduced. Barbu et al. proposed a dataset that effectively localizes the entire guidewire and validated it using a traditional threshold-based method. Other datasets consider fluoroscopy videos at the image level, with mask annotations for each frame from the fluoroscopy videos. For instance, Ambrosini et al. developed a dataset with 948 annotated mask segmentation instances considering both catheter and guidewire as one class. Similarly, Mastmeyer et al. collected and annotated a dataset with 101 segmentation masks for the real catheter from 3D MRI data. More recently, Nguyen et al. proposed a dataset that considers both catheter and guidewire as one class. Overall, most of these datasets have limitations in terms of size, task categories, and focus. To overcome these limitations, we introduce CathAction, a large-scale dataset with various tasks, including catheter and guidewire segmentation, collision detection, and catheter action recognition and anticipation. The CathAction dataset enables the development of more accurate and reliable deep learning methods for endovascular interventions.

Catheterization Action Understanding
Deep learning techniques have demonstrated notable achievements in endovascular intervention action understanding. Jochem et al. presented one of the first works utilizing deep learning for catheter and guidewire activity recognition in fluoroscopy sequences. Subsequently, deep learning-based approaches have gained prominence as the most widely utilized solution for interventionalist hand motion recognition. For instance, Akinyemi et al. introduced a deep learning model based on convolutional neural networks (CNNs) that incorporates convolutional layers for automatic feature extraction and identifies operators' actions. Additionally, Wang et al. proposed a multimodal fusion architecture for recognizing eight common operating behaviors of interventionists. Despite extensive research on deep learning methods for endovascular intervention, it comes with the limitation of medical data: most of these methods use synthetic data or small, private datasets. This leads to the fact that although intervention is a crucial procedure, it has not fully benefited from recent deep learning advancements, where large-scale training data are usually required.

Catheter and Guidewire Segmentation
Catheter and guidewire segmentation is crucial for real-time endovascular interventions. Many methods have been proposed to address the challenges of catheter and guidewire segmentation. The outcomes of catheter and guidewire segmentation can be applied in vision-based force sensing, learning from demonstration, or skill training assistance applications. Traditional methods for catheterization segmentation adopt thresholding-based techniques and do not generalize well on X-ray data. Deep learning methods can learn meaningful features from input data, but they are challenging to apply to catheter segmentation due to the lack of real X-ray data and the tediousness of manual ground truth labeling. Many current learning-based techniques for catheter segmentation and tracking are limited to training on small-scale datasets or synthetic data due to the challenges of large-scale data collection. Our dataset provides manual ground truth labels for both the catheter and guidewire, offering substantial development for catheter and guidewire segmentation.

Collision Detection
Collision detection is a crucial task in endovascular interventions to ensure patient safety. Several attempts have been made to incorporate deep learning models into collision detection, but these methods have focused on identifying risky actions in simulated datasets. While existing methods can be useful for identifying potential hazards, they cannot localize the position of collisions or provide visual feedback. Additionally, these methods have not been widely used in real-world settings due to the lack of annotated bounding boxes for collisions of guidewire tips with vessel walls. Our dataset addresses this limitation by providing annotated bounding boxes for collision events in both phantom and real-world data. This enables the development of deep learning models that can detect collisions in real-time and provide visual or haptic feedback to surgeons.

Next

In the next post, we will describe our new dataset CathAction.