2024
DOI: 10.1109/TMI.2024.3363190

ScribFormer: Transformer Makes CNN Work Better for Scribble-Based Medical Image Segmentation

Zihan Li, Yuan Zheng, Dandan Shan, et al.

Abstract: Most recent scribble-supervised segmentation methods adopt a CNN framework with an encoder-decoder architecture. Despite its many benefits, this framework generally captures only short-range feature dependencies, because convolutional layers have local receptive fields; this makes it difficult to learn global shape information from the limited supervision that scribble annotations provide. To address this issue, this paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation…

Cited by 10 publications (2 citation statements)
References 66 publications
“…The chosen methods are all considered classic and representative within the medical image segmentation field. They encompass convolutional baselines such as U-Net [14], U-Net++ [15], U-Net3+ [17] and ResU-Net [16]; recent transformer-based baselines such as TransU-Net [23] and MedT [24]; the CNN-Transformer hybrid model ScribFormer [29]; and the CNN-MLP hybrid model UNeXt [4].…”
Section: Comparison Methods (mentioning; confidence: 99%)
“…LET-Net [27] combines a U-shaped CNN with a Transformer in a capsule-embedding style so that each compensates for the other's deficiencies; Yuan et al. [28] proposed CTC-Net, which uses dual encoding paths of CNN and Transformer encoders to produce complementary features. ScribFormer [29] improves performance with a three-branch structure that unifies shallow and deep features: a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch. The strength of CNN-MLP hybrid models is that they can achieve good segmentation results with few parameters and fast inference.…”
Section: Hybrid Models (mentioning; confidence: 99%)
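
The citation statement above summarizes ScribFormer's three-branch design only at a high level. Below is a minimal, hypothetical PyTorch sketch of such a CNN-Transformer-ACAM arrangement; the layer sizes, the additive feature fusion, and the sigmoid 1x1-convolution ACAM head are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a three-branch CNN-Transformer hybrid in the spirit
# of ScribFormer [29]; dimensions and the ACAM formulation are assumptions.
import torch
import torch.nn as nn

class ThreeBranchHybrid(nn.Module):
    def __init__(self, in_ch=1, num_classes=4, dim=64, heads=4, depth=2):
        super().__init__()
        # CNN branch: local (shallow) features from small receptive fields
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer branch: global (deep) dependencies over pixel tokens
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        # ACAM branch: per-class activation maps used as spatial attention
        self.acam = nn.Conv2d(dim, num_classes, kernel_size=1)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x):
        f = self.cnn(x)                        # (B, C, H, W) local features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        g = self.transformer(tokens)           # global feature mixing
        g = g.transpose(1, 2).reshape(b, c, h, w)
        fused = f + g                          # unify shallow and deep features
        cam = torch.sigmoid(self.acam(fused))  # (B, K, H, W) class maps
        attn = cam.mean(dim=1, keepdim=True)   # collapse to one attention map
        return self.head(fused * attn), cam

# Quick shape check
model = ThreeBranchHybrid()
logits, cams = model(torch.randn(2, 1, 64, 64))
print(logits.shape, cams.shape)  # both torch.Size([2, 4, 64, 64])

This sketch only illustrates how the three branches could be wired so that the Transformer's global context complements the CNN's local receptive field, which is the limitation the abstract identifies; how the ACAM branch is supervised from scribbles is specified in the paper itself.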