2023
DOI: 10.26599/air.2023.9150015

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Abstract: Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, leading to two key issues when exchanging information between the encoder and decoder: (1) taking into account the differences in contribution between different-level features, and (2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence …

Cited by 99 publications (24 citation statements)
References 78 publications
“…Due to only a single scale of output feature map with low resolution, it was challenging to directly adapt it to polyp segmentation task. Based this consideration, several methods [ 33 – 37 ] incorporated the pyramid structure in CNN into the design of Transformers, presenting a hierarchical Transformer with different stages. Dong et al [ 33 ] utilized a pyramid vision Transformer (PVT) as backbone encoder and presented a polyp segmentation architecture called Polyp-PVT.…”
Section: Related Work (mentioning)
confidence: 99%
“…Based this consideration, several methods [ 33 – 37 ] incorporated the pyramid structure in CNN into the design of Transformers, presenting a hierarchical Transformer with different stages. Dong et al [ 33 ] utilized a pyramid vision Transformer (PVT) as backbone encoder and presented a polyp segmentation architecture called Polyp-PVT. Tang et al [ 34 ] proposed a Dual-Aggregation Transformer Network (DuAT) to segment polyp regions, which adapted the PVT as the encoder for capturing richer feature cues.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although the convolution operation of CNN can capture image details well, it has some limitations in accessing global and long‐range semantic information 39 . Many researchers 20,40–43 combined Transformer and CNN to utilize complementary advantages between each other to obtain effective segmentation performance. TransUNet 20 was the first work that applied Transformer layers to the U‐Net structure for medical image segmentation.…”
Section: Related Work (mentioning)
confidence: 99%
“…The segmentation performance, parameter number, and inference speed have the improvements to some extent. Poly‐PVT 41 designed a pyramid vision transformer (PVT) encoder to extract multi‐scale long‐range dependencies features from input images. Three simple modules based on CNN, that is, cascaded fusion module, camouflage identification module, and similarity aggregation module were introduced to achieve superior segmentation results.…”
Section: Related Work (mentioning)
confidence: 99%