Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Dong, Bo; Wang, Wenhai; Fan, Deng-Ping; Li, Jinpeng; Fu, Huazhu; Shao, Ling

doi:10.48550/arxiv.2108.06932

Cited by 39 publications

(93 citation statements)

References 68 publications


“…For example, Vision Transformer (ViT) [1] first showed that a pure transformer can achieve state-of-the-art performance in image classification. The Pyramid Vision Transformer (PVT v1) [3] showed that a pure transformer backbone can also surpass CNN counterparts for dense prediction tasks such as detection and segmentation [9][10][11]. Later, Swin transformer [5], CoaT [6], LeViT [7], and Twins [8] further improved classification, detection, and segmentation performance with transformer backbones.…”

mentioning

confidence: 99%

PVT v2: Improved baselines with Pyramid Vision Transformer

et al. 2022

Self Cite


Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs: (i) a linear complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linearity and provides significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT.


“…Nanni et al. [36] proposed encoder–decoder ensemble classifiers that can be used for semantic segmentation and introduced a novel loss function that results from the combination of Dice loss and a structural similarity index (SSIM). Dong et al. [37] presented a pyramid vision transformer backbone as an encoder for the extraction of robust features that has three tight components: a cascaded fusion module (CFM), camouflage identification module (CIM), and similarity aggregation module (SAM). The sum of the IoU and weighted binary cross-entropy loss is used as the loss function.…”

Section: Related Work

mentioning

confidence: 99%
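
The loss described in the statement above (an IoU term plus a weighted binary cross-entropy term) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `smooth` constant, the soft (probability-based) form of the IoU term, and the assumption that per-pixel weights are supplied by the caller (uniform by default) are all choices made here for illustration.

```python
import math

def weighted_bce(pred, target, weights, eps=1e-7):
    """Weighted binary cross-entropy over flat lists.

    pred: predicted foreground probabilities in [0, 1]
    target: ground-truth labels (0 or 1)
    weights: per-pixel weights (assumed to be given by the caller)
    """
    total = 0.0
    for p, t, w in zip(pred, target, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / sum(weights)

def soft_iou_loss(pred, target, smooth=1.0):
    """1 - soft IoU, computed from probabilities rather than hard masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t for p, t in zip(pred, target)) - inter
    return 1.0 - (inter + smooth) / (union + smooth)

def combined_loss(pred, target, weights=None):
    """Sum of the IoU term and the weighted BCE term."""
    if weights is None:
        weights = [1.0] * len(pred)  # hypothetical default: uniform weights
    return soft_iou_loss(pred, target) + weighted_bce(pred, target, weights)

if __name__ == "__main__":
    mask = [1, 0, 1, 0]
    print(combined_loss([0.9, 0.1, 0.8, 0.2], mask))  # close prediction, small loss
    print(combined_loss([0.1, 0.9, 0.2, 0.8], mask))  # inverted prediction, larger loss
```

In practice the weights would typically emphasize hard pixels such as object boundaries, but how they are derived is left open here because the statement does not specify it.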

Automatic Detection and Segmentation of Thrombi in Abdominal Aortic Aneurysms Using a Mask Region-Based Convolutional Neural Network with Optimized Loss Functions

Hwang, Kim, Lee, et al. 2022

Sensors


The detection and segmentation of thrombi are essential for monitoring the disease progression of abdominal aortic aneurysms (AAAs) and for patient care and management. As they have inherent capabilities to learn complex features, deep convolutional neural networks (CNNs) have been recently introduced to improve thrombus detection and segmentation. However, investigations into the use of CNN methods are in the early stages and most of the existing methods are heavily concerned with the segmentation of thrombi, which only works after they have been detected. In this work, we propose a fully automated method for the whole process of the detection and segmentation of thrombi, which is based on a well-established mask region-based convolutional neural network (Mask R-CNN) framework that we improve with optimized loss functions. The combined use of complete intersection over union (CIoU) and smooth L1 loss was designed for accurate thrombus detection and then thrombus segmentation was improved with a modified focal loss. We evaluated our method against 60 clinically approved patient studies (i.e., computed tomography angiography (CTA) image volume data) by conducting 4-fold cross-validation. The results of comparisons to multiple other state-of-the-art methods suggested the superior performance of our method, which achieved the highest F1 score for thrombus detection (0.9197) and outperformed the other methods on most metrics for thrombus segmentation.


“…• In [1] and [60], several deep learning segmentation approaches are compared: SegNet, U-Net, DeepLabv3+, HarDNet-MSEG (Harmonic Densely Connected Network) [61], and Polyp-PVT [62], a deep learning segmentation model based on a transformer encoder, i.e., PVT (Pyramid Vision Transformer).…”

Section: Skin Detection Approaches

mentioning

confidence: 99%

“…• number of epochs = 10 (using the simple data augmentation approach DA1, see section 3.3) or 15 (the latter more complex data augmentation approach DA2, see section 3. We present an ensemble based on DeepLabV3+, HarDNet-MSEG [61], Polyp-PVT [62], and Hybrid Semantic Network (HSN) [79]. HarDNet-MSEG (Harmonic Densely Connected Network) [61] is a model influenced by densely connected networks that can reduce memory consumption by diminishing aggregation with the reduction of most connection layers to the DenseNet layer.…”

Section: Deep Learning For Semantic Image Segmentation

mentioning

confidence: 99%

A Standardized Approach for Skin Detection: Analysis of the Literature and Case-Studies

Nanni, Loreggia, Lumini, et al. 2022

Preprint


Skin detection, the process of distinguishing between skin and non-skin regions in a digital image, is widely used in a variety of applications ranging from hand gesture analysis to body part tracking to facial recognition. Skin detection is a challenging problem that has received a lot of attention from experts and proposals from the research community in the context of intelligent systems, but the lack of common benchmarks and unified testing protocols has hampered fair comparison among approaches. Recently, the success of deep neural networks has had a major impact on the field of image segmentation detection, resulting in various successful models to date. In this work, we survey the most recent research in this field and propose fair comparisons between approaches using several different datasets. The main contributions of this work are: (i) a comprehensive literature review of approaches to skin color detection and a comparison of approaches that may help researchers and practitioners choose the best method for their application; (ii) a comprehensive list of datasets that report ground truth for skin detection; (iii) a framework for evaluating and combining different skin detection approaches. Moreover, we propose an ensemble of convolutional neural networks and transformers that obtains state-of-the-art performance. All the code is made publicly available at https://github.com/LorisNanni
