2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00412

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Citations: cited by 83 publications (81 citation statements)
References: 47 publications
“…Following experiments are conducted on VSPW2021 test dataset and mIoU score is reported. The baseline is the original Spatial-Temporal PPM with ResNet-101 backbone [8] We apply Swin Transformer-Large as backbone for Spatial-Temporal PPM, which achieves better mIoU score of 43.63%, with a significant improvement of 6.17%. Then we demonstrate the effectiveness of our proposed Bilateral Network with Vision Transformer, which can bring an improvement of 2.10%.…”
Section: Ablation Study (mentioning)
confidence: 99%
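
The statement above reports results as mIoU (mean intersection-over-union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not the VSPW evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred and target are equal-length sequences of integer class labels,
    one per pixel (hypothetical flat layout, chosen for illustration).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both the predicted and true class.
            union[p] += 1
            union[t] += 1
    # Assumption: classes that never appear are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark implementations typically accumulate the intersection and union counts over the whole test set before dividing, rather than averaging per-image scores.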
“…The video scene parsing is primarily a task of video semantic segmentation, but it is more difficult and challenging due to the wide range of real-world scenarios and dense annotation for every pixel, that's to say we must segment the specific categories for backgrounds(e.g., road, wall, sky). VSPW [8] is a suitable benchmark to tackle the challenging video scene parsing task.…”
Section: Introduction (mentioning)
confidence: 99%
“…To address these issues, we introduce a new benchmark that more closely aligns with the target task. Our benchmark uses the PASCAL-VOC dataset [19] as the training set and we re-purpose a densely labelled video semantic segmentation dataset, VSPW [20], as the test set. We refer to this cross-domain dataset as PASCAL-to-MiniVSPW.…”
Section: Introduction (mentioning)
confidence: 99%