2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.185

Webly Supervised Semantic Segmentation

Abstract: We propose a weakly supervised semantic segmentation algorithm that uses image tags for supervision. We apply the tags in queries to collect three sets of web images, which encode the clean foregrounds, the common backgrounds, and realistic scenes of the classes. We introduce a novel three-stage training pipeline to progressively learn semantic segmentation models. We first train and refine a class-specific shallow neural network to obtain segmentation masks for each class. The shallow neural networks of all c…

Cited by 66 publications (46 citation statements)
References 36 publications (119 reference statements)
“…Since the object-level annotations are more accurate and comprehensive than image-level annotations, these methods usually have better performance. For example, the methods trained with accurate bounding boxes have performance more than 60% IOU, while most of the methods with image-level labels have the performance less than 60%. Some image-level based methods implicitly use pixel-level supervision in their models such as [19], [21], therefore their models can achieve relatively higher performance than those only using image-level labels.…”
Table embedded in the statement (method, reference, two scores; column headers not preserved, first method name truncated):
…                              [45]  37.8  37.0
EM-Adapt (ICCV 2015)           [17]  38.2  39.6
DCSM (ECCV 2016)               [48]  44.1  45.1
BFBP (ECCV 2016)               [44]  46.6  48.0
STC (PAMI 2017)                [49]  49.8  51.2
SEC (ECCV 2016)                [22]  50.7  51.7
AF-SS (ECCV 2016)              [19]  52.6  52.7
WebSeg (CVPR 2017)             [50]  53.4  55.3
AE-PSL (CVPR 2017)             [14]  55.0  55.7
WebCoSeg (BMVC 2017)           [51]  56.4  56.9
DSNA (Arxiv 2018)              [24]  58.2  60.1
MDC (CVPR 2018)                [23]  60.4  60.8
MCOF (CVPR 2018)               [52]  60.3  61.2
AffinityNet (CVPR 2018)        [53]  61.7  63.7
Boostrap (CVPR 2018)           [54]  63.0  63.9
InstancesSalient (ECCV 2018)   [55]  64.5  65.6
Ours                                 61.9  62.8
Section: Semantic Segmentation Results
Citation type: mentioning
confidence: 99%
“…Weakly Supervised Signals for Segmentation: Numerous alternatives to expensive pixel-level segmentation have been proposed and used in the literature. Image-level labels [27], noisy web labels [1,16] and scribble-level labels [20] are some of the supervisory signal that have been used to guide segmentation methods. Closer to our approach, [3] employs point-level supervision in the form of a single click to train a CNN for semantic segmentation and [26] uses central points of an imaginary bounding box to weakly supervise object detection.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To improve the localization performance, some approaches [58][59][60][61] have proposed to exploit the notion of objectness, either by incorporating it in the loss function [58,59], or by employing pre-trained network as external objectness module [60,61]. Another promising way to improve the segmentation performance is to utilize additional weakly supervised images, such as web images, to train CNNs, such as [62,63].…”
Section: Weakly Supervised Semantic Segmentation
Citation type: mentioning
confidence: 99%