2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.715
Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing

Abstract: Human parsing has recently attracted a lot of research interest due to its huge application potential. However, existing datasets have a limited number of images and annotations, and lack the variety of human appearances and the coverage of challenging cases in unconstrained environments. In this paper, we introduce a new benchmark, "Look into Person (LIP)", that makes a significant advance in terms of scalability, diversity, and difficulty, a contribution that we feel is crucial for future developments in humanc…

Cited by 480 publications (349 citation statements)
References 36 publications
“…2) We analyze three important sources of information, leading to a novel network architecture that conditionally incorporates direct, top-down, and bottom-up inferences. 3) Our model achieves state-of-the-art performances for comprehensive evaluations on four public datasets (LIP [22], PASCAL-Person-Part [71], ATR [39] and Fashion Clothing [49]). Testing with more than 20K images demonstrates the superiority over existing methods of exploiting compositional structural information for human parsing.…”
Section: Introduction
confidence: 94%
“…The aforementioned deep human parsers generally achieve promising results, due to the strong learning power of neural networks [46,4] and the plentiful availability of annotated data [22,71]. However, they typically need to pre-segment images into superpixels [40,41], which breaks the end-to-end story and is time-consuming, or rely on extra human landmarks [72,22,71,14,54], requiring additional annotations or pre-trained pose estimators. Though [81] also performs multi-level, fine-grained parsing, it neither explores different information flows within human hierarchies nor models the problem from the view of multi-source information fusion.…”
Section: Related Work
confidence: 99%