2019
DOI: 10.1109/tpami.2018.2820063

Look into Person: Joint Body Parsing & Pose Estimation Network and a New Benchmark

Abstract: Human parsing and pose estimation have recently received considerable interest due to their substantial application potentials. However, the existing datasets have limited numbers of images and annotations and lack a variety of human appearances and coverage of challenging cases in unconstrained environments. In this paper, we introduce a new benchmark named "Look into Person (LIP)" that provides a significant advancement in terms of scalability, diversity, and difficulty, which are crucial for future developm…

Cited by 349 publications (237 citation statements)
References 49 publications (150 reference statements)
“…We use the state-of-the-art human parsing model CE2P [23] to predict the human part label maps for all the images in the three benchmarks in advance. The CE2P model is trained on the Look Into Person (LIP) [18] dataset, which consists of ∼30,000 finely annotated images with 20 semantic labels (19 human parts and 1 background). We divide the 20 semantic categories into K groups², and train the CE2P model with the grouped labels.…”
Section: Implementation Details
confidence: 99%
“…E.g. the label set in [18]: background, hat, hair, glove, sunglasses, upper-clothes, dress, coat, socks, pants, jumpsuits, scarf, skirt, face, right-arm, left-arm, right-leg, left-leg, right-shoe and left-shoe.…”
confidence: 99%
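The grouping step described in the statement above can be made concrete with a short sketch. The remapping below takes the 20 LIP labels listed in the footnote and collapses them into K coarser groups; the choice of K = 5 and the assignment of parts to groups are illustrative assumptions, since the quoted passage does not specify the grouping actually used.

```python
# Hypothetical sketch: remapping the 20 LIP labels into K coarser groups.
# The group assignment below is illustrative only, not the grouping used
# in the citing paper.
import numpy as np

LIP_LABELS = [
    "background", "hat", "hair", "glove", "sunglasses", "upper-clothes",
    "dress", "coat", "socks", "pants", "jumpsuits", "scarf", "skirt",
    "face", "right-arm", "left-arm", "right-leg", "left-leg",
    "right-shoe", "left-shoe",
]

# Illustrative grouping into K = 5 coarse groups (assumption).
GROUPS = {
    "background": ["background"],
    "head":       ["hat", "hair", "sunglasses", "face", "scarf"],
    "upper-body": ["upper-clothes", "dress", "coat", "jumpsuits", "glove",
                   "right-arm", "left-arm"],
    "lower-body": ["pants", "skirt", "right-leg", "left-leg"],
    "feet":       ["socks", "right-shoe", "left-shoe"],
}

# Lookup table from original label index (0..19) to group index (0..K-1).
label_to_group = np.zeros(len(LIP_LABELS), dtype=np.int64)
for group_idx, names in enumerate(GROUPS.values()):
    for name in names:
        label_to_group[LIP_LABELS.index(name)] = group_idx

def regroup(label_map: np.ndarray) -> np.ndarray:
    """Map an H x W label map with 20 LIP classes to K grouped classes."""
    return label_to_group[label_map]
```

Applying regroup to a predicted label map yields the grouped supervision described in the passage; any other partition of the 20 labels can be used by editing GROUPS.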
“…To deal with the BG shift problem, one possible solution is to completely remove BGs using the binary body mask obtained by semantic segmentation or human parsing methods. Currently, methods such as Mask R-CNN [13] and JPP-Net [25] can obtain body masks with models pre-trained on large-scale datasets, e.g., MS COCO [26] and LIP [25]. However, masks obtained by these methods often contain errors due to factors such as low-resolution person images and highly dynamic person poses.…”
Section: Related Work
confidence: 99%
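A minimal sketch of the background-removal step described in this statement follows. The function name, the NumPy conventions, and filling background pixels with a constant are assumptions for illustration; producing the binary body mask itself (e.g., with JPP-Net or Mask R-CNN) is not shown.

```python
# Minimal sketch of background removal with a binary body mask. The mask is
# assumed to come from an off-the-shelf human parsing or segmentation model;
# loading such a model is outside the scope of this sketch.
import numpy as np

def remove_background(image: np.ndarray, body_mask: np.ndarray,
                      fill_value: int = 0) -> np.ndarray:
    """Fill all background pixels of an H x W x 3 image with a constant.

    body_mask: H x W array where 1 marks person pixels and 0 marks background.
    """
    mask = body_mask.astype(bool)[..., None]  # H x W x 1, broadcasts over channels
    return np.where(mask, image, fill_value).astype(image.dtype)
```

As the passage notes, any error in the mask carries over directly into the masked image (missing limbs, leaked background, and so on).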
“…The L2 distance is applied to minimize the loss. JPPNet [25] is employed to extract M(I_Ds). We find that masks obtained by JPPNet often contain segmentation errors.…”
Section: Objective Functions in SBSGAN
confidence: 99%
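To make the L2 term concrete, here is a hedged sketch of a mean-squared-error mask loss of the kind the statement describes. The mask_fn callable standing in for the pretrained JPPNet, its soft-mask output, and the pairing of source and generated images are assumptions for illustration, not the exact objective of the citing paper.

```python
# Hedged sketch of an L2 (mean-squared-error) mask term. mask_fn stands in for
# a pretrained parser (JPPNet in the citing paper); treating its output as a
# soft foreground map is an assumption made for this illustration.
import torch

def l2_mask_loss(mask_fn, real_images: torch.Tensor,
                 generated_images: torch.Tensor) -> torch.Tensor:
    """Penalize the L2 distance between masks of real and generated images."""
    with torch.no_grad():
        target_masks = mask_fn(real_images)      # M(I_Ds): masks of source images
    generated_masks = mask_fn(generated_images)  # masks of generated images
    return torch.mean((generated_masks - target_masks) ** 2)
```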