2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00167

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

Abstract: To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. It is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection field, which allows explicitly associating dense human semantics with …

Cited by 75 publications (24 citation statements)
References 58 publications
“…In the deep learning era, human parsing, as a sub-field of scene parsing, became active. Some recent human parsers explored human part relations, based on the human hierarchy [36,56,80,83,110,111]. Only very few efforts [43,45,89,104] are concerned with utilizing structured knowledge to aid the training of general-purpose semantic segmentation networks.…”
Section: Related Work (mentioning)
confidence: 99%
“…Fully convolutional network (FCN) [38] is selected as the basic structure of our network since the grasp label is pixel-level. FCN is widely used for semantic segmentation of images [39][40][41][42]. Our previous work [25] prove that FCN is effective for predicting dense grasp poses.…”
Section: A Network Architecture (mentioning)
confidence: 88%
“…Recent human segmentation methods focus on human instance part segmentation. There are two paradigms for this direction: top-down pipelines [28,60,45,24,59] and bottom-up pipelines [19,27,67,71]. Our framework solves the instance level segmentation with fine-grained attributes recognition, which is much more challenging.…”
Section: Related Work (mentioning)
confidence: 99%