2018
DOI: 10.1007/978-3-030-01228-1_31

Mutual Learning to Adapt for Joint Human Parsing and Pose Estimation

Cited by 140 publications (81 citation statements). References 24 publications.

“…LIP [22]: We compare our method with 11 state-of-the-art methods on the LIP val set in Table 1. Our method achieves a large boost in average IoU (4.64% better than the second-best method, CE2P [56], and 8.4% better than the third-best, MuLA [54]). To verify its effectiveness in detail, we report per-class IoU in Table 2.…”
Section: Quantitative Results
confidence: 99%
“…The aforementioned deep human parsers generally achieve promising results, due to the strong learning power of neural networks [46,4] and the plentiful availability of annotated data [22,71]. However, they typically need to pre-segment images into superpixels [40,41], which breaks the end-to-end pipeline and is time-consuming, or they rely on extra human landmarks [72,22,71,14,54], requiring additional annotations or pre-trained pose estimators. Though [81] also performs multi-level, fine-grained parsing, it neither explores different information flows within human hierarchies nor models the problem from the view of multi-source information fusion.…”
Section: Related Work
confidence: 99%
“…In particular, we first apply a 1×1 convolution parameterized by V to f_{t+1}. Then we apply k_t in a dynamic convolution layer [21], which works like a traditional convolution layer except that the pre-learned static kernels are replaced with the dynamically predicted ones. Finally, we adopt another 1×1 convolution with U to produce h_{t+1}.…”
Section: Network Architecture
confidence: 99%
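
The three-step block quoted above (1×1 conv V, dynamic convolution with k_t, 1×1 conv U) is straightforward to sketch in code. Below is a minimal PyTorch sketch, not the authors' implementation: the class name DynamicConvBlock, the channel sizes, and the 3×3 dynamic-kernel shape are illustrative assumptions. The per-sample dynamic convolution is realized with a grouped conv2d, folding the batch into the channel axis so that each sample is convolved with its own predicted kernel.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvBlock(nn.Module):
    # Hypothetical sketch of the quoted block: 1x1 conv (V), dynamic
    # convolution with run-time kernels k_t, then 1x1 conv (U).
    # All channel sizes and the 3x3 kernel shape are assumptions.
    def __init__(self, in_ch=64, mid_ch=64, out_ch=64, ksize=3):
        super().__init__()
        self.V = nn.Conv2d(in_ch, mid_ch, kernel_size=1)   # 1x1 conv V
        self.U = nn.Conv2d(mid_ch, out_ch, kernel_size=1)  # 1x1 conv U
        self.mid_ch, self.ksize = mid_ch, ksize

    def forward(self, f_next, k_t):
        # f_next: features f_{t+1}, shape (B, in_ch, H, W)
        # k_t: predicted kernels, shape (B, mid_ch, mid_ch, ksize, ksize)
        x = self.V(f_next)
        B, C, H, W = x.shape
        # Grouped-conv trick: fold the batch into the channel axis so
        # every sample is convolved with its own dynamic kernel.
        x = x.reshape(1, B * C, H, W)
        w = k_t.reshape(B * self.mid_ch, C, self.ksize, self.ksize)
        x = F.conv2d(x, w, padding=self.ksize // 2, groups=B)
        x = x.reshape(B, self.mid_ch, H, W)
        return self.U(x)  # h_{t+1}

A quick shape check, with k standing in for the output of a (hypothetical) kernel-prediction branch:

block = DynamicConvBlock()
f = torch.randn(2, 64, 32, 32)
k = torch.randn(2, 64, 64, 3, 3)
h = block(f, k)  # -> (2, 64, 32, 32)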