2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00449
Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

Cited by 238 publications (174 citation statements). References 35 publications.
“…LGR achieves a clear improvement on the two large datasets compared with PyraNet [32], FashionNet [36], DFA [19], DLAN [31] and BCRNNs [27]. Note that PyraNet is a human pose estimation model with two stages.…”
Section: Comparison With the State-of-the-arts (mentioning)
confidence: 96%
“…FFLD is our contributed fine-grained fashion landmark dataset, which contains 200k images annotated with at most 32 key-points and bounding boxes for 13 clothes categories. Following [27], 209,222 fashion images in DeepFashion are used for training, 40,000 for validation, and the remaining 40,000 for testing. Following the protocol in FLD [19], 83,033 images are used for training, 19,992 for validation, and 19,991 for testing.…”
Section: Experimental Settings (mentioning)
confidence: 99%
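
The split protocol in the excerpt above is easy to mis-copy, so here is a minimal sketch that records it as data. The counts come directly from the excerpt; the DATASET_SPLITS name and the dictionary layout are our own illustration, not anything defined by the cited papers.

    # Hypothetical config capturing the dataset splits quoted above.
    # Image counts are from the excerpt; names and structure are illustrative.
    DATASET_SPLITS = {
        "DeepFashion": {"train": 209_222, "val": 40_000, "test": 40_000},
        "FLD":         {"train": 83_033,  "val": 19_992, "test": 19_991},
    }

    for name, split in DATASET_SPLITS.items():
        total = sum(split.values())
        print(f"{name}: {split} (total {total:,} images)")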
“…They mimic the human cognitive attention mechanism, which selectively focuses on the most visually informative parts of a scene. They were first explored in neural machine translation [2], and later proved effective in various natural language processing and computer vision tasks, such as image captioning [50], question answering [57], scene recognition [4,41], fashion analysis [43], etc. In the above studies, an attention mechanism is learned in a goal-driven, end-to-end manner, allowing the network to concentrate on the most task-relevant parts of the inputs.…”
Section: Related Work (mentioning)
confidence: 99%
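
For readers unfamiliar with the goal-driven attention the excerpt describes, the following is a minimal sketch of soft dot-product attention in plain NumPy. The function name, shapes, and toy data are our own illustration of the general mechanism, not the attentive fashion grammar network proposed in the paper.

    import numpy as np

    def soft_attention(query, keys, values):
        """Minimal soft attention: weight each value by how well its key
        matches the task-driven query, then return the weighted average.

        query:  (d,)      task-driven probe vector
        keys:   (n, d)    one key per input location
        values: (n, d_v)  features to attend over
        """
        scores = keys @ query / np.sqrt(keys.shape[1])  # (n,) similarities
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                        # softmax -> attention map
        return weights @ values                         # (d_v,) attended feature

    # Toy usage: 5 spatial locations, 8-dim keys and values.
    rng = np.random.default_rng(0)
    q = rng.normal(size=8)
    K = rng.normal(size=(5, 8))
    V = rng.normal(size=(5, 8))
    print(soft_attention(q, K, V).shape)  # (8,)

When such a module is trained end-to-end inside a larger network, the gradient of the task loss shapes the attention weights, which is what the excerpt means by a goal-driven attention mechanism concentrating on the most task-relevant parts of the input.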