2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021
DOI: 10.1109/cvprw53098.2021.00439
Scalable and Explainable Outfit Generation

Cited by 6 publications (3 citation statements). References 4 publications.
“…Previous works have relied on visual and textual information and on fashion categories for creating representations of garments in outfits. Transfer learning is generally used for extracting visual information from the garments' images, either with feature extraction (FX) from ImageNet-pretrained models [7,8] or by end-to-end fine-tuning (E2E) for OC_b [9,2,10,1,11]. E2E tends to outperform FX-ImageNet since the visual features are trained to specialize on the target domain and task.…”
Section: Related Work
confidence: 99%
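The FX-vs-E2E distinction in the excerpt above comes down to whether the pretrained backbone's weights receive gradients during training on the target task. A minimal numpy sketch of the two regimes, using toy linear maps in place of a real vision backbone and head (all names and shapes are illustrative, not from the cited works):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "backbone" (stand-in for a pretrained feature extractor) and task head.
backbone_W = rng.normal(size=(8, 4))   # pretrained weights
head_W = rng.normal(size=(4, 1))

x = rng.normal(size=(16, 8))           # a batch of inputs
y = rng.normal(size=(16, 1))           # compatibility targets

def train_step(backbone_W, head_W, freeze_backbone, lr=0.1):
    """One gradient step on MSE loss; FX freezes the backbone, E2E does not."""
    feats = x @ backbone_W
    pred = feats @ head_W
    err = pred - y                               # d(loss)/d(pred), up to a constant
    grad_head = feats.T @ err / len(x)
    grad_backbone = x.T @ (err @ head_W.T) / len(x)
    if freeze_backbone:                          # feature extraction (FX): head only
        grad_backbone = np.zeros_like(grad_backbone)
    return backbone_W - lr * grad_backbone, head_W - lr * grad_head

fx_backbone, _ = train_step(backbone_W, head_W, freeze_backbone=True)
e2e_backbone, _ = train_step(backbone_W, head_W, freeze_backbone=False)

print(np.allclose(fx_backbone, backbone_W))    # True: FX leaves the backbone untouched
print(np.allclose(e2e_backbone, backbone_W))   # False: E2E updates it
```

E2E can specialize the features to the target domain, at the cost of training far more parameters than FX.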
“…We propose the Visual InCompatibility TransfORmer, or VICTOR, a multi-tasking, Transformer-based architecture that is trained to predict the overall OC_r score and detect mismatching garments in an outfit. Previous works on OC_b either rely on feature extraction from computer vision models pre-trained on ImageNet [7,8] or end-to-end (E2E) fine-tuning [9,2,10,1,11].…”
Section: Introduction
confidence: 99%
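The multi-task setup described above pairs an outfit-level compatibility score (OC_r, a regression target) with per-garment mismatch detection (a binary target per item). A minimal sketch of how such a combined objective can be formed; the equal weighting `alpha` and all values are illustrative assumptions, not VICTOR's actual loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multitask_loss(oc_r_pred, oc_r_true, item_logits, item_labels, alpha=0.5):
    """Weighted sum of an outfit-level MSE (OC_r regression) and a
    per-garment binary cross-entropy (mismatch detection).
    The weighting 'alpha' is an assumption, not taken from the paper."""
    mse = (oc_r_pred - oc_r_true) ** 2
    p = sigmoid(item_logits)
    bce = -(item_labels * np.log(p) + (1 - item_labels) * np.log(1 - p)).mean()
    return alpha * mse + (1 - alpha) * bce

# Example: a 4-garment outfit where the third garment is labelled mismatching.
loss = multitask_loss(
    oc_r_pred=0.6, oc_r_true=0.75,
    item_logits=np.array([-2.0, -1.5, 2.5, -1.0]),
    item_labels=np.array([0.0, 0.0, 1.0, 0.0]),
)
print(float(loss))
```

Training both heads on shared Transformer representations lets the mismatch predictions serve as an explanation for a low overall score.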
“…For our recommendation use cases, we use a standard Transformer encoder [3,5,11] trained with a causal language modeling (CLM) approach, although the architecture is agnostic to the training objective and could be trained with masked language modeling (MLM) as well [3,10]. For the personalized outfit generation use cases, we use variations of the standard encoder–decoder (sequence-to-sequence) Transformer architecture [11], as in [2,6]. The model is trained on the same source of sequential interactions.…”
Section: Model Architecture
confidence: 99%
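The CLM-vs-MLM distinction in the excerpt above comes down to what the model is allowed to see at each position: causal training restricts attention to earlier items, while masked training hides a random subset and asks the model to reconstruct it. A minimal sketch of the two masking schemes over an item sequence (token ids and the masking probability are illustrative assumptions):

```python
import numpy as np

def causal_mask(seq_len):
    """Attention mask for causal LM training: position i may attend
    only to positions <= i (lower-triangular)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def mlm_targets(seq, mask_prob=0.15, mask_token=-1, seed=0):
    """Masked-LM corruption: hide a random subset of items; the model
    would be trained to reconstruct them at the hidden positions."""
    rng = np.random.default_rng(seed)
    seq = np.asarray(seq)
    hide = rng.random(len(seq)) < mask_prob
    corrupted = np.where(hide, mask_token, seq)
    return corrupted, hide

m = causal_mask(4)
print(m.astype(int))  # row i has ones up to column i: each item sees only earlier items
```

Because the mask is the only thing that differs, the same encoder weights and data pipeline can serve either objective, which is what makes the architecture agnostic to the training logic.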