Learning Fashion Compatibility with Bidirectional LSTMs

Han, Xintong; Wu, Zuxuan; Jiang, Yu-Gang; Davis, Larry S.

doi:10.1145/3123266.3123394

Cited by 336 publications

(410 citation statements)

References 37 publications

Supporting

Mentioning

408

Contrasting


“…Recommendation and Retrieval. Similarity learning has also been used extensively to solve computer vision problems in other domains such as fashion and retail (e.g., [12,35,37]). Using visual attributes is a naturally intuitive way to describe fashion items (e.g.…”

Section: Related Work (mentioning)

confidence: 99%

“…As such, identifying relevant attributes in visual representations of fashion items is essential to reasoning about similarity between them. The deficiency of comparing images by projecting them into a general embedding space as described above is especially apparent in prior work on modeling fashion outfit compatibility [21,12,35,37]. In their approach, Veit et al [37] do not distinguish items by their types but instead attempt to learn the concepts of compatibility and similarity from heterogeneous dyadic cooccurrences of items in user data.…”

Section: Related Work (mentioning)

confidence: 99%

“…We evaluate the capability of the SCE-Net model to capture different notions of similarity as well as how well it generalizes to novel image categories that are not seen during the training process. To provide a fair comparison of our approach to other baseline models, we perform experiments on the Maryland-Polyvore [12], Polyvore-Outfits [35] and UT-Zappos50k [43] datasets. The Maryland Polyvore and Polyvore Outfits datasets contain two evaluation tasks: outfit compatibility prediction and fill-in-the-blank (FITB).…”

Section: Experimental Analysis (mentioning)

confidence: 99%
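
The fill-in-the-blank (FITB) task named in the statement above can be illustrated with a minimal sketch. It assumes each item is already represented by an embedding vector and scores a candidate by its mean cosine similarity to the remaining outfit items. That scoring rule, and the names `fitb_choice` and `fitb_accuracy`, are illustrative assumptions, not the method of any of the cited papers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fitb_choice(partial_outfit, candidates):
    """Return the index of the candidate most similar, on average, to the outfit items.

    Assumption: mean cosine similarity stands in for a learned compatibility score.
    """
    scores = [
        sum(cosine(cand, item) for item in partial_outfit) / len(partial_outfit)
        for cand in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


def fitb_accuracy(questions):
    """Fraction of questions answered correctly.

    Each question is a tuple (partial_outfit, candidates, index_of_correct_candidate).
    """
    correct = sum(
        fitb_choice(outfit, cands) == answer for outfit, cands, answer in questions
    )
    return correct / len(questions)


# Toy question: two outfit items pointing along the first axis, two candidates.
toy_questions = [
    ([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [1.0, 0.0]], 1),
]
print(fitb_accuracy(toy_questions))
```

In practice the similarity function would be replaced by whatever compatibility model is under evaluation; only the argmax-over-candidates structure is specific to FITB.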

“…We perform extensive experiments over three datasets, Polyvore-Outfits [35], Maryland-Polyvore [12] and UT-Zappos50K [43], where our approach outperforms the state-of-the-art in outfit compatibility prediction, fill-in-the-blank outfit completion and triplet prediction tasks, respectively, without requiring strong supervision (via category or attribute labels) used in prior work at test time.…”

Section: Introduction (mentioning)

confidence: 99%


Learning Similarity Conditions Without Explicit Supervision

Tan, Vasileva, Saenko, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

117


Many real-world tasks require models to compare images along multiple similarity conditions (e.g. similarity in color, category or shape). Existing methods often reason about these complex similarity relationships by learning condition-aware embeddings. While such embeddings aid models in learning different notions of similarity, they also limit their capability to generalize to unseen categories since they require explicit labels at test time. To address this deficiency, we propose an approach that jointly learns representations for the different similarity conditions and their contributions as a latent variable without explicit supervision. Comprehensive experiments across three datasets, Polyvore-Outfits, Maryland-Polyvore and UT-Zappos50k, demonstrate the effectiveness of our approach: our model outperforms the state-of-the-art methods, even those that are strongly supervised with pre-defined similarity conditions, on fill-in-the-blank, outfit compatibility prediction and triplet prediction tasks. Finally, we show that our model learns different visually-relevant semantic sub-spaces that allow it to generalize well to unseen categories.

“…Li et al [26] use a Recurrent Neural Network (RNN) to predict whether an outfit is popular, which also implicitly learns the compatibility relation between fashion items. Han et al [11] further train a Bi-LSTM to sequentially predict the next item conditioned on the previous ones for learning their compatibility relationship. Song et al [41] employ a dual auto-encoder network to learn the latent compatibility space where they use the BPR model to jointly model the relation between visual and contextual modalities and implicit preferences among fashion items.…”

Section: Visual Matching (mentioning)

confidence: 99%

Improving Outfit Recommendation with Co-supervision of Fashion Generation

Lin, Ren, Chen, et al. 2019

The World Wide Web Conference


The task of fashion recommendation includes two main challenges: visual understanding and visual matching. Visual understanding aims to extract effective visual features. Visual matching aims to model a human notion of compatibility to compute a match between fashion items. Most previous studies rely on recommendation loss alone to guide visual understanding and matching. Although the features captured by these methods describe basic characteristics (e.g., color, texture, shape) of the input items, they are not directly related to the visual signals of the output items (to be recommended). This is problematic because the aesthetic characteristics (e.g., style, design), based on which we can directly infer the output items, are lacking. Features are learned under the recommendation loss alone, where the supervision signal is simply whether the given two items are matched or not. To address this problem, we propose a neural co-supervision learning framework, called the FAshion Recommendation Machine (FARM). FARM improves visual understanding by incorporating the supervision of generation loss, which we hypothesize to be able to better encode aesthetic information. FARM enhances visual matching by introducing a novel layer-to-layer matching mechanism to fuse aesthetic information more effectively, and meanwhile avoiding paying too much attention to the generation quality and ignoring the recommendation performance. Extensive experiments on two publicly available datasets show that FARM outperforms state-of-the-art models on outfit recommendation, in terms of AUC and MRR. Detailed analyses of generated and recommended items demonstrate that FARM can encode better features and generate high quality images as references to improve recommendation performance.
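
The two metrics named in the abstract above, AUC and MRR, can be computed as in the following minimal sketch. It uses the standard pairwise definition of AUC and the standard reciprocal-rank definition of MRR; the function names and toy inputs are illustrative assumptions and are not taken from the FARM paper.

```python
def auc(pos_scores, neg_scores):
    """Pairwise AUC: probability that a positive item is scored above a negative one.

    Ties count as half a win. Runs in O(len(pos) * len(neg)), fine for small examples.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mrr(rankings):
    """Mean reciprocal rank.

    Each ranking is a list of booleans in ranked order (True marks a relevant item).
    A ranking with no relevant item contributes 0.
    """
    total = 0.0
    for ranking in rankings:
        for rank, relevant in enumerate(ranking, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data: three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.8], [0.7, 0.85]))
# First relevant item at rank 2 in one list and rank 1 in the other.
print(mrr([[False, True], [True, False]]))
```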
