Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2020
DOI: 10.1145/3397271.3401430
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval

Abstract: In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent "object-level" information in fashion images, while fashion texts …
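The abstract is truncated here, but FashionBERT's well-known alternative to RoI detection is to split each image into uniform patches and feed the patch embeddings, alongside text tokens, into a BERT-style Transformer, so fine-grained details are not lost to object-level pooling. As a rough illustration only (this is not the authors' code; the function name and patch size are my own choices), patch tokenization can be sketched as:

```python
import numpy as np

def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns an (N, patch_size * patch_size * C) array of flattened
    patches in row-major order; each row can serve as one "image token".
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    # Reshape into a grid of patches, then flatten each patch into a vector.
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches

# Example: a 224x224 RGB image with 32x32 patches yields 49 patch tokens.
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(img, 32)
print(tokens.shape)  # (49, 3072)
```

In a full model, each flattened patch would be linearly projected to the Transformer's hidden size before being concatenated with the text embeddings.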

Cited by 98 publications (77 citation statements) · References 46 publications
“…There are several directions to extend the current work in the future, including (1) considering jointly modeling texts and images in one Transformer model like FashionBERT (Gao et al, 2020), and (2) using self-training to go beyond the limit caused by the size of labeled image data for the image model.…”
Section: Discussion
confidence: 99%
“…The latest research shows that self-attention-based architectures, especially transformers [45], have achieved great success in the field of natural language processing. Inspired by this achievement, many researchers have applied transformers to help solve cross-modal retrieval tasks [16,27,30,53].…”
Section: Cross-modal Retrieval
confidence: 99%
“…Recently, with the advent of graph neural networks, many methods based on this new paradigm have been proposed to learn graph representations on heterogeneous graphs, such as Heterogeneous Graph Neural Network (HetGNN), Heterogeneous Graph Attention Network (HAN) (Wang et al., 2019c), and Heterogeneous Graph Transformer (HGT) (Hu et al., 2020). … effectively increase the purchase rate of the top-ranked products.…”
Section: Heterogeneous Network
confidence: 99%