Proceedings of the 16th ACM Conference on Recommender Systems 2022
DOI: 10.1145/3523227.3546774

Multi-Modal Dialog State Tracking for Interactive Fashion Recommendation

Abstract: Multi-modal interactive recommendation is a type of task that allows users to receive visual recommendations and express natural-language feedback about the recommended items across multiple iterations of interactions. However, such multi-modal dialog sequences (i.e., turns consisting of the system's visual recommendations and the user's natural-language feedback) make it challenging to correctly incorporate the users' preferences across multiple turns. Indeed, the existing formulations of interactive recommende…

Cited by 5 publications (12 citation statements)
References 46 publications

“…Multi-Modal Interactive Recommendation. Recently, multi-modal interactive recommendation has been intensively investigated in the literature, as it can satisfy the users' information needs by effectively eliciting the users' preferences from the visual recommendations (e.g., images of fashion products) and the corresponding verbal and/or non-verbal relevance feedback (e.g., natural-language feedback and likes/dislikes) [7,12,19,31,45,46,47,53]. These kinds of interactive recommendations are suited for taste-oriented domains such as fashion, where search-type interaction methods are less useful.…”
Section: Related Work (mentioning)
confidence: 99%
“…These kinds of interactive recommendations are suited for taste-oriented domains such as fashion, where search-type interaction methods are less useful. Typically, the multi-modal interactive recommendation task focuses on tracking and estimating the users' preferences over time with a state tracker, such as a gated recurrent unit (GRU) [19,45], a long short-term memory (LSTM) [57], a Transformer encoder [43,47], or an RNN-enhanced Transformer [46], in an end-to-end fashion with supervised learning (SL) and/or deep reinforcement learning (DRL) approaches. The representations of visual candidate items and natural-language feedback are initially generated with pretrained models (such as ResNet for image encoding and BERT or GloVe for text encoding), and are then implicitly further tuned along with the recommendation policy optimisation.…”
Section: Related Work (mentioning)
confidence: 99%
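
The citation statement above describes a state tracker (for example a GRU) that accumulates the user's preferences over dialog turns from image and text representations. The following is a minimal sketch of that idea only, not the method of the cited paper or of any citing paper. Everything in it is an assumption made for illustration: the class name `GRUStateTracker`, the function `rank_candidates`, the random untrained weights, the omitted bias terms, the toy vectors standing in for pretrained image and text embeddings, and the choice to score candidates by a dot product with the tracked state.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class GRUStateTracker:
    """Hypothetical GRU-style tracker; weights are random, not trained."""

    def __init__(self, input_dim, state_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.state_dim = state_dim
        joint_dim = input_dim + state_dim
        self.W_z = mat(state_dim, joint_dim)  # update gate
        self.W_r = mat(state_dim, joint_dim)  # reset gate
        self.W_h = mat(state_dim, joint_dim)  # candidate state

    def step(self, state, image_vec, text_vec):
        """Fold one turn (shown item + user feedback) into the state."""
        x = image_vec + text_vec  # concatenate the two modalities
        z = [sigmoid(v) for v in matvec(self.W_z, x + state)]
        r = [sigmoid(v) for v in matvec(self.W_r, x + state)]
        gated = [ri * si for ri, si in zip(r, state)]
        h_new = [math.tanh(v) for v in matvec(self.W_h, x + gated)]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * si + zi * hi
                for zi, si, hi in zip(z, state, h_new)]

    def track(self, turns):
        """Run the tracker over a list of (image_vec, text_vec) turns."""
        state = [0.0] * self.state_dim
        for image_vec, text_vec in turns:
            state = self.step(state, image_vec, text_vec)
        return state


def rank_candidates(state, candidates):
    """Order candidate names by dot-product score against the state.

    Assumes candidates are already embedded in the state space.
    """
    scores = {name: sum(s * c for s, c in zip(state, vec))
              for name, vec in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors in place of pretrained image/text embeddings.
turns = [
    ([0.2, 0.9], [0.5, -0.1]),  # turn 1: shown item, user feedback
    ([0.7, 0.1], [-0.3, 0.8]),  # turn 2
]
candidates = {
    "item_a": [0.1, 0.4, -0.2],
    "item_b": [-0.5, 0.3, 0.9],
    "item_c": [0.0, -0.7, 0.2],
}
tracker = GRUStateTracker(input_dim=4, state_dim=3)
final_state = tracker.track(turns)
ranking = rank_candidates(final_state, candidates)
```

In the end-to-end setups the statement mentions, the gate weights and the encoders would be learned jointly with the recommendation policy through supervised or reinforcement learning; here they are fixed at random values so that the sketch stays self-contained.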