In a dialog-based interactive recommendation task, users can express natural-language feedback when interacting with the recommender system. However, the users' feedback, which takes the form of natural-language critiques of the recommendations at each iteration, only provides the recommender system with a partial portrayal of the users' preferences. Such partial observations of the users' preferences make it challenging to correctly track the users' preferences over time, which can result in poor recommendation performance and a less effective satisfaction of the users' information needs when only a limited number of iterations is available. Reinforcement learning, in the form of a partially observable Markov decision process (POMDP), can simulate the interactions between a partially observable environment (i.e. a user) and an agent (i.e. a recommender system). To alleviate this partial observation issue, we propose a novel dialog-based recommendation model, the Estimator-Generator-Evaluator (EGE) model, with Q-learning for POMDP, to effectively incorporate the users' preferences over time. Specifically, we leverage an Estimator to track and estimate the users' preferences, a Generator to match the estimated preferences with the candidate items so as to rank the next recommendations, and an Evaluator to judge the quality of the estimated preferences given the users' historical feedback. Following previous work, we train our EGE model using a user simulator, which is itself trained to describe the differences between the target users' preferences and the recommended items in natural language. Extensive experiments conducted on two recommendation datasets, comprising images of fashion products (namely dresses and shoes), demonstrate that our proposed EGE model yields significant improvements over existing state-of-the-art baseline models.
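To make the three-component loop concrete, the following is a minimal, self-contained Python sketch of the estimate-generate-evaluate cycle outlined above. All class names, the vector representation of critiques, and the update and scoring rules are hypothetical placeholders chosen for illustration; in the actual EGE model the three components are learned neural models trained with Q-learning against a user simulator, not the toy stubs shown here.

```python
# Illustrative sketch of the EGE interaction loop (hypothetical placeholders,
# not the authors' implementation). Natural-language critiques are reduced to
# small numeric vectors so the per-turn control flow is runnable end to end.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Estimator:
    """Tracks a running estimate of the user's preferences (a belief state
    over the partially observable user, in POMDP terms)."""
    state: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def update(self, critique: List[float]) -> List[float]:
        # Toy update rule: move the belief halfway toward the latest critique.
        self.state = [0.5 * s + 0.5 * c for s, c in zip(self.state, critique)]
        return self.state


class Generator:
    """Matches the estimated preferences against the candidate items to rank
    the next recommendations."""

    def rank(self, belief: List[float], candidates: List[List[float]]) -> List[List[float]]:
        # Score items by squared distance to the belief; smaller is better.
        def score(item: List[float]) -> float:
            return sum((b - x) ** 2 for b, x in zip(belief, item))
        return sorted(candidates, key=score)


class Evaluator:
    """Judges the quality of the estimated preferences given the user's
    feedback history (a stand-in for the learned Q-value signal)."""

    def judge(self, belief: List[float], history: List[List[float]]) -> float:
        if not history:
            return 0.0
        last = history[-1]
        return -sum((b - f) ** 2 for b, f in zip(belief, last))


def interaction_loop(candidates: List[List[float]],
                     critiques: List[List[float]]) -> None:
    estimator, generator, evaluator = Estimator(), Generator(), Evaluator()
    history: List[List[float]] = []
    for turn, critique in enumerate(critiques):
        history.append(critique)                      # user feedback this turn
        belief = estimator.update(critique)           # track user preferences
        ranking = generator.rank(belief, candidates)  # rank next recommendations
        quality = evaluator.judge(belief, history)    # score the belief estimate
        print(f"turn {turn}: top item={ranking[0]}, estimate quality={quality:.3f}")


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    feedback = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.6]]
    interaction_loop(items, feedback)
```

In this reduced form, the key design point of EGE is still visible: the Evaluator's judgment depends on the accumulated feedback history rather than on the latest critique alone, which is what lets the model cope with each critique being only a partial observation of the user's preferences.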