Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539040

Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems

Abstract: Recommender System (RS) is an important online application that affects billions of users every day. The mainstream RS ranking framework is composed of two parts: a Multi-Task Learning model (MTL) that predicts various user feedback, i.e., clicks, likes, sharings, and a Multi-Task Fusion model (MTF) that combines the multi-task outputs into one final ranking score with respect to user satisfaction. There has not been much research on the fusion model while it has great impact on the final recommendation as the…
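To make the two-stage framework in the abstract concrete, here is a minimal sketch of the fusion step only. It assumes a fixed weighted sum over per-task predictions; the abstract does not state the paper's fusion formula, and the paper itself learns the fusion with reinforcement learning rather than using fixed weights. The names `fuse_scores` and `rank_items`, the item IDs, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple


def fuse_scores(predictions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-task predictions into one ranking score.

    Assumption: a plain weighted sum, used here only for illustration.
    """
    return sum(weights[task] * predictions[task] for task in weights)


def rank_items(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score every candidate item and sort by fused score, highest first."""
    scored = [
        (item_id, fuse_scores(preds, weights))
        for item_id, preds in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical MTL outputs: predicted probability of each feedback type.
candidates = {
    "video_a": {"click": 0.5, "like": 0.25, "share": 0.0},
    "video_b": {"click": 0.25, "like": 0.5, "share": 0.5},
}
# Hypothetical fusion weights; in an RL formulation these would be the action.
weights = {"click": 1.0, "like": 2.0, "share": 4.0}

ranking = rank_items(candidates, weights)
print(ranking)
```

With these made-up numbers, `video_b` ranks above `video_a` because the weights favor likes and shares over clicks, which shows how the choice of fusion weights changes the final order.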

Citations: cited by 30 publications (10 citation statements)
References: 30 publications
“…In the future, when fitting user interests is not a bottleneck anymore, researchers could consider higher-level goals, such as pursuing users' long-term satisfaction [57] or optimizing social utility [17]. With the increase in high-quality offline data, we believe that offline RL can be better adapted to recommender systems to achieve these goals.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, for a fair comparison to LabelCraft, we merge all explicit feedback as a label by computing 𝛿 ( (𝒚 𝑒 )) in Equation (9). • PC [38]. In this method, the watch time of a video is compared to the video duration to determine whether a user has fully watched the video, forming the Play Completion label.…”
Section: Experimental Settings (mentioning)
confidence: 99%
“…By setting user-item features as state and continuous score pairs for multiple tasks as actions, the RL-based MTL method is capable of handling the sequential user-item interaction and optimizing long-term user engagement. Zhang et al [79] formulate MTF as Markov Decision Process and use batch Reinforcement Learning to optimize long-term user satisfaction. Han et al [22] propose to use an actor-critic model to learn the optimal fusion weight of CTR and the bid rather than greedy ranking strategies to maximize the long-term revenue.…”
Section: Optimization (mentioning)
confidence: 99%
“…On the other hand, social media is a more complex field since the users interact with both items and users. Multiple MTDRS models validate their effectiveness on social media by online A/B test, including MMoE [84] on YouTube considering engagement and satisfaction, LT4REC [69] on Tencent Video, and BatchRL-MTF [79] on Tencent short video platform.…”
Section: Application Fields (mentioning)
confidence: 99%