2022
DOI: 10.48550/arxiv.2203.13423
Preprint

Modeling Attrition in Recommender Systems with Departing Bandits

Abstract: Traditionally, when recommender systems are formalized as multi-armed bandits, the policy of the recommender system influences the rewards accrued, but not the length of interaction. However, in real-world systems, dissatisfied users may depart (and never come back). In this work, we propose a novel multi-armed bandit setup that captures such policy-dependent horizons. Our setup consists of a finite set of user types, and multiple arms with Bernoulli payoffs. Each (user type, arm) tuple corresponds to an (unknown…
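The abstract pins down only the ingredients of the model: a finite set of user types, a Bernoulli payoff probability per (user type, arm) pair, and users who may depart when dissatisfied. The Python sketch below is therefore illustrative rather than the paper's exact model; the class name, the single departure probability, and the "depart only after a zero reward" rule are all assumptions.

import random

class DepartingBanditEnv:
    """Minimal simulator for the departing-bandits setup sketched in the abstract."""

    def __init__(self, payoff_probs, depart_prob_on_dislike, seed=None):
        # payoff_probs[user_type][arm] is the Bernoulli payoff probability
        # for that (user type, arm) pair, as in the abstract.
        self.payoff_probs = payoff_probs
        # Assumed departure rule: after a zero reward, the user leaves
        # with this probability and never comes back.
        self.depart_prob = depart_prob_on_dislike
        self.rng = random.Random(seed)

    def run_episode(self, policy, user_type):
        """Interact with one user until departure; return the total reward."""
        total, t = 0, 0
        while True:
            arm = policy(t)
            reward = int(self.rng.random() < self.payoff_probs[user_type][arm])
            total += reward
            if reward == 0 and self.rng.random() < self.depart_prob:
                return total  # the dissatisfied user departs for good
            t += 1

The key departure from standard bandits is that the horizon of run_episode is random and depends on the rewards the policy earns, which is exactly the policy-dependent horizon the abstract describes.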

Cited by 2 publications (7 citation statements)
References 16 publications
“…These results suggest an algorithm that achieves Õ(√T) regret in this setting. In the full version of this paper (Ben-Porat et al. 2022), we also show an efficient optimal planning algorithm for multiple user types and two recommendation categories, and describe a scheme to construct semi-synthetic problem instances for this setting using real-world datasets.…”
Section: Õ(√T) Regret for T Being… (mentioning)
Confidence: 99%
“…These lemmas allow us to use Algorithm 2 with a policy set Π that contains all the fixed-arm policies, and derive an Õ(√T) regret bound. All omitted proofs can be found in the full version of this paper (Ben-Porat et al. 2022).…”
Section: Single User Type (mentioning)
Confidence: 99%
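The quoted passage says only that Algorithm 2 is run with a policy set Π containing all fixed-arm policies; the sketch below is not that algorithm but a standard UCB-1-style stand-in that treats each fixed-arm policy as a meta-arm and scores it by its average episode return. Episode returns here are not bounded in [0, 1], so the confidence bonus is illustrative only.

import math

def ucb_over_fixed_arm_policies(env, user_type, num_arms, num_episodes):
    """Pick among fixed-arm policies via a UCB-1-style index on episode returns."""
    counts = [0] * num_arms
    totals = [0.0] * num_arms
    for ep in range(num_episodes):
        if ep < num_arms:
            k = ep  # initialization: run each fixed-arm policy once
        else:
            k = max(
                range(num_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(ep) / counts[a]),
            )
        counts[k] += 1
        totals[k] += env.run_episode(lambda t: k, user_type)
    return max(range(num_arms), key=lambda a: totals[a] / counts[a])

For example, with the hypothetical DepartingBanditEnv above, ucb_over_fixed_arm_policies(env, 0, num_arms=2, num_episodes=2000) would concentrate its episodes on the fixed arm whose expected return (payoff per step times expected stay) is highest.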
“…In contrast, in our MAB-A setting, the reward is unknown and a higher reward makes the user less likely to abandon the system. The concept of abandonment also appears in the sequential choice bandit problem in [4] and the departing bandit problem in [2]. However, the abandonment probabilities in their models do not depend on the past experience of the user.…”
Section: Related Work (mentioning)
Confidence: 99%
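To make the contrast concrete, here are two hypothetical departure rules in the same Python style (names and probabilities are illustrative, not taken from either paper): the first depends on the realized reward, as in the MAB-A setting above, while the second ignores the user's past experience, as the quote describes for the models in [2] and [4].

def depart_prob_mab_a(reward, p_dislike=0.5, p_like=0.1):
    # MAB-A style: a higher reward makes abandonment less likely.
    return p_like if reward > 0 else p_dislike

def depart_prob_experience_independent(_reward, p=0.3):
    # Abandonment probability is fixed, regardless of past rewards.
    return p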
“…Hence by Theorem 10 in [7], we have
P(µ(a_i) < μ⁰_t(a_i)) ≤ e⌈[log t + 4 log(log t)] log t⌉ / (t(log t)⁴) ≤ 6e / (t(log t)²)   (91)
for any t ≥ T₁. Hence, we have…”
Section: C.5 Proof of Lemma… (mentioning)
Confidence: 99%
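Assuming the reconstruction of (91) above is faithful, the second inequality can be verified directly; this is our reading, not a step quoted from [7]:

\lceil (\log t + 4\log\log t)\,\log t \rceil
  \le (\log t)^2 + 4\,\log t\,\log\log t + 1
  \le 6(\log t)^2 \qquad (t \ge e),

using ⌈x⌉ ≤ x + 1, log log t ≤ log t, and 1 ≤ (log t)²; dividing by t(log t)⁴ then gives the stated bound 6e/(t(log t)²).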