2022
DOI: 10.48550/arxiv.2203.13423
Preprint

Modeling Attrition in Recommender Systems with Departing Bandits

Abstract: Traditionally, when recommender systems are formalized as multi-armed bandits, the policy of the recommender system influences the rewards accrued, but not the length of interaction. However, in real-world systems, dissatisfied users may depart (and never come back). In this work, we propose a novel multi-armed bandit setup that captures such policy-dependent horizons. Our setup consists of a finite set of user types, and multiple arms with Bernoulli payoffs. Each (user type, arm) tuple corresponds to an (unknown…
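The abstract pins down only the ingredients of the model: a finite set of user types, a Bernoulli payoff probability per (user type, arm) pair, and users who may depart when dissatisfied. The Python sketch below is therefore illustrative rather than the paper's exact model; the class name, the single departure probability, and the "depart only after a zero reward" rule are all assumptions.

import random

class DepartingBanditEnv:
    """Minimal simulator for the departing-bandits setup sketched in the abstract."""

    def __init__(self, payoff_probs, depart_prob_on_dislike, seed=None):
        # payoff_probs[user_type][arm] is the Bernoulli payoff probability
        # for that (user type, arm) pair, as in the abstract.
        self.payoff_probs = payoff_probs
        # Assumed departure rule: after a zero reward, the user leaves
        # with this probability and never comes back.
        self.depart_prob = depart_prob_on_dislike
        self.rng = random.Random(seed)

    def run_episode(self, policy, user_type):
        """Interact with one user until departure; return the total reward."""
        total, t = 0, 0
        while True:
            arm = policy(t)
            reward = int(self.rng.random() < self.payoff_probs[user_type][arm])
            total += reward
            if reward == 0 and self.rng.random() < self.depart_prob:
                return total  # the dissatisfied user departs for good
            t += 1

The key departure from standard bandits is that the horizon of run_episode is random and depends on the rewards the policy earns, which is exactly the policy-dependent horizon the abstract describes.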

Cited by 2 publications (7 citation statements)
References 16 publications
“…These results suggest an algorithm that achieves Õ(√T) regret in this setting. In the full version of this paper (Ben-Porat et al. 2022), we also show an efficient optimal planning algorithm for multiple user types and two recommendation categories, and describe a scheme to construct semi-synthetic problem instances for this setting using real-world datasets.…”
Section: Õ(√T) Regret for T Being… (mentioning)
Confidence: 99%
“…These lemmas allow us to use Algorithm 2 with a policy set Π that contains all the fixed-arm policies, and derive an Õ(√T) regret bound. All omitted proofs can be found in the full version of this paper (Ben-Porat et al. 2022).…”
Section: Single User Type (mentioning)
Confidence: 99%
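The quoted passage says only that Algorithm 2 is run with a policy set Π containing all fixed-arm policies; the sketch below is not that algorithm but a standard UCB-1-style stand-in that treats each fixed-arm policy as a meta-arm and scores it by its average episode return. Episode returns here are not bounded in [0, 1], so the confidence bonus is illustrative only.

import math

def ucb_over_fixed_arm_policies(env, user_type, num_arms, num_episodes):
    """Pick among fixed-arm policies via a UCB-1-style index on episode returns."""
    counts = [0] * num_arms
    totals = [0.0] * num_arms
    for ep in range(num_episodes):
        if ep < num_arms:
            k = ep  # initialization: run each fixed-arm policy once
        else:
            k = max(
                range(num_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(ep) / counts[a]),
            )
        counts[k] += 1
        totals[k] += env.run_episode(lambda t: k, user_type)
    return max(range(num_arms), key=lambda a: totals[a] / counts[a])

For example, with the hypothetical DepartingBanditEnv above, ucb_over_fixed_arm_policies(env, 0, num_arms=2, num_episodes=2000) would concentrate its episodes on the fixed arm whose expected return (payoff per step times expected stay) is highest.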
“…In contrast, in our MAB-A setting, the reward is unknown and a higher reward makes the user less likely to abandon the system. The concept of abandonment also appears in the sequential choice bandit problem in [4] and the departing bandit problem in [2]. However, the abandonment probabilities in their models do not depend on the past experience of the user.…”
Section: Related Work (mentioning)
Confidence: 99%
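To make the contrast concrete, here are two hypothetical departure rules in the same Python style (names and probabilities are illustrative, not taken from either paper): the first depends on the realized reward, as in the MAB-A setting above, while the second ignores the user's past experience, as the quote describes for the models in [2] and [4].

def depart_prob_mab_a(reward, p_dislike=0.5, p_like=0.1):
    # MAB-A style: a higher reward makes abandonment less likely.
    return p_like if reward > 0 else p_dislike

def depart_prob_experience_independent(_reward, p=0.3):
    # Abandonment probability is fixed, regardless of past rewards.
    return p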
“…Hence by Theorem 10 in [7], we have
P(µ(a_i) < μ⁰_t(a_i)) ≤ e⌈[log t + 4 log(log t)] log t⌉ / (t(log t)⁴) ≤ 6e / (t(log t)²)   (91)
for any t ≥ T₁. Hence, we have…”
Section: C.5 Proof of Lemma… (mentioning)
Confidence: 99%
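Assuming the reconstruction of (91) above is faithful, the second inequality can be verified directly; this is our reading, not a step quoted from [7]:

\lceil (\log t + 4\log\log t)\,\log t \rceil
  \le (\log t)^2 + 4\,\log t\,\log\log t + 1
  \le 6(\log t)^2 \qquad (t \ge e),

using ⌈x⌉ ≤ x + 1, log log t ≤ log t, and 1 ≤ (log t)²; dividing by t(log t)⁴ then gives the stated bound 6e/(t(log t)²).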