2022
DOI: 10.1609/aaai.v36i6.20554

Modeling Attrition in Recommender Systems with Departing Bandits

Abstract: Traditionally, when recommender systems are formalized as multi-armed bandits, the policy of the recommender system influences the rewards accrued, but not the length of interaction. However, in real-world systems, dissatisfied users may depart (and never come back). In this work, we propose a novel multi-armed bandit setup that captures such policy-dependent horizons. Our setup consists of a finite set of user types, and multiple arms with Bernoulli payoffs. Each (user type, arm) tuple corresponds to an (unknown) …
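For intuition, the setup described in the abstract can be sketched as a small simulation. This is an illustrative sketch, not the paper's algorithm: the user-type prior, the reward probabilities, and the rule that a user departs after the first zero reward are assumptions chosen for the example.

```python
import numpy as np

# Illustrative sketch of a policy-dependent-horizon ("departing") bandit.
# Assumptions for this example only: two user types, two Bernoulli arms,
# and a user who departs permanently after the first zero reward.
rng = np.random.default_rng(0)

n_types, n_arms = 2, 2
type_prior = np.array([0.5, 0.5])          # distribution over user types
reward_probs = np.array([[0.9, 0.2],       # reward_probs[user_type, arm]
                         [0.3, 0.8]])

def simulate_user(policy, max_steps=1000):
    """Interact with one user until departure; the horizon depends on the policy."""
    user_type = rng.choice(n_types, p=type_prior)   # hidden from the learner
    total_reward = 0
    for t in range(max_steps):
        arm = policy(t)
        reward = int(rng.random() < reward_probs[user_type, arm])
        total_reward += reward
        if reward == 0:        # dissatisfied user departs and never returns
            break
    return total_reward

# A fixed policy that always plays arm 0: long sessions for type-0 users,
# very short sessions for type-1 users.
print(simulate_user(lambda t: 0))
```

Because departure ends the interaction, the expected return of a policy depends on both the per-step reward and the induced session length, which is the tension the policy-dependent-horizon setup captures.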

Cited by 4 publications (6 citation statements)
References 19 publications
“…From a broader perspective, our work relates to bandits with complex rewards schemes, i.e., abandonment elements [3,11,44], and non-stationary rewards [4,22,26,27,34,38]. Our work is also related to multi-stakeholder recommendation systems [5,6,9,30,39,45] and fairness in machine learning [12,29,43].…”
Section: Related Work (mentioning)
confidence: 99%
“…Notice that by the end of the exploration stage, all arms are still viable. To see this, recall that our assumption in Inequality (3) suggests that at some point in every phase, all arms a will be pulled at least max(δ_a, γτ) times, which is by definition greater than the exposure constraint δ_a.…”
Section: Meta Algorithm (mentioning)
confidence: 99%
“…Several papers have proposed multi-armed bandit models where surrogate outcomes encode actions' long-term impacts. These include bandit models where poor recommendations cause attrition [Ben-Porat et al, 2022, Bastani et al, 2022] and bandit models where objectives incorporate diversity/boredom considerations [Xie et al, 2022, Cao et al, 2020, Ma et al, 2016]. Wu et al [2017] studies a variation on the typical bandit model where actions impact whether a user will return to the system.…”
Section: Surrogate Outcomes and Proxy-metrics (mentioning)
confidence: 99%