Revenue Maximization and Learning in Products Ranking

Chen, Ningyuan; Li, Anran; Yang, Shuoguang

doi:10.48550/arxiv.2012.03800

Cited by 2 publications

(2 citation statements)

References 49 publications

(53 reference statements)


“…Furthermore, OIM falls in the Combinatorial Multi-Armed Bandit (CMAB) regime, where each arm is a combination of multiple selected items. CMAB has found widespread applications in the real-world, such as assortment optimization (Agrawal et al 2019, Oh and Iyengar 2021), and product ranking (Chen et al 2020). Most CMAB models assume semi-bandit feedback, where the agent can observe the outcome of every item within the selected arm (Gai et al 2012, Chen et al 2013, Kveton et al 2015c, Chen et al 2016b, Wang and Chen 2017), or partial-feedback, where the agent can only observe a subset of the selected items (Kveton et al 2015b, Combes et al 2015, Katariya et al 2016, Zong et al 2016, Cheung et al 2019).…”

Section: Literature Reviews (mentioning)

confidence: 99%

Online Learning of Independent Cascade Models with Node-level Feedback

Yang, Truong

2021

Preprint

Self Cite


We propose a detailed analysis of the online-learning problem for Independent Cascade (IC) models under node-level feedback. These models have widespread applications in modern social networks. Existing works for IC models have only shed light on edge-level feedback models, where the agent knows the explicit outcome of every observed edge. Little is known about node-level feedback models, where only combined outcomes for sets of edges are observed; in other words, the realization of each edge is censored. This censored information, together with the nonlinear form of the aggregated influence probability, makes both parameter estimation and algorithm design challenging. We establish the first confidence-region result under this setting. We also develop an online algorithm achieving a cumulative regret of Õ(√T), matching the theoretical regret bound for IC models with edge-level feedback.

“…The work in (Kveton, Szepesvari, Wen, & Ashkan, 2015; Cao, Sun, & Shen, 2019; Chen, Li, & Yang, 2020) studies variants of sequential choice bandit model without feedback. Unlike our setting, the sequence of action is pre-determined at the arrival of each user, independently from the user's feedback.…”

Section: Related Work (mentioning)

confidence: 99%

Sequential Choice Bandits with Feedback for Personalizing users' experience

Rangi, Franceschetti, Tran-Thanh

2021

Preprint


In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users' experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward, which is a non-decreasing function of the action, if this action is below the user's threshold. Users are equipped with a patience budget, and actions that are above the threshold decrease the user's patience. When all patience is lost, the user abandons the platform. The platform attempts to learn the thresholds of the users in order to maximize its rewards, based on two different feedback models describing the information pattern available to the platform at each action. We define a notion of regret by determining the best action to be taken when the platform knows that the user's threshold is in a given interval. We then propose bandit algorithms for the two feedback models and show that upper and lower bounds on the regret are of the order of Õ(N^(2/3)) and Ω(N^(2/3)), respectively, where N is the total number of users. Finally, we show that the waiting time of any user before receiving a personalized experience is uniform in N.
