2016
DOI: 10.48550/arxiv.1609.01508
Preprint

Low-rank Bandits with Latent Mixtures

Abstract: We study the task of maximizing rewards from recommending items (actions) to users sequentially interacting with a recommender system. Users are modeled as latent mixtures of C many representative user classes, where each class specifies a mean reward profile across actions. Both the user features (mixture distribution over classes) and the item features (mean reward vector per class) are unknown a priori. The user identity is the only contextual information available to the learner while interacting. This ind…
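The reward model sketched in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's actual construction: the names (`class_means`, `user_mixtures`, `pull`), the Dirichlet prior over mixtures, and the Gaussian noise are all illustrative choices.

```python
import numpy as np

# Illustrative sketch of the latent-mixture reward model: each user is a
# latent mixture over C representative classes, and each class specifies a
# mean reward per item. Noise model and priors are assumptions.
rng = np.random.default_rng(0)

C, n_items, n_users = 3, 5, 10
class_means = rng.uniform(size=(C, n_items))             # mean reward per (class, item)
user_mixtures = rng.dirichlet(np.ones(C), size=n_users)  # mixture over classes per user

def pull(user, item):
    """Sample a reward: draw the user's latent class, then a noisy reward."""
    z = rng.choice(C, p=user_mixtures[user])
    return class_means[z, item] + rng.normal(scale=0.1)

# The expected reward of an item for a user is the mixture-weighted class mean,
# which is what gives the user-by-item mean reward matrix its low-rank structure.
expected = user_mixtures @ class_means                   # shape (n_users, n_items)
reward = pull(user=0, item=2)
```

The learner observes only the user identity and the sampled rewards; `class_means` and `user_mixtures` are both unknown, which is what makes the problem a low-rank bandit rather than a standard contextual linear bandit.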

Cited by 4 publications (5 citation statements)
References 8 publications
“…Contextual low-rank bandits. There has been some interest in settings similar to ours (Gentile et al, 2014; Gopalan et al, 2016; Pal & Jain, 2022; Lee et al, 2023), although mostly in the context of regret minimization. Nonetheless, progress on minimax guarantees has remained surprisingly limited.…”
Section: Related Work
confidence: 92%
See 1 more Smart Citation
“…Contextual low-rank bandits. There has been some interest in settings similar to ours (Gentile et al, 2014;Gopalan et al, 2016;Pal & Jain, 2022;Lee et al, 2023), although mostly in the context of regret minimization. Nonetheless, progress on minimax guarantees has remained surprisingly limited.…”
Section: Related Workmentioning
confidence: 92%
“…Misspecified contextual linear bandits. Misspecified contextual linear bandits have recently received a lot of attention (Gopalan et al, 2016; Ghosh et al, 2017; Foster et al, 2020; Zanette et al, 2020; Takemura et al, 2021). The best achievable minimax regret bounds for this setting are O(d…”
Section: Additional Related Work (Appendix A)
confidence: 99%
“…Finally, if the corruption f(·) is only a function of the context then it is possible to do much better (Krishnamurthy et al, 2018). This surprising connection with the popular LinUCB makes ELEANOR (or LinUCB with a correction on the exploration bonus) the first algorithm capable of handling misspecified contextual linear bandits, although we are not the first to consider misspecification in linear bandits per se: (Ghosh et al, 2017) propose an algorithm that switches to tabular if misspecification is detected, and (Gopalan et al, 2016) consider the case where the misspecification is less than roughly the action gap; (Van Roy & Dong, 2019) comment on the lower bound by (Du et al, 2019) using the Eluder dimension. Finally, (Lattimore & Szepesvari, 2019) have recently obtained a result similar to ours, but for a different setting.…”
Section: Contextual Misspecified Linear Bandits
confidence: 98%
“…MLCBs: In the special case of MLCBs (H = 1), Gopalan et al [2016] show (unmodified) LinUCB can achieve sublinear regret when ε_mis is very small. Ghosh et al [2017], Foster et al [2021] provide regret bounds more generally, but they scale with |A| (we seek regret bounds independent of |A|, as is standard in the LMDP literature).…”
Section: Model Selection
confidence: 98%