2020
DOI: 10.48550/arxiv.2006.02948
Preprint

Low-Rank Generalized Linear Bandit Problems

Abstract: In a low-rank linear bandit problem, the reward of an action (represented by a matrix of size $d_1 \times d_2$) is the inner product between the action and an unknown low-rank matrix $\Theta^*$. We propose an algorithm based on a novel combination of online-to-confidence-set conversion (Abbasi-Yadkori et al., 2012) and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. In $T$ rounds, our algorithm achieves $O((d_1 + d_2)^{3/2} \sqrt{rT})$ regret that improves upon the standard linear bandit reg…
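The reward model in the abstract, and the exponentially weighted average forecaster it mentions, can be illustrated with a small sketch. Assumptions: actions and $\Theta^*$ are plain nested lists, the observed reward is the trace inner product $\langle X, \Theta^* \rangle$ plus Gaussian noise, and a handful of random rank-$r$ matrices stands in for the covering (no genuine $\varepsilon$-net of low-rank matrices is built). The names `inner`, `random_low_rank`, and `EWAForecaster` are hypothetical. Only the forecaster component with squared loss is shown; the online-to-confidence-set conversion and the action-selection rule of the paper are not reproduced.

```python
import math
import random


def inner(A, B):
    """Trace inner product <A, B> = tr(A^T B) = sum_ij A[i][j] * B[i][j]."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def random_low_rank(d1, d2, r, rng):
    """Return U V^T for Gaussian U (d1 x r) and V (d2 x r); its rank is at most r."""
    U = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d1)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d2)]
    return [[sum(U[i][k] * V[j][k] for k in range(r)) for j in range(d2)]
            for i in range(d1)]


class EWAForecaster:
    """Exponentially weighted average forecaster over a finite set of candidate matrices."""

    def __init__(self, candidates, eta):
        self.candidates = candidates  # hypothetical stand-in for a covering of low-rank matrices
        self.eta = eta                # learning rate
        self.cum_loss = [0.0] * len(candidates)

    def weights(self):
        # Subtract the minimum cumulative loss before exponentiating for numerical stability.
        best = min(self.cum_loss)
        w = [math.exp(-self.eta * (loss - best)) for loss in self.cum_loss]
        total = sum(w)
        return [wk / total for wk in w]

    def predict(self, X):
        # Weighted average of each candidate's predicted reward <X, Theta_k>.
        return sum(wk * inner(X, C) for wk, C in zip(self.weights(), self.candidates))

    def update(self, X, y):
        # Charge each candidate its squared prediction error on the observed reward y.
        for k, C in enumerate(self.candidates):
            self.cum_loss[k] += (inner(X, C) - y) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    d1, d2, r = 3, 3, 1
    theta_star = random_low_rank(d1, d2, r, rng)
    # Hypothetical candidate set: the true matrix plus five random rank-r matrices.
    candidates = [theta_star] + [random_low_rank(d1, d2, r, rng) for _ in range(5)]
    forecaster = EWAForecaster(candidates, eta=0.5)
    for t in range(200):
        X = [[rng.gauss(0.0, 1.0) for _ in range(d2)] for _ in range(d1)]  # random action
        y = inner(X, theta_star) + rng.gauss(0.0, 0.1)                      # noisy reward
        forecaster.update(X, y)
    print([round(w, 3) for w in forecaster.weights()])
```

Subtracting the smallest cumulative loss leaves the normalized weights unchanged but keeps at least one unnormalized weight equal to 1, so the normalization never divides by zero.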

References 22 publications
Cited by 3 publications (3 citation statements)

“…Our setting is also related to the recent line of work on low-rank bandits [Lale et al, 2019; Lu et al, 2020; Jun et al, 2019; Lattimore and Hao, 2021; …], because our formulation also admits a low-rank structure. However, the approach we use and [theirs] are very different.…”
Section: Related Work (mentioning, confidence: 99%)
“…The mean reward in their setting is defined as the bilinear multiplication $x^\top \Theta y$, where $x$ and $y$ are two actions selected at each step, and $\Theta$ is an unknown parameter matrix with low rank. Their setting is further generalized by Lu et al (2020). Furthermore, sparse linear bandits can be regarded as a simplified setting, where $B$ is a binary matrix indicating the subset of relevant features in context $x$ (Abbasi-Yadkori et al, 2012; Carpentier and Munos, 2012; Lattimore et al, 2015; Hao et al, 2020).…”
Section: Related Work (mentioning, confidence: 99%)
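The bilinear reward in the statement above is a special case of the matrix inner-product reward in the abstract: $x^\top \Theta y = \langle x y^\top, \Theta \rangle$, so choosing the pair $(x, y)$ amounts to choosing the rank-one action matrix $x y^\top$. A minimal pure-Python check of this identity follows; the function names are hypothetical.

```python
def bilinear(x, Theta, y):
    """Bilinear reward x^T Theta y."""
    return sum(x[i] * Theta[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def outer(x, y):
    """Rank-one matrix x y^T."""
    return [[xi * yj for yj in y] for xi in x]


def trace_inner(A, B):
    """Trace inner product <A, B> = sum_ij A[i][j] * B[i][j]."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


if __name__ == "__main__":
    x = [1, 2]
    y = [3, -1, 0]
    Theta = [[1, 0, 2],
             [0, 1, 1]]
    print(bilinear(x, Theta, y))            # 1
    print(trace_inner(outer(x, y), Theta))  # 1
```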
“…Their proposed algorithm shares some similarity [with] our algorithm for the infinite-action setting in that they added an exploration stage to extract the low-rank structure of $\Theta$. Their setting is further generalized and studied by Lu et al [2020].…”
Section: Related Work (mentioning, confidence: 99%)
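An exploration stage that extracts the low-rank structure of $\Theta$ can be sketched generically as explore, estimate, then project: pull random rank-one arms, fit $\Theta$ by least squares on the vectorized features, and keep the top-$r$ singular subspaces of the estimate. This is an illustrative sketch assuming NumPy, Gaussian reward noise, uniformly random unit-vector arms, and a known rank $r$; it is not the cited authors' exact procedure, and `explore_and_extract` and `subspace_error` are hypothetical names.

```python
import numpy as np


def explore_and_extract(theta_star, r, n_explore, noise_sd, rng):
    """Explore with random rank-one arms, fit Theta by least squares, keep top-r SVD subspaces.

    Returns (theta_hat, U_hat, V_hat) with U_hat of shape (d1, r) and V_hat of shape (d2, r).
    """
    d1, d2 = theta_star.shape
    features = np.empty((n_explore, d1 * d2))
    rewards = np.empty(n_explore)
    for t in range(n_explore):
        # Hypothetical exploration design: uniformly random unit vectors x and y.
        x = rng.standard_normal(d1)
        x /= np.linalg.norm(x)
        y = rng.standard_normal(d2)
        y /= np.linalg.norm(y)
        features[t] = np.outer(x, y).ravel()  # row-major vec(x y^T)
        rewards[t] = x @ theta_star @ y + noise_sd * rng.standard_normal()
    vec_hat, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    theta_hat = vec_hat.reshape(d1, d2)       # undo the row-major vectorization
    U, _, Vt = np.linalg.svd(theta_hat)
    return theta_hat, U[:, :r], Vt[:r, :].T


def subspace_error(U_hat, U_star):
    """Spectral norm of (I - U_hat U_hat^T) U_star; 0 when the column spaces coincide."""
    residual = U_star - U_hat @ (U_hat.T @ U_star)
    return np.linalg.norm(residual, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(4)
    a /= np.linalg.norm(a)
    b = rng.standard_normal(5)
    b /= np.linalg.norm(b)
    theta_star = 3.0 * np.outer(a, b)  # rank-1 ground truth
    _, U_hat, V_hat = explore_and_extract(theta_star, r=1, n_explore=2000,
                                          noise_sd=0.1, rng=rng)
    print(subspace_error(U_hat, a[:, None]), subspace_error(V_hat, b[:, None]))
```

Least squares on the $d_1 d_2$ vectorized features is identifiable only with at least $d_1 d_2$ well-spread exploration rounds, which is why this sketch runs a separate exploration stage before the subspace step.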