Low-Rank Generalized Linear Bandit Problems

Lu, Yangyi; Meisami, Amirhossein; Tewari, Ambuj

doi:10.48550/arxiv.2006.02948

Cited by 3 publications

(3 citation statements)

References 22 publications


“…Our setting is also related to the recent line of work on low-rank bandits [Lale et al, 2019; Lu et al, 2020; Jun et al, 2019; Lattimore and Hao, 2021], because our formulation also admits a low-rank structure. However, the approach we use and theirs are very different.…”

Section: Related Work (mentioning)

confidence: 99%

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

Yang, Liu, Lee et al. 2022

Preprint


We give novel algorithms for multi-task and lifelong linear bandits with shared representation. Specifically, we consider the setting where we play M linear bandits with dimension d, each for T rounds, and these M bandit tasks share a common k (≪ d) dimensional linear representation. For both the multi-task setting, where we play the tasks concurrently, and the lifelong setting, where we play tasks sequentially, we come up with novel algorithms that achieve O(d√(kMT) + kM√T) regret bounds, which match the known minimax regret lower bound up to logarithmic factors and close the gap in existing results. Our main techniques include a more efficient estimator for the low-rank linear feature extractor and an accompanying novel analysis for this estimator.



“…The mean reward in their setting is defined as the bilinear multiplication x⊤Θy, where x and y are two actions selected at each step, and Θ is an unknown parameter matrix with low rank. Their setting is further generalized by Lu et al (2020). Furthermore, sparse linear bandits can be regarded as a simplified setting, where B is a binary matrix indicating the subset of relevant features in context x (Abbasi-Yadkori et al, 2012; Carpentier and Munos, 2012; Lattimore et al, 2015; Hao et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

Near-optimal Representation Learning for Linear Bandits and Linear RL

Hu, Chen, Jin et al. 2021

Preprint


This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T being the number of total steps. Our regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently. We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error (Zanette et al., 2020a). To the best of our knowledge, this is the first theoretical result that characterizes the benefits of multi-task representation learning for exploration in RL with function approximation.
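As a rough illustration of the comparison made in this abstract, the two regret expressions can be evaluated numerically. This is only a sketch: constants and logarithmic factors hidden by Õ are dropped, the function names are hypothetical, and the problem sizes are made-up values chosen so that k ≪ d and k ≪ M.

```python
import math

def shared_rep_bound(M, d, k, T):
    """M*sqrt(d*k*T) + d*sqrt(k*M*T), with constants and log factors dropped."""
    return M * math.sqrt(d * k * T) + d * math.sqrt(k * M * T)

def independent_bound(M, d, T):
    """M*d*sqrt(T): the baseline of solving each task independently."""
    return M * d * math.sqrt(T)

# Hypothetical problem sizes (not taken from the paper), with k << d and k << M.
M, d, k, T = 100, 100, 4, 10_000

shared = shared_rep_bound(M, d, k, T)
baseline = independent_bound(M, d, T)
ratio = baseline / shared

print(f"shared: {shared:.0f}, baseline: {baseline:.0f}, ratio: {ratio:.2f}")
```

With these illustrative sizes the shared-representation expression is smaller than the baseline by a factor of 2.5; the gap widens as d and M grow relative to k.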


“…Their proposed algorithm shares some similarity with our algorithm for the infinite-action setting in that they added an exploration stage to extract the low-rank structure of Θ. Their setting is further generalized and studied by Lu et al [2020].…”

Section: Related Work (mentioning)

confidence: 99%

Impact of Representation Learning in Linear Bandits

Yang, Hu, Lee et al. 2020

Preprint


We study how representation learning can improve the efficiency of bandit problems. We study the setting where we play T linear bandits with dimension d concurrently, and these T bandit tasks share a common k (≪ d) dimensional linear representation. For the finite-action setting, we present a new algorithm which achieves O(T√(kN) + √(dkNT)) regret, where N is the number of rounds we play for each bandit. When T is sufficiently large, our algorithm significantly outperforms the naive algorithm (playing T bandits independently) that achieves O(T√(dN)) regret. We also provide an Ω(T√(kN) + √(dkNT)) regret lower bound, showing that our algorithm is minimax-optimal up to poly-logarithmic factors. Furthermore, we extend our algorithm to the infinite-action setting and obtain a corresponding regret bound which demonstrates the benefit of representation learning in certain regimes. We also present experiments on synthetic and real-world data to illustrate our theoretical findings and demonstrate the effectiveness of our proposed algorithms.

