2022
DOI: 10.48550/arxiv.2203.15664
Preprint

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

Abstract: We give novel algorithms for multi-task and lifelong linear bandits with shared representation. Specifically, we consider the setting where we play M linear bandits with dimension d, each for T rounds, and these M bandit tasks share a common k (≪ d) dimensional linear representation. For both the multi-task setting, where we play the tasks concurrently, and the lifelong setting, where we play tasks sequentially, we come up with novel algorithms that achieve Õ(d√(kMT) + kM√T) regret bounds, which matches the known…
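
For context, the regret claim in the abstract can be written out against the naive baseline of running one independent learner per task. The factorization notation (B, w_m, θ_m) below is illustrative shorthand for a shared k-dimensional representation and is not quoted from the paper; the per-task baseline Õ(d√T) is the standard rate for a single d-dimensional linear bandit learner (e.g. OFUL).

% Shared-representation model (illustrative notation): each task m has an
% unknown d-dimensional parameter that factors through a common map B.
\[
  \theta_m = B\, w_m, \qquad B \in \mathbb{R}^{d \times k},\ w_m \in \mathbb{R}^{k},\ k \ll d,
  \qquad m = 1, \dots, M.
\]
% Regret bound stated in the abstract, compared with M independent
% d-dimensional learners, which together pay roughly M * d * sqrt(T):
\[
  \widetilde{O}\!\left(d\sqrt{kMT} + kM\sqrt{T}\right)
  \quad \text{vs.} \quad
  \widetilde{O}\!\left(Md\sqrt{T}\right).
\]
% The shared-representation bound is smaller whenever k is much smaller
% than both d and M.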

Cited by 2 publications (9 citation statements)
References 25 publications
“…Recently, an emerging number of works (Yang et al., 2021, 2022; Hu et al., 2021; Cella et al., 2022b) investigate representation learning for sequential decision making, and show that if all tasks share a joint low-rank representation, then by leveraging such a joint representation it is possible to learn faster than by treating each task independently. Despite the accomplishments of these works, they mainly focus on the regret minimization setting, where the performance is measured by the cumulative reward gap between the optimal option and the actually chosen options.…”
Section: Introduction (mentioning)
confidence: 99%
“…Motivated by the above fact, in this paper, we study representation learning for multi-task pure exploration in sequential decision making. Following prior works (Yang et al., 2021, 2022; Hu et al., 2021), we consider the linear bandit setting, which is one of the most popular settings in sequential decision making and has various applications such as clinical trials and recommendation systems. Specifically, we investigate two pure exploration problems, i.e., representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB).…”
Section: Introduction (mentioning)
confidence: 99%
“…While a lot of work has been done in the full information i.i.d. statistical setting [Baxter, 2000, Khodak et al., 2019, Maurer and Pontil, 2013, Maurer et al., 2016, Denevi et al., 2018, Tripuraneni et al., 2021, Boursier et al., 2022], investigations of meta-learning in the partial-information interactive setting are lacking, and we are only aware of a few recent works [Cella et al., 2020, Yang et al., 2022]. More work has been done on the multitask bandit setting [Kveton et al., 2021, Hu et al., 2021, Yang et al., 2020, Cella and Pontil, 2021]; however, the goal there is different, in that we wish to learn well a prescribed finite set of tasks as opposed to leveraging knowledge from these to learn a novel downstream task.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, it also works in poor data regimes, for any number of samples. In the bandits literature, the closest works to ours are [Cella et al., 2022] and [Yang et al., 2022]. The former work investigates the multitask bandit setting, presenting a greedy policy based on trace-norm regularization and providing a regret bound on the set of training tasks. The latter work considers the lifelong learning (or meta-learning) framework, as we do in this work, and further analyzes the multi-task learning setting with infinite arms.…”
Section: Introduction (mentioning)
confidence: 99%