2022
DOI: 10.48550/arxiv.2203.15664
Preprint

Nearly Minimax Algorithms for Linear Bandits with Shared Representation

Abstract: We give novel algorithms for multi-task and lifelong linear bandits with shared representation. Specifically, we consider the setting where we play M linear bandits with dimension d, each for T rounds, and these M bandit tasks share a common k (≪ d) dimensional linear representation. For both the multi-task setting, where we play the tasks concurrently, and the lifelong setting, where we play tasks sequentially, we come up with novel algorithms that achieve Õ(d√(kMT) + kM√T) regret bounds, which matches the known…
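
For context, the regret claim in the abstract can be written out against the naive baseline of running one independent learner per task. The factorization notation (B, w_m, θ_m) below is illustrative shorthand for a shared k-dimensional representation and is not quoted from the paper; the per-task baseline Õ(d√T) is the standard rate for a single d-dimensional linear bandit learner (e.g. OFUL).

% Shared-representation model (illustrative notation): each task m has an
% unknown d-dimensional parameter that factors through a common map B.
\[
  \theta_m = B\, w_m, \qquad B \in \mathbb{R}^{d \times k},\ w_m \in \mathbb{R}^{k},\ k \ll d,
  \qquad m = 1, \dots, M.
\]
% Regret bound stated in the abstract, compared with M independent
% d-dimensional learners, which together pay roughly M * d * sqrt(T):
\[
  \widetilde{O}\!\left(d\sqrt{kMT} + kM\sqrt{T}\right)
  \quad \text{vs.} \quad
  \widetilde{O}\!\left(Md\sqrt{T}\right).
\]
% The shared-representation bound is smaller whenever k is much smaller
% than both d and M.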

Cited by 2 publications (9 citation statements)
References 25 publications
“…Recently, an emerging number of works (Yang et al., 2021, 2022; Hu et al., 2021; Cella et al., 2022b) investigate representation learning for sequential decision making, and show that if all tasks share a joint low-rank representation, then by leveraging such a joint representation it is possible to learn faster than by treating each task independently. Despite the accomplishments of these works, they mainly focus on the regret minimization setting, where the performance is measured by the cumulative reward gap between the optimal option and the actually chosen options.…”
Section: Introduction (mentioning)
confidence: 99%
“…Motivated by the above fact, in this paper, we study representation learning for multi-task pure exploration in sequential decision making. Following prior works (Yang et al., 2021, 2022; Hu et al., 2021), we consider the linear bandit setting, which is one of the most popular settings in sequential decision making and has various applications such as clinical trials and recommendation systems. Specifically, we investigate two pure exploration problems, i.e., representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB).…”
Section: Introduction (mentioning)
confidence: 99%
“…While a lot of work has been done in the full information i.i.d. statistical setting [Baxter, 2000, Khodak et al., 2019, Maurer and Pontil, 2013, Maurer et al., 2016, Denevi et al., 2018, Tripuraneni et al., 2021, Boursier et al., 2022], investigations of meta-learning in the partial-information interactive setting are lacking, and we are only aware of a few recent works [Cella et al., 2020, Yang et al., 2022]. More work has been done on the multitask bandit setting [Kveton et al., 2021, Hu et al., 2021, Yang et al., 2020, Cella and Pontil, 2021]; however, the goal there is different, in that we wish to learn well a prescribed finite set of tasks as opposed to leveraging knowledge from these to learn a novel downstream task.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, it also works in poor data regimes, for any number of samples. In the bandits literature, the closest works to ours are [Cella et al., 2022] and [Yang et al., 2022]. The former work investigates the multitask bandit setting, presenting a greedy policy based on trace-norm regularization and providing a regret bound on the set of training tasks. The latter work considers the lifelong learning (or meta-learning) framework, as we do in this work, and further analyzes the multi-task learning setting with infinite arms.…”
Section: Introduction (mentioning)
confidence: 99%