2022
DOI: 10.1109/ojcsys.2022.3178540

Non-Stationary Representation Learning in Sequential Linear Bandits

Abstract: In this paper, we study representation learning for multi-task decision-making in non-stationary environments. We consider the framework of sequential linear bandits, where the agent performs a series of tasks drawn from different environments. The embeddings of tasks in each environment share a low-dimensional feature extractor called a representation, and representations differ across environments. We propose an online algorithm that facilitates efficient decision-making by learning and transferring non-stationary representations…
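A minimal sketch of the setting the abstract describes, written in LaTeX with illustrative notation (B_e, w_n, a_t, η_t, k, d are assumptions here, not symbols quoted from the paper): within one environment, every task's parameter lies in a shared low-dimensional subspace, and rewards are linear in the chosen action.

% Illustrative model only, assuming standard linear-bandit notation.
\theta_n = B_e \, w_n, \qquad B_e \in \mathbb{R}^{d \times k}, \quad w_n \in \mathbb{R}^{k}, \quad k \ll d
% At round t of task n, the agent plays action a_t and observes a noisy linear reward:
r_t = \langle \theta_n, a_t \rangle + \eta_t
% Tasks within environment e share the feature extractor B_e; when the environment
% changes, B_e changes as well, so the representation must be re-learned or transferred.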

Cited by 9 publications (7 citation statements) | References 28 publications
“…However, a key distinction is that, in ICL, adaptation to a new task happens implicitly through the input prompt. Our analysis has some parallels with recent literature on multitask representation learning [32,13,51,8,26,21,42,50,10,34,14,58], since we develop excess MTL risk bounds by training the model with 𝑇 tasks and quantify these bounds in terms of the complexity of the hypothesis space (i.e., the transformer architecture), the number of tasks 𝑇, and the number of samples per task.…”
Section: F Further Related Work on Multitask/Meta Learning
confidence: 85%
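For context, excess multitask (MTL) risk bounds of the kind described above typically decay with the total number of samples pooled across tasks; the following is a representative shape from the multitask representation learning literature, not a bound quoted from the citing paper:

% Illustrative form only: C(H) is a complexity measure of the hypothesis class H
% (here, the transformer architecture), T the number of tasks, n the samples per task.
\mathrm{ExcessRisk}_{\mathrm{MTL}} = \tilde{O}\!\left( \sqrt{ \tfrac{\mathcal{C}(\mathcal{H})}{n\,T} } \right)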
“…Here the first term in (42) comes from the fact that the loss function is bounded by 𝐵 and we assume S^(0) = ∅, and the second term follows from Hypothesis 1. Next, we turn to bounding risk(ℎ, 𝑚).…”
Section: E Model Selection and Approximation Error Analysis
confidence: 99%
“…Cella et al (2022a,b) also investigate the problem of Yang et al (2021) and propose algorithms that do not need to know the dimension of the underlying representation. Qin et al (2022) study representation learning for linear bandits in non-stationary environments and develop algorithms that learn and transfer non-stationary representations adaptively. Different from the above works, which consider regret minimization, we study representation learning for (contextual) linear bandits with a pure exploration objective, which imposes unique challenges in how to optimally allocate samples to learn the feature extractor and motivates us to design algorithms built upon double experimental designs.…”
Section: Appendix A Related Work
confidence: 99%
“…Multitask bandits, multitask RL and meta-RL: The benefit of multitask learning in linear bandits has been investigated in Yang et al (2021); Qin et al (2022); Cella et al (2022); Azizi et al (2022); Deshmukh et al (2017); Cella and Pontil (2021); Hu et al (2021). For multitask RL, Arora et al (2020) showed that representation learning can reduce sample complexity for imitation learning.…”
Section: Related Work
confidence: 99%
“…In the downstream phase, with the help of the learned representation, the agent aims to find a near-optimal policy for a new task that shares the same representation as the source tasks. While representation learning has achieved great success in supervised learning (Du et al, 2020; Tripuraneni et al, 2021; Maurer et al, 2016; Kong et al, 2020) and multi-armed bandit (MAB) problems (Yang et al, 2021; Qin et al, 2022; Cella et al, 2022), most works in multitask RL focus on empirical algorithms (Sodhani et al, 2021; Arulkumaran et al, 2022; Teh et al, 2017), with limited theoretical work (Arora et al, 2020; Hu et al, 2021; Brunskill and Li, 2013; Müller and Pacchiano, 2022; Calandriello et al, 2015; Lu et al, 2021; D'Eramo et al, 2020). Generally speaking, there are two main challenges for multitask representation learning in RL.…”
Section: Introduction
confidence: 99%