2019
DOI: 10.48550/arxiv.1905.03030
Preprint

Meta-learning of Sequential Strategies

Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently…
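A minimal sketch of the algorithmic template the abstract describes: a memory-based model is trained by sequential log-loss on tasks drawn from a target class, after which it adapts to a new task through its memory state alone, with no weight updates. The sketch assumes PyTorch and a toy task class of Bernoulli sequences with unknown bias; the GRU architecture, hyperparameters, and task distribution are illustrative choices, not the paper's setup.

```python
# Memory-based meta-learning for prediction: a minimal sketch (assumptions:
# PyTorch, a toy task class of Bernoulli(theta) sequences, theta ~ Uniform).
import torch
import torch.nn as nn

class MemoryPredictor(nn.Module):
    """GRU mapping an observation history to next-symbol probabilities."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, 1) in {0, 1}
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h))   # P(next symbol = 1 | history)

def sample_batch(batch=64, length=20):
    """Each row is one task: a Bernoulli(theta) sequence with a fresh theta."""
    theta = torch.rand(batch, 1, 1)
    return (torch.rand(batch, length, 1) < theta).float()

model = MemoryPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    seq = sample_batch()
    pred = model(seq[:, :-1])       # predict symbol t+1 from symbols up to t
    loss = bce(pred, seq[:, 1:])    # sequential log-loss across the task class
    opt.zero_grad()
    loss.backward()
    opt.step()
```

For this toy class the Bayes-optimal predictor is the Laplace rule, (ones observed + 1) / (t + 2), so the trained network's outputs can be checked against it: adaptation happens entirely in the recurrent state, which is the sense in which the meta-learner behaves as if it carried a probabilistic model.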

Cited by 10 publications (10 citation statements)
References 20 publications
“…One study by Korovina et al., who propose a method described in the next paragraph, highlights several existing methods that all require ≥ 5 thousand evaluations for a single task compared to their 100. Based on the machine learning community's broader interest in improving the sample efficiency of reinforcement learning algorithms [34]…”
mentioning
confidence: 99%
“…Perhaps surprisingly, perplexity-based meta-learning of history-dependent LLMs is closely related to the explicit Bayesian mixture solution described in Equation 4. In particular, one can show that in many standard meta-learning setups, the optimal perplexity-minimizing solution is exactly a Bayesian mixture distribution (Ortega et al. 2019). Provided that a sufficiently powerful history-dependent model is used to model the interaction histories (as is the case with Transformer-based LLMs), a low-perplexity solution can be seen as a learnt approximation to the explicit Bayesian construction we provided in Equation 4.…”
Section: Discussion
mentioning
confidence: 99%
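The citing paper's Equation 4 is not reproduced on this page, but the Bayesian mixture it refers to has the standard form sketched below (the usual construction, not a quotation of that equation): a prior p(τ) over tasks is updated on the observed history, and the prediction averages the task-conditional predictors under that posterior.

```latex
% Bayesian mixture predictor over a task class \tau with prior p(\tau):
P(x_t \mid x_{<t}) = \sum_{\tau} p(\tau \mid x_{<t}) \, P(x_t \mid x_{<t}, \tau),
\qquad
p(\tau \mid x_{<t}) \propto p(\tau) \prod_{s < t} P(x_s \mid x_{<s}, \tau).
```

Since expected log-loss over sequences generated by first sampling a task and then sampling from it is minimized by the true predictive distribution, which is exactly this mixture, a sufficiently expressive history-dependent model trained on perplexity converges toward it; this is the connection the statement draws.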
“…Both the fixed payoff and the mean 𝜇 of the risky arm were drawn from a standard Gaussian distribution at the beginning of an episode, which lasted twenty rounds. To build agents that can trade off exploration versus exploitation, we used memory-based meta-learning [Santoro et al., 2016, Wang et al., 2016], which is known to produce near-optimal bandit players [Mikulik et al., 2020, Ortega et al., 2019].…”
Section: Bandits
mentioning
confidence: 99%
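The environment in this quoted setup is simple enough to state in code. The sketch below, in Python with NumPy, follows the quote's specification (twenty-round episodes; fixed payoff and risky-arm mean 𝜇 both drawn from a standard Gaussian per episode); the unit-variance reward noise on the risky arm is an assumption for illustration, as the quote does not state it.

```python
# Two-armed bandit from the quoted setup: arm 0 pays a fixed amount, arm 1
# pays N(mu, 1). Fixed payoff and mu are resampled per 20-round episode.
import numpy as np

class TwoArmedBandit:
    def __init__(self, horizon=20, rng=None):
        self.horizon = horizon
        self.rng = rng or np.random.default_rng()
        self.reset()

    def reset(self):
        self.fixed = self.rng.standard_normal()  # fixed payoff ~ N(0, 1)
        self.mu = self.rng.standard_normal()     # risky-arm mean ~ N(0, 1)
        self.t = 0

    def step(self, arm):
        assert self.t < self.horizon, "episode is over"
        self.t += 1
        reward = self.fixed if arm == 0 else self.mu + self.rng.standard_normal()
        return reward, self.t == self.horizon

# A memory-based meta-learner would feed (arm, reward) pairs into a recurrent
# policy trained across many such episodes; a random policy stands in here.
env = TwoArmedBandit()
done = False
while not done:
    reward, done = env.step(np.random.randint(2))
```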