2022
DOI: 10.48550/arxiv.2206.01162
Preprint
Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Abstract: Recent progress on improving the theoretical sample efficiency of model-based reinforcement learning (RL), which exhibits superior sample complexity in practice, requires Gaussian and Lipschitz assumptions on the transition model, and additionally defines a posterior representation that grows unbounded with time. In this work, we propose a novel Kernelized Stein Discrepancy-based Posterior Sampling for RL algorithm (named KSRL) which extends model-based RL based upon posterior sampling (PSRL) in several ways: we (…

Cited by 2 publications (10 citation statements)
References 16 publications
“…We start by deriving an upper bound for $\mathbb{E}[K^{\pi_k}(M; \mathcal{H}_{k,H})]$ with an SPMCMC-style local optimization method proposed originally in (Chen et al., 2019) and later used in sequential decision-making scenarios by (Chakraborty et al., 2022b; Hawkins et al., 2022). We start with the definition in (8) to write…”
Section: F Proof of Lemma 4.3
confidence: 99%
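The SPMCMC-style construction referenced in this proof extends a point set greedily: at each step, a batch of candidates is drawn from an MCMC chain, and the candidate that minimizes the kernelized Stein discrepancy (KSD) of the augmented set is kept. Below is a minimal sketch of that selection rule, assuming an RBF base kernel with bandwidth h and access to the score function ∇ log p of the target; the names stein_kernel and spmcmc_select are illustrative, not from the cited papers.

```python
import numpy as np

def stein_kernel(x, y, score, h=1.0):
    """Scalar Stein kernel k_0(x, y) for an RBF base kernel with bandwidth h."""
    d = x.shape[0]
    diff = x - y
    sqd = diff @ diff
    k = np.exp(-sqd / (2 * h ** 2))                  # base RBF kernel value
    sx, sy = score(x), score(y)                      # scores grad log p at x, y
    return (sx @ sy * k                              # s(x)^T s(y) k(x, y)
            + (sx @ diff) / h ** 2 * k               # s(x)^T grad_y k
            - (sy @ diff) / h ** 2 * k               # s(y)^T grad_x k
            + (d / h ** 2 - sqd / h ** 4) * k)       # trace(grad_x grad_y k)

def spmcmc_select(points, candidates, score, h=1.0):
    """Greedy SPMCMC step: among MCMC candidates, add the point that
    minimizes the KSD of the augmented set (Chen et al., 2019 objective)."""
    def objective(y):
        return 0.5 * stein_kernel(y, y, score, h) + sum(
            stein_kernel(x, y, score, h) for x in points)
    return points + [min(candidates, key=objective)]
```

For a standard normal target the score is simply `score = lambda x: -x`; in the RL setting the candidates would come from a chain targeting the model posterior.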
“…KSD Thinning: We develop a principled way to avoid the requirement that the dictionary $\mathcal{D}_k$ retain all information from past episodes; instead, it is parameterized by a coreset of statistically significant samples. More specifically, observe that in steps 10 and 11 of PSRL (see the Algorithm in the Appendix of (Chakraborty et al. 2022)), the dictionary at each episode k retains H additional points, i.e., $|\mathcal{D}_{k+1}| = |\mathcal{D}_k| + H$. Hence, as the number of episodes experienced becomes large, the posterior representational complexity grows linearly and unbounded with the episode index k. On top of that, the posterior update in step 11 of PSRL (cf. Algorithm ??)…”
Section: Posterior Coreset Construction via KSD
confidence: 99%
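The thinning step this quote describes can be made concrete: build the Stein kernel matrix over the current dictionary, then greedily drop the sample whose removal most reduces the KSD of the remaining set, until a size budget is met. A minimal sketch, assuming an RBF base kernel and a known score function; ksd_thin and its signature are hypothetical, not the paper's implementation.

```python
import numpy as np

def stein_kernel_matrix(X, score, h=1.0):
    """Stein kernel matrix k_0(x_i, x_j) for an RBF base kernel.

    X:     (n, d) array of posterior samples.
    score: function returning grad log p, shape (n, d) for a batch.
    """
    n, d = X.shape
    S = score(X)                                     # (n, d) scores
    diff = X[:, None, :] - X[None, :, :]             # (n, n, d) pairwise x_i - x_j
    sqd = np.sum(diff ** 2, axis=-1)                 # (n, n) squared distances
    K = np.exp(-sqd / (2 * h ** 2))                  # base RBF kernel matrix
    term1 = (S @ S.T) * K                            # s(x)^T s(y) k
    term2 = np.einsum('id,ijd->ij', S, diff) / h ** 2 * K    # s(x)^T grad_y k
    term3 = -np.einsum('jd,ijd->ij', S, diff) / h ** 2 * K   # s(y)^T grad_x k
    term4 = (d / h ** 2 - sqd / h ** 4) * K          # trace(grad_x grad_y k)
    return term1 + term2 + term3 + term4

def ksd_thin(X, score, budget, h=1.0):
    """Greedily drop samples until |X| <= budget, keeping the KSD small."""
    K0 = stein_kernel_matrix(X, score, h)
    keep = list(range(len(X)))
    while len(keep) > budget:
        sub = K0[np.ix_(keep, keep)]
        # Kernel-sum left after removing sample i: total - 2*row_i + diag_i.
        # The 1/(m-1)^2 normalizer is shared, so minimizing the sum suffices.
        drop_scores = sub.sum() - 2 * sub.sum(axis=1) + np.diag(sub)
        keep.pop(int(np.argmin(drop_scores)))
    return X[keep]
```

With `score = lambda X: -X` (standard normal target), `ksd_thin(samples, score, budget=50)` would compress 500 samples to 50 while approximately preserving the KSD to the target, which is the bounded-coreset behavior the quote attributes to KSRL.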
“…We summarize the proposed algorithm in Algorithm 1, with the compression subroutine in Algorithm 2, where KSRL is an abbreviation for Kernelized Stein Discrepancy Thinning for Model-Based Reinforcement Learning. Please refer to the discussion in the Appendix of (Chakraborty et al. 2022) for MPC-based action selection.…”
Section: Posterior Coreset Construction via KSD
confidence: 99%