2021
DOI: 10.48550/arxiv.2102.05406
Preprint

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

Chen-Yu Wei,
Haipeng Luo

Abstract: We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge on the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algori…
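To make the benchmark in the abstract concrete, here is a brief, hedged sketch of what "dynamic regret" refers to in this setting. The notation (V_t^π for the value of policy π under the round-t environment, L for the number of abrupt changes, Δ for the total variation) is introduced here purely for illustration and is stated from general knowledge of the non-stationary bandit/RL literature, not quoted from the truncated abstract.

```latex
% Dynamic regret: the learner is compared, at every round t, against the best
% policy for the environment as it is at round t (notation for illustration only).
\mathrm{D\text{-}Reg}(T) \;=\; \sum_{t=1}^{T} \Big( \max_{\pi} V_t^{\pi} \;-\; V_t^{\pi_t} \Big)

% Typical optimal orders targeted in this literature, with L the number of abrupt
% changes, \Delta the total variation, and problem-dependent factors suppressed:
\mathrm{D\text{-}Reg}(T) \;=\; \widetilde{\mathcal{O}}\!\Big( \min\big\{ \sqrt{L\,T},\; \Delta^{1/3} T^{2/3} + \sqrt{T} \big\} \Big)
```

The difficulty the paper addresses is attaining rates of this form without being told L or Δ in advance.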

Cited by 4 publications (10 citation statements). References 6 publications (14 reference statements).
“…Non-stationary RL. Non-stationary RL has been mostly studied in the unconstrained setting (Jaksch et al., 2010; Auer et al., 2019; Ortner et al., 2020; Domingues et al., 2021; Mao et al., 2020; Zhou et al., 2020; Touati & Vincent, 2020; Fei et al., 2020; Zhong et al., 2021; Cheung et al., 2020; Wei & Luo, 2021). Our work is related to policy-based methods for non-stationary RL since the optimal solution of a CMDP is usually a stochastic policy (Altman, 1999) and thus a policy-based method is preferred.…”
Section: Related Work (mentioning)
confidence: 99%
“…To eliminate the assumption of having prior knowledge on variation budgets, Wei & Luo (2021) recently outline that an adaptive restart approach can be used to convert any upper-confidence-bound-type stationary RL algorithm to a dynamic-regret-minimizing algorithm. However, this approach is proposed only for the unconstrained problems and relies on the assumption of having an optimistic estimator of the optimal value function.…”
Section: Related Work (mentioning)
confidence: 99%
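To make the restart idea in the quote above more concrete, here is a minimal Python sketch of a restart test wrapped around a generic UCB-type base algorithm. It only illustrates the general principle described in the citation; the actual procedure in Wei & Luo (2021) is more involved (running the base algorithm at multiple scales), and the interfaces `make_base_alg`, `env.step`, `base.act`, `base.update`, `base.optimistic_estimate`, and `regret_bound` are hypothetical.

```python
def run_with_adaptive_restarts(make_base_alg, env, horizon, regret_bound):
    """Schematic restart wrapper (a sketch, not the algorithm from the paper).

    Idea: a UCB-type base algorithm maintains an optimistic estimate of the
    optimal value.  Over a (near-)stationary stretch, the reward it actually
    collects should stay within its regret bound of that estimate; a larger
    gap is evidence of non-stationarity, so the base algorithm is restarted.
    """
    base = make_base_alg()               # fresh instance of the base algorithm
    block_start, block_reward = 0, 0.0   # statistics since the last restart
    for t in range(horizon):
        action = base.act()              # base algorithm chooses an action
        reward = env.step(action)        # environment may drift or switch
        base.update(action, reward)
        block_reward += reward

        n = t - block_start + 1          # steps in the current block
        optimistic = base.optimistic_estimate()  # assumed upper bound on optimal per-step value
        # Restart test: collected reward falls well below what a stationary
        # environment would allow (optimism minus the stationary regret bound).
        if block_reward < n * optimistic - 2.0 * regret_bound(n):
            base = make_base_alg()                 # discard stale data
            block_start, block_reward = t + 1, 0.0
```

The point matching the quote is that the test only needs the base algorithm's optimistic value estimate and its stationary regret bound, not any prior knowledge of how much the environment changes.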
“…Second, the probabilities with which the exploration is carried out are different for the LQR problem owing to the quadratic cost. More recently, the authors in Wei and Luo [2021] outline that for many classes of episodic reinforcement learning problems, a similar strategy can be used to convert any Upper Confidence Bound (UCB) type stationary reinforcement learning algorithm to a dynamic regret minimizing algorithm. There are quite a few differences between Wei and Luo [2021] and our work: the LQR problem is not covered by the classes of MDPs they consider, we look at a non-episodic version of the LQR problem, and our algorithm is certainty equivalent controller-based and not a UCB-type.…”
Section: Introduction (mentioning)
confidence: 99%