2018
DOI: 10.48550/arxiv.1804.07193
Preprint

Lipschitz Continuity in Model-based Reinforcement Learning

Cited by 11 publications (17 citation statements)
References 0 publications
“…They only assume that the optimal action-value function is Lipschitz continuous. This assumption is more general than that used in the aforementioned works as it is known that Lipschitz continuity of the reward function and the transition kernel leads to Lipschitz continuity of the optimal action-value function (Asadi et al, 2018). We use the same condition in this present paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…Errors in the world model compound, and cause issues when used for control [3,63]. Amos et al [2], similar to our work, directly optimizes the dynamics model against loss by differentiating through a planning procedure, and Schmidhuber [52] proposes a similar idea of improving the internal model using an RNN, although the RNN world model is initially trained to perform forward prediction.…”
Section: Related Work (mentioning)
confidence: 94%
“…Our approach differs from it in three details: a) we use the absolute value of the value difference instead of the squared difference; b) we use the imaginary value function from the estimated dynamical model to define the loss, which makes the loss purely a function of the estimated model and the policy; c) we show that the iterative algorithm, using the loss function as a building block, can converge to a local maximum, partly because of the particular choices made in a) and b). Asadi et al. (2018) also study the discrepancy bounds under Lipschitz condition of the MDP.…”
Section: Additional Related Work (mentioning)
confidence: 99%