2019
DOI: 10.48550/arxiv.1903.01599
Preprint

Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

Cited by 11 publications (12 citation statements) · References 0 publications
“…One challenge in these heuristics is that they may be unstable and difficult to fix or improve when they do not work in new environments. These heuristics can manifest in the form of learning a latent space that is locally linear, e.g., in Embed to Control and related methods (Watter et al., 2015), enforcing that the model makes long-horizon predictions (Ke et al., 2019), ignoring uncontrollable parts of the state space (Ghosh et al., 2018), detecting and correcting when a predictive model steps off the manifold of reasonable states (Talvitie, 2017), adding reward-signal prediction on top of the latent space (Gelada et al., 2019), or adding noise when training transitions (Mankowitz et al., 2019).…”
Section: Discussion, Related Work and Future Work
Mentioning confidence: 99%
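The long-horizon prediction heuristic attributed to Ke et al. (2019) in the excerpt above can be pictured with a short sketch. Below is a minimal PyTorch example, assuming a simple encoder/transition architecture; `LatentDynamics`, `multi_step_loss`, the horizon, and the stop-gradient target are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    """Minimal latent dynamics model: an observation encoder plus a
    one-step latent transition (assumed architecture, for illustration)."""
    def __init__(self, obs_dim, act_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + act_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )

def multi_step_loss(model, obs, actions, horizon=5):
    """Roll the transition forward `horizon` steps in latent space and match
    each predicted latent against the encoding of the true future observation,
    so the model is trained on long-horizon rather than only one-step error.
    obs: (horizon+1, batch, obs_dim); actions: (horizon, batch, act_dim)."""
    z = model.encoder(obs[0])
    loss = 0.0
    for k in range(horizon):
        z = model.transition(torch.cat([z, actions[k]], dim=-1))
        target = model.encoder(obs[k + 1]).detach()  # stop-gradient target
        loss = loss + (z - target).pow(2).mean()
    return loss / horizon
```

The point of the multi-step rollout is that one-step losses let small prediction errors compound unchecked; penalizing the k-step predictions directly exposes the model to its own accumulated error during training.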
“…For each trajectory in the retrieval batch, we represent each time-step within a trajectory by a set of two vectors h_{i,t} and b_{i,t} (Figure 6 in the appendix), where h_{i,t} summarizes the past (i.e., time-steps 0 through t of the i-th trajectory) while b_{i,t} summarizes the future (i.e., time-step t through the end of the trajectory). In addition, taking inspiration from (Jaderberg et al., 2016; Trinh et al., 2018; Ke et al., 2019; Devlin et al., 2018; Mazoure et al., 2020), we use auxiliary losses to improve the modeling of long-term dependencies when training the parameters of our forward and backward summarizers. The goal of these losses is to force the representations (h_{i,t}, b_{i,t})_{i, t≥0} to capture information that is meaningful for the unknown downstream task.…”
Section: Retrieval Batch Sampling and Pre-processing
Mentioning confidence: 99%
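The forward/backward summarizers in this excerpt can be pictured as a pair of recurrent networks running over the trajectory in opposite directions. The sketch below is a hypothetical PyTorch rendering; `TrajectorySummarizer`, `auxiliary_agreement_loss`, and all dimensions are assumptions rather than the cited paper's actual design.

```python
import torch
import torch.nn as nn

class TrajectorySummarizer(nn.Module):
    """Hypothetical forward/backward summarizers: a forward GRU produces
    h_{i,t} (summary of steps 0..t) and a backward GRU produces b_{i,t}
    (summary of steps t..end of trajectory)."""
    def __init__(self, step_dim, hidden_dim=128):
        super().__init__()
        self.fwd = nn.GRU(step_dim, hidden_dim)
        self.bwd = nn.GRU(step_dim, hidden_dim)

    def forward(self, traj):  # traj: (T, batch, step_dim)
        h, _ = self.fwd(traj)                       # h[t] summarizes steps 0..t
        b_rev, _ = self.bwd(torch.flip(traj, [0]))  # run over reversed trajectory
        b = torch.flip(b_rev, [0])                  # b[t] summarizes steps t..T-1
        return h, b

def auxiliary_agreement_loss(h, b, proj):
    """One possible auxiliary loss (an assumption, not the paper's): make the
    forward summary at step t predict the backward summary at step t, pushing
    h_t to carry long-term information about the future. `proj` is a learned
    linear head, e.g. nn.Linear(hidden_dim, hidden_dim)."""
    return (proj(h) - b.detach()).pow(2).mean()
```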
“…Other related methods have attempted to create direct, structural links between models and trajectory training data. Working with high-dimensional images (rather than states), [23] uses an auto-regressive, recurrent network to predict observations in a latent state-space. [24] proposes a multi-step Gaussian process for learning robotic control, and the approach is studied further using the correlation between prediction steps in [25].…”
Section: Predicting Trajectories
Mentioning confidence: 99%
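For the auto-regressive, recurrent observation predictor mentioned for [23], a minimal sketch of the pattern (feeding the model's own predictions back in after a burn-in on real observations) might look as follows; the architecture, names, and shapes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class AutoRegressivePredictor(nn.Module):
    """Sketch of an auto-regressive recurrent predictor: after burn-in on real
    observations, each predicted observation is fed back as the next input, so
    rollout errors compound exactly as they would at test time."""
    def __init__(self, obs_dim, hidden_dim=128):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim, hidden_dim)
        self.decode = nn.Linear(hidden_dim, obs_dim)

    def rollout(self, context, n_steps):
        """context: (T, batch, obs_dim) real observations for burn-in."""
        h = context.new_zeros(context.shape[1], self.cell.hidden_size)
        for o in context:            # condition on real observations
            h = self.cell(o, h)
        preds, o = [], self.decode(h)
        for _ in range(n_steps):     # feed predictions back in (auto-regressive)
            preds.append(o)
            h = self.cell(o, h)
            o = self.decode(h)
        return torch.stack(preds)    # (n_steps, batch, obs_dim)
```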