2020
DOI: 10.1016/j.neucom.2019.12.132

Bayesian decomposition of multi-modal dynamical systems for reinforcement learning

Abstract: In this paper, we present a model-based reinforcement learning system where the transition model is treated in a Bayesian manner. The approach naturally lends itself to exploit expert knowledge by introducing priors to impose structure on the underlying learning task. The additional information introduced to the system means that we can learn from small amounts of data, recover an interpretable model and, importantly, provide predictions with an associated uncertainty. To show the benefits of the approach, we …
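The abstract describes a Bayesian transition model that returns predictions together with uncertainty estimates. The snippet below is a minimal, generic sketch of that idea, using an ordinary Gaussian-process regressor over (state, action) pairs implemented with NumPy; it is not the decomposition method of Kaiser et al. (2020), and the kernel, synthetic data, and hyperparameters are placeholder assumptions for illustration only.

```python
import numpy as np

# Sketch only (not the paper's method): a Gaussian-process transition model
# for a 1-D state, showing how a Bayesian model yields both a mean prediction
# of the next state and an associated uncertainty.

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of (state, action) inputs."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean and variance of the next state at the test inputs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0) + noise
    return mean, var

# Tiny synthetic dataset of (state, action) -> next-state transitions.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 2))           # columns: state, action
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(30)

X_query = np.array([[0.3, -0.1], [1.5, 0.8]])  # candidate (s, a) pairs
mu, var = gp_posterior(X, y, X_query)
for (s, a), m, v in zip(X_query, mu, var):
    print(f"s={s:+.2f}, a={a:+.2f} -> s' ~ {m:+.2f} +/- {2*np.sqrt(v):.2f}")
```

Because the predictive variance grows away from the observed transitions, a planner or policy search built on such a model can weight its rollouts by how much the data actually supports them, which is the benefit the abstract attributes to the Bayesian treatment.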

Cited by 6 publications (3 citation statements)
References 8 publications
“…with very good exploration. Other, more recent approaches may also address the offline setting; however, their focus is often something else: (Hein et al., 2016, 2018) focus on finding interpretable policies that increase the trust practitioners place in them, while (Depeweg et al., 2016, 2017; Kaiser et al., 2020) put their emphasis on modeling the complicated uncertainties in the transition dynamics of the environments. While theoretically being offline, these algorithms also assume randomly collected datasets.…”
Section: Related Work
confidence: 99%
“…Early works in this field, such as FQI and NFQ (Ernst et al., 2005; Riedmiller, 2005), termed the problem "batch" rather than offline, and didn't explicitly address the additional challenge that the batch mode brought to the table. Many other batch RL algorithms have since been proposed (Depeweg et al., 2016; Hein et al., 2018; Kaiser et al., 2020), which, despite being offline in the sense that they do not interact with the environment, do not regularize their policy accordingly and instead assume a random data collection that makes generalization rather easy. Among the first to explicitly address the limitations in the offline setting were SPIBB(-DQN) (Laroche et al., 2019) in the discrete-action and BCQ (Fujimoto et al., 2019) in the continuous-action case.…”
Section: Related Work
confidence: 99%
“…The method can draw inferences and correct biased understanding of the environment based on the posterior distribution over MDP parameters. In terms of environment model assumptions, Kaiser et al. (2020) enhanced the model's capacity to reflect the environment and improved learning efficiency by using a Gaussian process as an environment dynamics prior (Deisenroth and Rasmussen, 2011). It combines expert knowledge with environmental priors and uses Bayesian inference for the underlying learning tasks to produce predictions with the associated uncertainty, allowing for a flexible policy search.…”
Section: Model-based Reinforcement Learning
confidence: 99%