2021
DOI: 10.48550/arxiv.2106.02390
Preprint
Online reinforcement learning with sparse rewards through an active inference capsule

Abstract: Intelligent agents must pursue their goals in complex environments with partial information and often limited computational capacity. Reinforcement learning methods have achieved great success by creating agents that optimize engineered reward functions, but which often struggle to learn in sparse-reward environments, generally require many environmental interactions to perform well, and are typically computationally very expensive. Active inference is a model-based approach that directs agents to explore unce…

Cited by 2 publications (3 citation statements)
References 18 publications
“…Interestingly, dAIF also permits the alternative architecture described in Fig. 4 right (also proposed in [20]), in which the network learns to predict future observations. Here the transition network is part of the autoencoding.…”
Section: Discussion (mentioning)
confidence: 99%
“…In particular, the 1-step-ahead action formulation in conjunction with bootstrapping might not capture a proper structure of the world, which is needed to complete the task, even if we use several consecutive input images to compute the state. N-step-ahead observation-optimization EFE formulations, as proposed in [5,28,20], may aid learning. Particularly, when substituting the negative log surprise by the rewards, the agent might lose the exploratory AIF characteristic, thus focusing only on goal-oriented behaviour.…”
Section: Dependency of Input Space (mentioning)
confidence: 99%
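The excerpt above hinges on the two-term decomposition of the expected free energy (EFE) common in the active inference literature; a minimal sketch, with notation assumed for illustration rather than taken from the cited papers:

```latex
% Expected free energy of a policy \pi, split into a pragmatic
% (goal-seeking) term and an epistemic (exploratory) term:
G(\pi) =
  \underbrace{-\,\mathbb{E}_{q(o \mid \pi)}\!\left[\ln p(o \mid C)\right]}_{\text{pragmatic value}}
  \;-\;
  \underbrace{\mathbb{E}_{q(o \mid \pi)}\!\left[
    D_{\mathrm{KL}}\!\left[\, q(s \mid o, \pi) \,\big\|\, q(s \mid \pi) \,\right]
  \right]}_{\text{epistemic value}}
```

If the log-preference term $-\ln p(o \mid C)$ is replaced by an external reward $r(o)$ and the epistemic term is dropped, only the pragmatic part remains, which is consistent with the quotation's point that such a substitution can remove the exploratory character of an active inference agent.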
“…New and exciting developments and studies of Deep AIF agents are under active research for more complex environments with partial observability and high-dimensional inputs and actions [35], [34], [79], [98], [99].…”
Section: Planning (mentioning)
confidence: 99%