2011
DOI: 10.1007/978-3-642-22887-2_5

Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments

Abstract: To maximize its success, an AGI typically needs to explore its initially unknown world. Is there an optimal way of doing so? Here we derive an affirmative answer for a broad class of environments.

Cited by 108 publications (113 citation statements); citing publications span 2011–2023.
References 7 publications.
“…As a generalization of exploration methods in reinforcement learning, such as [18], ideas have been suggested such as planning to be surprised [56] or the combination of empirical learning progress with visit counts [39].…”
Section: Planning Topics (mentioning)
confidence: 99%
“…In the context of online learning, one way to avoid bad-bootstraps is to select actions based on (expected) epistemic value (Schwartenbeck et al., 2018; Friston et al., 2017; Sun et al., 2011), where agents seek out novel interactions based on counterfactually informed beliefs about which actions will lead to informative transitions. By utilising the uncertainty encoded by (beliefs about) model parameters, this approach can proactively identify optimally informative transitions.…”
Section: Learning Action-oriented Models: Good and Bad Bootstraps (mentioning)
confidence: 99%
“…However, random exploration of this sort is likely to be inefficient in rich and complex environments. In such environments, a more powerful method is to utilize the uncertainty quantified by probabilistic models to determine epistemic (or intrinsic, information-seeking, uncertainty reducing) actions that attempt to minimize the model uncertainty in a directed manner (Stadie et al., 2015; Houthooft et al., 2016; Sun et al., 2011; Friston et al., 2015; Burda et al., 2018; Friston et al., 2017). While epistemic actions can help avoid bad-bootstraps and sub-optimal convergence, such actions necessarily increase the diversity and dimensionality of sampled data, thus sacrificing the benefits afforded by learning in the presence of goal-directed actions.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is why expected Bayesian surprise has to be maximised when selecting actions, where it plays the role of epistemic affordance (Parr and Friston 2017). As noted above, this is an important imperative that underwrites uncertainty reducing, exploratory behaviour; known as intrinsic motivation in neurorobotics (Schmidhuber 2006) or salience when 'planning to be surprised' (Sun, Gomez et al. 2011, Barto, Mirolli et al. 2013). An intuitive way of thinking about whether surprise should be maximised or minimised is to appeal to the analogy of scientific experiment.
Section: Simulations (mentioning)
confidence: 99%
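
As an illustrative aside, below is a minimal one-step sketch of the kind of directed, information-seeking action selection these citing papers describe: an agent with Dirichlet beliefs over a discrete transition model picks the action whose predicted outcome yields the largest expected change in belief (expected Bayesian surprise). The function names and the toy environment are hypothetical, and this greedy one-step rule is only an approximation of the multi-step optimal Bayesian exploration analysed in the paper above.

```python
# Sketch: expected-information-gain action selection with Dirichlet beliefs
# over next-state distributions. Illustrative only; not the paper's algorithm.
import numpy as np
from scipy.special import digamma, gammaln

def dirichlet_kl(alpha_q, alpha_p):
    """KL( Dir(alpha_q) || Dir(alpha_p) ) in nats."""
    a_q, a_p = np.asarray(alpha_q, float), np.asarray(alpha_p, float)
    return (gammaln(a_q.sum()) - gammaln(a_p.sum())
            - np.sum(gammaln(a_q) - gammaln(a_p))
            + np.sum((a_q - a_p) * (digamma(a_q) - digamma(a_q.sum()))))

def expected_info_gain(alpha):
    """Expected KL between updated and current belief, averaged over the
    next states predicted by the current belief (one-step Bayesian surprise)."""
    alpha = np.asarray(alpha, float)
    predictive = alpha / alpha.sum()
    gains = [dirichlet_kl(alpha + np.eye(len(alpha))[s], alpha)
             for s in range(len(alpha))]
    return float(np.dot(predictive, gains))

def choose_action(beliefs, state):
    """Pick the action whose outcome is expected to be most informative.
    `beliefs[state][action]` holds Dirichlet counts over next states."""
    gains = {a: expected_info_gain(counts)
             for a, counts in beliefs[state].items()}
    return max(gains, key=gains.get)

# Toy usage: 3 states, 2 actions, action 1 from state 0 is already well known,
# so the agent prefers the less-explored action 0.
beliefs = {s: {a: np.ones(3) for a in (0, 1)} for s in range(3)}
beliefs[0][1] += np.array([5.0, 0.0, 0.0])
print(choose_action(beliefs, 0))
```

The paper above addresses the harder problem of optimizing this kind of information gain over whole action sequences rather than a single step, which is what distinguishes planned exploration from the greedy rule sketched here.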