2016
DOI: 10.1007/978-3-319-46227-1_30

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

Abstract: Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertai…
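The abstract's core construction — planning under an information-processing constraint expressed as a KL divergence to a reference distribution — can be sketched as a "soft" value iteration. The following is a minimal illustration under assumed names and array shapes; it is not the paper's algorithm (which additionally treats model uncertainty), only the standard KL-regularized free-energy backup the abstract builds on.

```python
import numpy as np

def kl_regularized_value_iteration(P, R, prior, beta=1.0, gamma=0.95,
                                   tol=1e-10, max_iter=10000):
    """Soft value iteration with an information-processing constraint:
    the policy pays a cost (1/beta) * KL(pi(.|s) || prior(.|s)).

    P: (A, S, S) transitions P[a, s, s'];  R: (S, A) rewards;
    prior: (S, A) reference policy pi0(a|s).
    Returns the free-energy value V of shape (S,) and the optimal
    policy of shape (S, A).  (Illustrative sketch, not the paper's API.)
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + gamma * E_{s' ~ P(.|s,a)}[V(s')]
        Q = R + gamma * np.einsum("asz,z->sa", P, V)
        # Free-energy backup: V(s) = (1/beta) log sum_a pi0(a|s) exp(beta Q(s,a)),
        # computed via a numerically stable log-sum-exp.
        logits = np.log(prior) + beta * Q
        m = logits.max(axis=1)
        V_new = (m + np.log(np.exp(logits - m[:, None]).sum(axis=1))) / beta
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Optimal policy: pi(a|s) proportional to pi0(a|s) exp(beta Q(s,a))
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi = w / w.sum(axis=1, keepdims=True)
    return V, pi
```

As beta grows the backup approaches the hard Bellman maximum (unbounded rationality); as beta shrinks the optimal policy collapses onto the reference distribution, which is exactly the trade-off the information-processing constraint encodes.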


Cited by 20 publications (25 citation statements, all mentioning).
References 23 publications.
“…Our approach belongs to a wider class of models that use information constraints for regularization to deal more efficiently with learning and decision-making problems [11], [31], [28], [25], [19], [35], [26], [18], [15], [3], [21], [38], [20], [17], [16]. One such prominent approach is Trust Region Policy Optimization (TRPO) [39].…”
Section: Discussion (mentioning)
confidence: 99%
“…Our analysis may have interesting implications, as it provides a normative framework for this kind of combined optimization of adaptive priors and decision-making processes. Prior to our work there have been several attempts to apply the framework of information-theoretic bounded rationality to machine learning tasks [7,11,12,18]. The novelty of our approach is that we design adaptive priors for both the single-step case and the multi-agent case and we demonstrate how to transform information-theoretic constraints into computational constraints in the form of MCMC steps.…”
Section: Discussion (mentioning)
confidence: 99%
“…In [33] and similarly in [20], the authors derive the relative entropy as a control cost from an information-theoretic point of view, under axioms of monotonicity and invariance under relabelling and decomposition. In other fields such as robotics, the relative entropy has also been used as a control cost [18, 21, 25, 58, 73, 74] to regularize the behaviour of the controller by penalizing controls that are far from the uncontrolled dynamics of the system or to deal with model uncertainty [75]. Naturally, questions regarding the generality of entropic costs as information-processing costs and their potential relation to algorithmic space-time resource constraints carry over to the non-equilibrium scenario and remain a topic for future investigations.…”
Section: Discussion (mentioning)
confidence: 99%
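The last statement's use of relative entropy as a control cost — penalizing controlled dynamics that stray far from the uncontrolled ("passive") dynamics — admits a compact closed form: the optimal controlled transition reweights the passive transition by the exponentiated negative cost-to-go. A minimal sketch, with hypothetical names and an assumed cost-to-go vector `v`:

```python
import numpy as np

def kl_control_step(passive, v):
    """KL-control reweighting: given passive dynamics p(s'|s) (rows of
    `passive`, shape (S, S)) and a cost-to-go v(s') (shape (S,)), the
    optimal controlled transition is u*(s'|s) ∝ p(s'|s) * exp(-v(s')).
    Illustrative sketch of the relative-entropy control cost, not any
    specific cited algorithm.
    """
    w = passive * np.exp(-v)[None, :]
    return w / w.sum(axis=1, keepdims=True)

def kl_cost(u, passive):
    """Per-state control cost KL(u(.|s) || p(.|s)): the price paid for
    deviating from the uncontrolled dynamics."""
    return (u * np.log(u / passive)).sum(axis=1)
```

Controls incur no cost when they leave the passive dynamics untouched, and an unbounded cost as they push probability mass toward transitions the passive dynamics almost never take, which is the regularizing effect the quoted passage describes.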