2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
DOI: 10.1109/allerton.2018.8636066
Entropy Maximization for Constrained Markov Decision Processes

Cited by 13 publications (16 citation statements). References 9 publications.

“…Well posedness: For the class of proper policies µ ∈ Γ, the maximum entropy H_µ(s), ∀ s ∈ S, is finite, as shown in [40], [41]. In short, the existence of a cost-free termination state δ and a non-zero probability of reaching it from any state s ∈ S ensures that the maximum entropy is finite.…”
Section: A. Problem Formulation (mentioning)
confidence: 99%
“…In short, the existence of a cost-free termination state δ and a non-zero probability of reaching it from any state s ∈ S ensures that the maximum entropy is finite. Please refer to Theorem 1 in [40] or Proposition 2 in [41] for further details. Remark 1.…”
Section: A. Problem Formulation (mentioning)
confidence: 99%
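
The finiteness argument referenced in these statements can be sketched as follows (an illustrative reconstruction; the notation ξ_µ is mine and not necessarily that of [40] or [41]). For a proper policy µ, let ξ_µ(s) denote the expected number of visits to state s before absorption in the termination state δ, and let P_µ(s, ·) denote the induced transition distribution. By the chain rule for the entropy of the induced absorbing Markov chain started at s_0,

$$H_\mu(s_0) \;=\; \sum_{s \in S} \xi_\mu(s)\, H\big(P_\mu(s,\cdot)\big) \;\le\; \sum_{s \in S} \xi_\mu(s)\, \log |S|,$$

which is finite because a proper policy reaches δ with probability one, so every ξ_µ(s) is finite.
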
“…However, as explained in Section IV-C and shown in the following examples, a policy that induces a stochastic process with an arbitrarily large entropy can easily be obtained by introducing constraints on the expected residence time in certain states. Additional motion planning examples are provided in [17].…”
Section: Examples (mentioning)
confidence: 99%
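
A minimal example of the effect described in this statement, using the same decomposition of the path entropy as above (my illustration, not taken from the cited papers): suppose a constraint forces the expected residence time in a state s* to satisfy ξ_µ(s*) ≥ T, and the policy may randomize uniformly over two successors of s*. Then

$$H_\mu(s_0) \;\ge\; \xi_\mu(s^*)\, H\big(P_\mu(s^*,\cdot)\big) \;\ge\; T \log 2,$$

so the achievable entropy grows without bound as the required residence time T increases.
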
“…A preliminary version [17] of this paper relied on Proposition 36 from [3]. This proposition is not valid in general.…”
(mentioning)
confidence: 99%
“…Entropy maximization for MDPs is discussed in [17]; it can be considered a special case of the synthesis of the optimal deceptive policy in which the reference policy follows every possible path with equal probability. For the synthesis of optimal deceptive policies, we use a method similar to [17] in that we represent a path as a collection of transitions between the states. We explore the synthesis of optimal reference policies, which, to the best of our knowledge, has not been discussed before.…”
Section: Introduction (mentioning)
confidence: 99%
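
The reduction mentioned in this statement can be illustrated with a short derivation (hedged: it assumes the deceptive-policy objective is a KL divergence between the induced path distribution Γ^π and a reference path distribution Γ^ref, and that the set of admissible paths is finite with cardinality N so that a uniform reference exists). With Γ^ref(p) = 1/N for every path p,

$$D_{\mathrm{KL}}\!\big(\Gamma^{\pi} \,\|\, \Gamma^{\mathrm{ref}}\big) \;=\; \sum_{p} \Gamma^{\pi}(p)\, \log \frac{\Gamma^{\pi}(p)}{1/N} \;=\; \log N \;-\; H\big(\Gamma^{\pi}\big),$$

so minimizing the divergence to a uniform-over-paths reference policy is equivalent to maximizing the path entropy H(Γ^π), which is the objective of [17].
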