2021
DOI: 10.1613/jair.1.12611
Steady-State Planning in Expected Reward Multichain MDPs

Abstract: The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy which satisfies formal specifications in the form of some well-defined logic. While many such logics have been proposed with varying degrees of expressiveness and complexity in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies which satisfy certain types of asymptotic behavior in gener…


Cited by 4 publications (2 citation statements)
References 37 publications
“…Compared to the discounted reward, the average reward depends on the limiting behavior of the underlying stochastic process and is markedly more intricate. A recognized instance of such intricacy concerns the one-to-one correspondence between stationary policies and the limit points of state-action frequencies, which, while true for discounted MDPs, breaks down under the average-reward criterion even in the non-robust setting except in some very special cases (Puterman, 1994; Atia et al., 2021). This is largely due to the dependence of the necessary conditions for establishing a contraction in average-reward settings on the graph structure of the MDP, versus the discounted-reward setting where it simply suffices to have a discount factor that is strictly less than one (Kazemi, Perez, Somenzi, Soudjani, Trivedi, & Velasquez, 2022).…”
Section: Introduction (mentioning)
confidence: 99%
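The last point of the statement above, that a discount factor strictly below one is enough to obtain a contraction regardless of the MDP's graph structure, can be checked numerically. The sketch below is not taken from the cited works; the 4-state, 2-action MDP, the random seed, and the helper bellman() are made-up assumptions used only to illustrate that the discounted Bellman optimality operator shrinks sup-norm distances by at least a factor of gamma.

```python
# Minimal numerical sketch (assumed toy instance, not from the cited papers):
# the discounted Bellman optimality operator is a gamma-contraction in the
# sup norm for any finite MDP, independently of its graph structure.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Random transition kernel P[a, s, s'] (rows normalized) and rewards R[a, s].
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_actions, n_states))

def bellman(v):
    """(T v)(s) = max_a [ R(a, s) + gamma * sum_s' P(a, s, s') v(s') ]."""
    return (R + gamma * P @ v).max(axis=0)

# The ratio ||T v1 - T v2||_inf / ||v1 - v2||_inf never exceeds gamma.
for _ in range(5):
    v1, v2 = rng.normal(size=n_states), rng.normal(size=n_states)
    ratio = np.abs(bellman(v1) - bellman(v2)).max() / np.abs(v1 - v2).max()
    print(f"contraction ratio: {ratio:.3f} (gamma = {gamma})")
```

No comparable structure-independent contraction is available for the average-reward operator, which is the asymmetry the citing papers point to.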
“…Compared to the discounted setting, the average-reward setting depends on the limiting behavior of the underlying stochastic process, and hence is markedly more intricate. A recognized instance of such intricacy concerns the one-to-one correspondence between stationary policies and the limit points of state-action frequencies, which, while true for discounted MDPs, breaks down under the average-reward criterion even in the non-robust setting except in some very special cases (Puterman, 1994; Atia et al., 2021). This is largely due to the dependence of the necessary conditions for establishing a contraction in average-reward settings on the graph structure of the MDP, versus the discounted-reward setting where it simply suffices to have a discount factor that is strictly less than one.…”
Section: Introduction (mentioning)
confidence: 99%
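The dependence of the average-reward criterion on the limiting behavior of the chain can be seen on a small multichain example. The 3-state chain below is an assumed toy instance, not one used in the paper or the citing works: under a single fixed stationary policy, the long-run state frequencies change with the starting distribution, so the policy alone does not determine a unique limit point, whereas the normalized discounted occupancy is always obtained from one matrix inverse.

```python
# Assumed toy multichain example: states 0 and 2 are absorbing, state 1 is
# transient. P_pi is the chain induced by one fixed stationary policy.
import numpy as np

P_pi = np.array([[1.0, 0.0, 0.0],
                 [0.5, 0.0, 0.5],
                 [0.0, 0.0, 1.0]])
gamma = 0.9

def discounted_occupancy(mu0):
    """Normalized discounted occupancy (1 - gamma) * mu0 @ (I - gamma * P_pi)^{-1}."""
    return (1 - gamma) * mu0 @ np.linalg.inv(np.eye(3) - gamma * P_pi)

def average_frequencies(mu0, T=20_000):
    """Cesaro average (1/T) * sum_{t<T} mu0 @ P_pi^t, approximating the limiting frequencies."""
    mu, acc = mu0.astype(float), np.zeros(3)
    for _ in range(T):
        acc += mu
        mu = mu @ P_pi
    return acc / T

for mu0 in (np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])):
    print("start:", mu0,
          "| discounted:", discounted_occupancy(mu0).round(3),
          "| average:", average_frequencies(mu0).round(3))
# Starting from state 1 the limiting frequencies are roughly (0.5, 0, 0.5);
# starting from state 2 they are (0, 0, 1), under the very same policy.
```

This start-state dependence, caused purely by the graph structure of the chain, is a simple instance of why the correspondence between stationary policies and limit points of state-action frequencies is delicate in multichain average-reward MDPs.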