Constrained Markov decision processes with total cost criteria: Lagrangian approach and dual linear program

Altman, Eitan

doi:10.1007/s001860050035

Cited by 54 publications

(49 citation statements)

References 53 publications


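“…Altman [90,91] studied the constrained total-cost criterion for DTMDPs; [92,93] discussed the constrained average criterion for DTMDPs; [20,94] treated the constrained discounted criterion for CTMDPs; Guo et al. [21] considered the constrained average criterion for CTMDPs.…”

Section: Constrained problems (unclassified)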

A survey on semi-Markov decision processes

Guo¹,

Huang²

2015

Sci. Sin.-Math.


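Buffet [15] summarized applications of MDPs in artificial intelligence; Mahadevan [16] analyzed the theory, algorithms, and applications of learning control of dynamical systems for DTMDPs; Simar and Parsons [17] gave a unified analysis of the DTMDP and BDI (belief-desire-intention) models and the relationship between them. Research on CTMDPs has also made a series of advances: the study of continuous-time MDPs originated with the pioneering work of Howard [2]. Subsequently, Gihman and Skorohod [4] considered finite-horizon models with general state spaces and discounted models with countable state spaces and bounded transition rates;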


“…For that reason, a number of researchers have proposed and utilized an alternative solution approach, which is based upon mathematical programming (Altman, 1998; Feinberg, 2000; Dolgov & Durfee, 2006). A procedure for formulating an MDP into a linear program (whose solution yields an optimal policy maximizing the total expected reward) is described below.…”

Section: Linear Programming (mentioning)

confidence: 99%

Resource-Driven Mission-Phasing Techniques for Constrained Agents in Stochastic Environments

Wu¹,

Durfee²

2010

JAIR


Because an agent's resources dictate what actions it can possibly take, it should plan which resources it holds over time carefully, considering its inherent limitations (such as power or payload restrictions), the competing needs of other agents for the same resources, and the stochastic nature of the environment. Such agents can, in general, achieve more of their objectives if they can use, and even create, opportunities to change which resources they hold at various times. Driven by resource constraints, the agents could break their overall missions into an optimal series of phases, optimally reconfiguring their resources at each phase, and optimally using their assigned resources in each phase, given their knowledge of the stochastic environment. In this paper, we formally define and analyze this constrained, sequential optimization problem in both the single-agent and multi-agent contexts. We present a family of mixed integer linear programming (MILP) formulations of this problem that can optimally create phases (when phases are not predefined) accounting for costs and limitations in phase creation. Because our formulations simultaneously also find the optimal allocations of resources at each phase and the optimal policies for using the allocated resources at each phase, they exploit structure across these coupled problems. This allows them to find solutions significantly faster (orders of magnitude faster in larger problems) than alternative solution techniques, as we demonstrate empirically.
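The citation statement for this entry refers to formulating an MDP as a linear program, but the quoted excerpt stops before the procedure itself. As a reading aid, here is a sketch of the standard occupation-measure LP for a transient MDP with the total expected reward criterion and additional linear constraints. It is not taken from the cited paper, and all notation is an assumption made here: states $S$, actions $A$, initial distribution $\alpha$, transition probabilities $p(s' \mid s,a)$ on the transient states, reward $r$, constraint costs $c_k$ with bounds $C_k$, and decision variables $x(s,a)$ interpreted as the expected number of times action $a$ is taken in state $s$.

```latex
\begin{align*}
\max_{x \ge 0}\quad
  & \sum_{s \in S}\sum_{a \in A} r(s,a)\, x(s,a) \\
\text{s.t.}\quad
  & \sum_{a \in A} x(s',a)
    - \sum_{s \in S}\sum_{a \in A} p(s' \mid s,a)\, x(s,a)
    = \alpha(s') && \forall s' \in S, \\
  & \sum_{s \in S}\sum_{a \in A} c_k(s,a)\, x(s,a) \le C_k
    && k = 1,\dots,K .
\end{align*}
% A stationary randomized policy is read off from a feasible x by normalizing
% in every state with positive total occupation:
\[
  \pi(a \mid s) = \frac{x(s,a)}{\sum_{a' \in A} x(s,a')} .
\]
```

Without the cost constraints, an optimal basic solution yields a deterministic policy; with them, the optimal policy may need to randomize in some states.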


“…Our third contribution lies in deriving key theoretical results establishing provable performance and behavior guarantees for the derived policies. Contracting or transient MDP models that use the expected total reward as the optimality criterion are commonplace in constrained MDPs since optimal stationary policies with regard to this criterion can always be found via mathematical programming in view of a well-established one-to-one correspondence between stationary policies and feasible solutions to such programs (Altman, 1998; Feinberg, 2000; Wu & Durfee, 2010; Petrik & Zilberstein, 2009). The notoriously more difficult and equally important expected average reward criterion is much less understood considering that such correspondence ceases to exist for general multichain MDPs.…”

Section: Introduction (mentioning)

confidence: 99%

Steady-State Planning in Expected Reward Multichain MDPs

Atia¹,

Beckus²,

Alkhouri³

et al. 2021

JAIR


The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy which satisfies formal specifications in the form of some well-defined logic. While many such logics have been proposed with varying degrees of expressiveness and complexity in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies which satisfy certain types of asymptotic behavior in general system models. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time an agent spends in each state as it interacts for an indefinite period of time with its environment. This is sometimes called the average or expected behavior of the agent and the associated planning problem is faced with significant challenges unless strong restrictions are imposed on the underlying model in terms of the connectivity of its graph structure. In this paper, we explore this steady-state planning problem that consists of deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. A linear programming solution for the general case of multichain Markov Decision Processes (MDPs) is proposed and we prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
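The citation statement for this entry mentions a one-to-one correspondence between stationary policies and feasible solutions of the mathematical program in transient MDPs. The sketch below illustrates that idea on a toy example and is not code from any of the cited papers. The transition numbers, the policy, and the function names are all hypothetical. It computes the occupation measure of a stationary policy, checks that it satisfies the usual flow-balance equalities, and recovers the policy by normalization.

```python
# Toy transient MDP (hypothetical numbers): 2 transient states, 2 actions.
# P[s][a][t] is the probability of moving from state s to state t under
# action a. Each row sums to less than 1; the missing mass is absorption.
P = [
    [[0.5, 0.3], [0.1, 0.6]],
    [[0.2, 0.4], [0.0, 0.5]],
]
ALPHA = [1.0, 0.0]               # initial distribution over transient states
PI = [[0.7, 0.3], [0.4, 0.6]]    # stationary randomized policy PI[s][a]


def occupation_measure(P, alpha, pi, sweeps=2000):
    """Expected number of times each (s, a) pair is used under pi.

    Sums the state distribution over time steps; the sum converges
    because every action leaks probability mass to absorption.
    """
    n, m = len(P), len(P[0])
    dist = list(alpha)
    x = [[0.0] * m for _ in range(n)]
    for _ in range(sweeps):
        for s in range(n):
            for a in range(m):
                x[s][a] += dist[s] * pi[s][a]
        dist = [
            sum(dist[s] * pi[s][a] * P[s][a][t]
                for s in range(n) for a in range(m))
            for t in range(n)
        ]
    return x


def flow_residual(P, alpha, x):
    """Largest violation of
    sum_a x(t, a) - sum_{s, a} P(t | s, a) x(s, a) = alpha(t)."""
    n, m = len(P), len(P[0])
    worst = 0.0
    for t in range(n):
        outflow = sum(x[t])
        inflow = sum(P[s][a][t] * x[s][a]
                     for s in range(n) for a in range(m))
        worst = max(worst, abs(outflow - inflow - alpha[t]))
    return worst


def policy_from_measure(x):
    """Normalize x per state; assumes every state has positive occupation."""
    return [[value / sum(row) for value in row] for row in x]


x = occupation_measure(P, ALPHA, PI)
print("flow-balance residual:", flow_residual(P, ALPHA, x))
print("recovered policy:", policy_from_measure(x))
```

The recovered policy should match `PI` up to floating-point error, which is the direction of the correspondence that lets an LP solution be turned back into a stationary policy. As the quoted statement notes, this normalization argument does not carry over directly to the average-reward criterion in general multichain MDPs.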
