A Rollout Algorithm for Multichain Markov Decision Processes with Average Cost

Sun, Tao; Zhao, Qianchuan; Luh, Peter B.

doi:10.1007/978-3-642-02894-6_15

Cited by 3 publications

(3 citation statements)

References 14 publications


“…However, the ideas they propose can be incorporated to guide initial exploration of actions in approaches like References [1,52]. • Relaxing certain theoretical assumptions like non-communicating MDPs [23], multi-chain MDPs [67], and so on, can further improve the applicability of regret-based approaches in control-based approaches. • Most of the model-based and model-free approaches in Section 5 are not scalable to large problem sizes.…”

Section: Future Directions (mentioning)

confidence: 99%

A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

Padakandla

2021

ACM Comput. Surv.


Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing, and robotics. The real-world complications arising in these domains make them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying assumption of a stationary environment model is relaxed. This article provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by the RL agent or by finding a suitable policy for the RL agent that leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work along with their categorization and their relative merits and demerits. Additionally, we also review works that are tailored to application domains. Finally, we discuss future enhancements for this field.


“…However, the ideas they propose can be incorporated to guide initial exploration of actions in approaches like [29], [30]. • Relaxing certain theoretical assumptions like noncommunicating MDPs [72], multi-chain MDPs [73] etc can further improve the applicability of regret-based approaches in control-based approaches.…”

Section: Future Directions (mentioning)

confidence: 99%

A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

Padakandla

2020

Preprint


Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing and robotics. The real-world complications of many tasks arising in these domains make them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying assumption of a stationary environment model is relaxed. This paper provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by the RL agent or by finding a suitable policy for the RL agent which leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work along with their categorization and their relative merits and demerits. Additionally, we also review works which are tailored to application domains. Finally, we discuss future enhancements for this field.


“…Therefore, any prospective methodology must incorporate such a limitation in its solution process. We incorporate the Optimal Computing Budget Allocation (OCBA) algorithm into our MDP solution process [2], [3] to address the limited simulation budget problem.…”

Section: Introduction (mentioning)

confidence: 99%

Solving Markov decision processes for network-level post-hazard recovery via simulation optimization and rollout

Sarkale, Nozhati, Chong, et al.

2018

2018 IEEE 14th International Conference on Automation Science and Engineering (CASE)


Computation of optimal recovery decisions for community resilience assurance post-hazard is a combinatorial decision-making problem under uncertainty. It involves solving a large-scale optimization problem, which is significantly aggravated by the introduction of uncertainty. In this paper, we draw upon established tools from multiple research communities to provide an effective solution to this challenging problem. We provide a stochastic model of damage to the water network (WN) within a testbed community following a severe earthquake and compute near-optimal recovery actions for restoration of the water network. We formulate this stochastic decision-making problem as a Markov Decision Process (MDP), and solve it using a popular class of heuristic algorithms known as rollout. A simulation-based representation of MDPs is utilized in conjunction with rollout and the Optimal Computing Budget Allocation (OCBA) algorithm to address the resulting stochastic simulation optimization problem. Our method employs nonmyopic planning with efficient use of simulation budget. We show, through simulation results, that rollout fused with OCBA performs competitively with respect to rollout with total equal allocation (TEA) at a meagre simulation budget of 5-10% of rollout with TEA, which is a crucial step towards addressing large-scale community recovery problems following natural disasters.
