Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning

Çilden, Erkin; Polat, Faruk

doi:10.1109/tcyb.2014.2352038

Cited by 10 publications

(2 citation statements)

References 14 publications


“…However, the resulting problem is not necessarily an MDP as every transition from one state to another is dependent on the path (and the parameter values) taken till the current state. Other related approaches for parameterized MDPs are case specific; for instance, [32] presents action-based parameterization of state space with application to service rate control in closed Jackson networks, and [33]-[38] incorporate parameterized actions that is applicable in the domain of RoboCup soccer where at each step the agent must select both the discrete action it wishes to execute as well as continuously valued parameters required by that action. On the other hand, the class of parameterized MDPs that we address in this article predominantly originate in network based applications that involves simultaneous routing and resource allocations and pose additional challenges of non-convexity and NP-hardness.…”

Section: Related Work in Parameterized MDPs and RL (mentioning)

confidence: 99%

Parameterized MDPs and Reinforcement Learning Problems—A Maximum Entropy Principle-Based Framework

Srivastava

Salapaka

2022

IEEE Trans. Cybern.


We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modeled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon Entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data as demonstrated in our comparisons to popular algorithms such as Q-learning, Double Q-learning and entropy regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent, and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can possibly be non-convex with multiple poor local minima. Simulation results applied to a 5G small cell network problem demonstrate successful determination of communication routes and the small cell locations. We also obtain sensitivity measures to problem parameters and robustness to noisy environment data.


“…However, the resulting problem is not necessarily an MDP as every transition from one state to another is dependent on the path (and the parameter values) taken till the current state. Other related approaches for parameterized MDPs are case specific; for instance, [31] presents action-based parameterization of state space with application to service rate control in closed Jackson networks, and [32]-[37] incorporate parameterized actions that is applicable in the domain of RoboCup soccer where at each step the agent must select both the discrete action it wishes to execute as well as continuously valued parameters required by that action. On the other hand, the parameterized MDPs that we address in this article predominantly originate in network based applications that involves simultaneous routing and resource allocations and pose additional challenges of non-convexity and NP-hardness.…”

Section: Introduction (mentioning)

confidence: 99%

Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework

Srivastava

Salapaka

2020

Preprint


We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon Entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. This resulting policy enhances the quality of exploration early on in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data as demonstrated in our comparisons to popular algorithms such as Q-learning, Double Q-learning and entropy regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent, and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function can possibly be non-convex with multiple poor local minima. Simulation results applied to a 5G small cell network problem demonstrate successful determination of communication routes and the small cell locations. We also obtain sensitivity measures to problem parameters and robustness to noisy environment data.


Landmark based guidance for reinforcement learning agents under partial observability

Demir

Çilden

Polat

2022

Int. J. Mach. Learn. & Cyber.

