2015
DOI: 10.1109/tcyb.2014.2352038
Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning

Abstract: Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by making use of repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during the RL process is difficult and poses many challenges, such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In th…

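For context, temporal abstraction is commonly formalized via the options framework: a sub-policy paired with an initiation set and a termination condition. The sketch below illustrates that general idea only; it is not the construction used in this paper, and the environment interface (env.step returning state, reward, done) is an assumption for illustration.

from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = int

@dataclass
class Option:
    """A temporally abstract action in the options framework."""
    initiation_set: Set[State]                  # states where the option may start
    policy: Callable[[State], Action]           # sub-policy followed while active
    termination_prob: Callable[[State], float]  # beta(s): chance of stopping in s

def run_option(env, state, option, rng):
    """Execute an option until its termination condition fires.

    env.step(action) -> (next_state, reward, done) is an assumed
    interface, not the paper's API.
    """
    total_reward, steps = 0.0, 0
    done = False
    while not done:
        action = option.policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        steps += 1
        if rng.random() < option.termination_prob(state):
            break
    return state, total_reward, steps
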
Cited by 10 publications (2 citation statements)
References 14 publications
“…However, the resulting problem is not necessarily an MDP, as every transition from one state to another depends on the path (and the parameter values) taken up to the current state. Other related approaches for parameterized MDPs are case specific; for instance, [32] presents an action-based parameterization of the state space with application to service rate control in closed Jackson networks, and [33]-[38] incorporate parameterized actions, applicable in the domain of RoboCup soccer, where at each step the agent must select both the discrete action it wishes to execute and the continuously valued parameters required by that action. On the other hand, the class of parameterized MDPs that we address in this article predominantly originates in network-based applications that involve simultaneous routing and resource allocation, and poses the additional challenges of non-convexity and NP-hardness.…”
Section: Related Work in Parameterized MDPs and RL
confidence: 99%
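
The parameterized-action setting quoted above (a discrete action plus continuous parameters chosen jointly at each step, as in RoboCup soccer) can be made concrete with a small sketch. The action names and parameter bounds below are illustrative guesses, not values from the cited papers:

import random

# Each discrete action carries its own continuous parameter ranges
# (illustrative values; e.g. a kick has a power and a direction).
PARAMETERIZED_ACTIONS = {
    "kick": {"power": (0.0, 100.0), "direction": (-180.0, 180.0)},
    "dash": {"power": (0.0, 100.0)},
    "turn": {"angle": (-180.0, 180.0)},
}

def sample_action(rng: random.Random):
    """Pick a discrete action, then its continuous parameters: the
    joint choice the citing papers describe at every time step."""
    name = rng.choice(list(PARAMETERIZED_ACTIONS))
    params = {p: rng.uniform(lo, hi)
              for p, (lo, hi) in PARAMETERIZED_ACTIONS[name].items()}
    return name, params

rng = random.Random(0)
print(sample_action(rng))  # e.g. ('turn', {'angle': ...})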
“…However, the resulting problem is not necessarily an MDP, as every transition from one state to another depends on the path (and the parameter values) taken up to the current state. Other related approaches for parameterized MDPs are case specific; for instance, [31] presents an action-based parameterization of the state space with application to service rate control in closed Jackson networks, and [32]-[37] incorporate parameterized actions, applicable in the domain of RoboCup soccer, where at each step the agent must select both the discrete action it wishes to execute and the continuously valued parameters required by that action. On the other hand, the parameterized MDPs that we address in this article predominantly originate in network-based applications that involve simultaneous routing and resource allocation, and pose the additional challenges of non-convexity and NP-hardness.…”
Section: Introduction
confidence: 99%