2022
DOI: 10.1609/aaai.v36i7.20764
Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning

Abstract: This paper proposes a new sequential model learning architecture to solve partially observable Markov decision problems. Rather than compressing sequential information at every timestep as in conventional recurrent neural network-based methods, the proposed architecture generates a latent variable in each data block with a length of multiple timesteps and passes the most relevant information to the next block for policy optimization. The proposed blockwise sequential model is implemented based on self-attentio…
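The blockwise idea in the abstract can be illustrated with a minimal sketch: a trajectory is split into fixed-length blocks, each block is encoded with self-attention together with the latent carried over from the previous block, and one summary vector per block is passed forward. This is a toy illustration only, not the authors' implementation; the function names, the zero-initialized first latent, and the choice of using the last attended row as the block summary are assumptions.

```python
import numpy as np

def self_attention(x):
    # x: (T, d) — single-head, unmasked self-attention with tied Q = K = V = x
    scores = x @ x.T / np.sqrt(x.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ x

def blockwise_summarize(obs, block_len):
    """Encode a trajectory block by block; each block attends jointly over
    its own observations and the previous block's latent, and emits one
    latent vector that is carried into the next block."""
    d = obs.shape[1]
    latent = np.zeros(d)                       # assumed zero-initialized carry
    latents = []
    for start in range(0, len(obs), block_len):
        block = obs[start:start + block_len]
        inp = np.vstack([latent[None, :], block])  # prepend carried latent
        attended = self_attention(inp)
        latent = attended[-1]                  # summary passed to next block
        latents.append(latent)
    return np.stack(latents)

# toy trajectory: 12 timesteps of 4-dim observations, block length 4
traj = np.random.default_rng(0).normal(size=(12, 4))
z = blockwise_summarize(traj, block_len=4)
print(z.shape)  # (3, 4): one latent per block
```

In an actual agent, the per-block latents would condition the policy; the sketch only shows the compression pattern (one latent per multi-timestep block instead of one hidden state per timestep).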

Cited by 3 publications (1 citation statement) · References 4 publications
“…In this vein, model-free RL methods such as Deep Recurrent Q-Networks (DRQN) 22 have been proposed, that demonstrated the potential of recurrent model-free RL in addressing POMDPs. Recently, researchers have also explored the development of specialized belief modules 23 and sequential model learning architectures 24 to facilitate convergence in specific problem domains. These approaches aim to improve RL's effectiveness in handling POMDPs by incorporating domain-specific modifications and adaptive learning mechanisms.…”
Section: Related Work (citation type: mentioning; confidence: 99%)