2021 | Preprint
DOI: 10.48550/arxiv.2105.01136

Learning Good State and Action Representations via Tensor Decomposition

Abstract: The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representations from empirical trajectories. The method exploits the MDP's tensor structure via kernelization, importance sampling, and low-Tucker-rank approximation. It can further be used to cluster states and actions, respectively, and to find the best discrete MDP abstraction…
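The pipeline the abstract describes (estimate a transition tensor from trajectories, take a low-Tucker-rank approximation, read state and action embeddings off the factor matrices, then cluster them into a discrete abstraction) can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the paper handles continuous spaces via kernelization and importance sampling, whereas the sketch below assumes a finite discretization, synthetic transition counts, and guessed Tucker ranks and cluster counts throughout.

```python
import numpy as np
from tensorly.decomposition import tucker
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for an empirical transition tensor P[s, a, s']
# estimated from logged trajectories. The paper works with continuous
# state-action spaces via kernelization; here we simply discretize.
n_states, n_actions = 50, 10
counts = rng.poisson(2.0, size=(n_states, n_actions, n_states)).astype(float)
counts += 1.0  # Laplace smoothing so every row is a valid distribution
P_hat = counts / counts.sum(axis=2, keepdims=True)

# Low-Tucker-rank approximation: a small core tensor plus one factor
# matrix per mode. The ranks below are assumptions, not from the paper.
ranks = [5, 3, 5]  # (state, action, next-state) Tucker ranks
core, factors = tucker(P_hat, rank=ranks)
U_state, U_action, U_next = factors

# Rows of the factor matrices serve as low-dimensional representations.
state_repr = U_state    # shape (n_states, 5)
action_repr = U_action  # shape (n_actions, 3)

# Cluster states and actions in the learned representation space to
# obtain a candidate discrete MDP abstraction.
state_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(state_repr)
action_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(action_repr)
print(state_labels)
print(action_labels)
```

Here the factor matrices play the role of the learned representations, and clustering their rows yields the discrete abstraction; the statistical guarantees in the paper concern how accurately these objects can be recovered from a finite number of empirical trajectories.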

Cited by 3 publications (1 citation statement). References 44 publications.

Citation statement:
“…Foster et al (2021) focus on instance-dependent bounds, but their bounds scale with a value function disagreement coefficient and inverse value gap, both of which can be arbitrarily large in general Block MDPs (e.g., disagreement coefficient is a stronger notion than the usual classic notion of uniform convergence which is what we use here). Finally, Duan et al (2019) and Ni et al (2021) study state abstraction learning from logged data, without trying to identify the optimal policy.…”
Citation type: mentioning (confidence: 99%).