2022
DOI: 10.1609/aaai.v36i8.20820

What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator

Abstract: We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve values of multiple policies at the same time and brings an appealing characteristic, i.e., value generalization among policies. We formally analyze the value generalization under Generalized Policy Iteration (GPI). From theo…

Citations: cited by 6 publications (3 citation statements)
References: 6 publications
“…To our knowledge, PeVFA (Tang et al, 2020) and related variants are trained in an on-policy fashion in prior works. In the original paper of PeVFA, a newly implemented PPO algorithm, called PPO-PeVFA is studied.…”
Section: F Additional Discussion | Citation type: mentioning
confidence: 99%
“…Conventional value functions are defined on a specific policy. Recently, a new extension called Policy-extended Value Function Approximator (PeVFA) (Tang et al, 2020) is proposed to preserve the values of multiple policies. Concretely, given some representation χ_π of policy π, a PeVFA parameterized by θ takes as input χ_π additionally, i.e., Q_θ(s, a, χ_π).…”
Section: Introduction | Citation type: mentioning
confidence: 99%
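
The idea in the statement above, a value function that also receives a policy representation, can be illustrated with a minimal sketch. This is not the architecture from the paper: the class name, the two-layer network, the dimensions, and the way χ_π is produced are all assumptions made for illustration. It only shows that one set of parameters θ can return different values for the same (s, a) pair when χ_π changes.

```python
import numpy as np


class ToyPeVFA:
    """Hypothetical toy Q_theta(s, a, chi_pi); not the paper's model."""

    def __init__(self, state_dim, action_dim, policy_repr_dim,
                 hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim + policy_repr_dim
        # theta: weights of a small two-layer network (assumed architecture).
        self.W1 = rng.normal(0.0, 0.1, size=(in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, size=(hidden_dim, 1))
        self.b2 = np.zeros(1)

    def __call__(self, s, a, chi_pi):
        # The policy representation is simply concatenated to (s, a).
        x = np.concatenate([s, a, chi_pi], axis=-1)
        h = np.tanh(x @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).squeeze(-1)


# Same states and actions, two different (made-up) policy representations.
q = ToyPeVFA(state_dim=4, action_dim=2, policy_repr_dim=8)
s = np.ones((3, 4))
a = np.zeros((3, 2))
chi_a = np.full((3, 8), 0.5)
chi_b = np.full((3, 8), -0.5)
q_a = q(s, a, chi_a)
q_b = q(s, a, chi_b)
```

Here `q_a` and `q_b` come from the same parameters but differ because the policy input differs, which is the sense in which a single approximator can hold values for multiple policies. How χ_π is actually computed from a policy is left open in this sketch.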