2021
DOI: 10.1007/978-3-030-82099-2_39
Fuzzy Baselines to Stabilize Policy Gradient Reinforcement Learning

Abstract: Policy gradient methods are amongst the most efficient for on-policy, model-free reinforcement learning. However, they suffer from high variance in gradient updates, making them unstable during training. Subtracting a baseline from the rewards is an effective strategy to reduce variance, as in actor-critic models. This work presents a variation of the actor-critic model that uses a fuzzy system instead of a neural network to estimate the state value function. The fuzzy value approximation is inspired by pr…
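The abstract is truncated, but the core idea it states is clear: keep the standard actor-critic update and swap the neural-network critic for a fuzzy system that estimates the state-value baseline. The Python sketch below illustrates that idea under assumptions of our own, not the paper's: Gaussian membership functions and zero-order Takagi-Sugeno consequents for the fuzzy critic. The names FuzzyValueBaseline and actor_critic_step, and all hyperparameters, are hypothetical.

# Minimal sketch (not the paper's implementation): an actor-critic update whose
# state-value baseline V(s) is a zero-order Takagi-Sugeno fuzzy system with
# Gaussian membership functions. Rule layout and learning rates are illustrative.
import numpy as np

class FuzzyValueBaseline:
    """V(s) = sum_i w_i(s) * c_i, with normalized Gaussian rule firings w_i."""

    def __init__(self, centers, widths, lr=0.05):
        self.centers = np.asarray(centers, dtype=float)  # rule centers, shape (R, d)
        self.widths = np.asarray(widths, dtype=float)    # per-rule widths, shape (R, d)
        self.consequents = np.zeros(len(self.centers))   # c_i, learned online
        self.lr = lr

    def _firing(self, s):
        # Rule firing strengths: product of per-dimension Gaussian memberships,
        # normalized so the weights sum to one.
        z = (np.asarray(s, dtype=float) - self.centers) / self.widths
        w = np.exp(-0.5 * np.sum(z * z, axis=1))
        return w / (w.sum() + 1e-12)

    def value(self, s):
        return float(self._firing(s) @ self.consequents)

    def update(self, s, td_error):
        # Gradient step on the consequents toward the TD target.
        self.consequents += self.lr * td_error * self._firing(s)


def actor_critic_step(theta, baseline, s, a, r, s_next, done,
                      grad_log_pi, alpha=0.01, gamma=0.99):
    """One policy-gradient step using the fuzzy critic as the baseline.

    grad_log_pi: callable returning grad_theta log pi(a | s).
    """
    target = r + (0.0 if done else gamma * baseline.value(s_next))
    td_error = target - baseline.value(s)                 # advantage estimate
    baseline.update(s, td_error)                          # critic (fuzzy) update
    theta += alpha * td_error * grad_log_pi(theta, s, a)  # actor update
    return theta, td_error

On this reading, the TD error plays the same variance-reducing role as in a conventional actor-critic; only the critic's function approximator changes, from a neural network to a fuzzy rule base.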

Cited by 0 publications
References 14 publications
