2019
DOI: 10.1155/2019/7619483
A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Abstract: In deep reinforcement learning, network convergence is often slow, and training easily converges to local optimal solutions. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences with reward saltation in the experience pool, thereby increasing an agent's utilization of these experiences. We conducted experiments in a simulated obstacle avoidance search environm…
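The abstract's idea — magnifying the rewards of saltatory experiences in the replay pool, with an adjustable magnification that decays over training — can be illustrated with a minimal sketch. The class name, the `saltation_threshold`, `magnify_factor`, and `decay` parameters, and the detection rule are all assumptions for illustration, not the paper's actual MSR formulation:

```python
import random
from collections import deque

class MSRReplayBuffer:
    """Sketch of a magnify-saltatory-reward (MSR) experience pool.

    Assumption (not from the paper): a transition whose reward differs from
    the previous transition's reward by more than `saltation_threshold` is
    treated as a saltation, and its stored reward is scaled by
    `magnify_factor`, which decays toward 1.0 so the adjustment fades as
    training progresses.
    """

    def __init__(self, capacity=10000, saltation_threshold=1.0,
                 magnify_factor=2.0, decay=0.999):
        self.buffer = deque(maxlen=capacity)
        self.saltation_threshold = saltation_threshold
        self.magnify_factor = magnify_factor
        self.decay = decay
        self._prev_reward = 0.0

    def push(self, state, action, reward, next_state, done):
        original = reward
        # Magnify the reward if it jumps sharply relative to the last one.
        if abs(reward - self._prev_reward) > self.saltation_threshold:
            reward *= self.magnify_factor
        self._prev_reward = original
        self.buffer.append((state, action, reward, next_state, done))
        # Adjustable parameter: anneal the magnification toward 1.0 over time.
        self.magnify_factor = 1.0 + (self.magnify_factor - 1.0) * self.decay

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```

A standard DQN training loop would push transitions through this pool instead of a plain replay buffer; only saltatory transitions have their rewards amplified, so they contribute more strongly when sampled for updates.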

Cited by 27 publications (17 citation statements)
References 18 publications
“…For each episode of the Q-learning algorithm, we average Q-values from all actions. First, we study the effects of different reward functions [29]–[31] on the convergence of the Q-learning algorithm. We map the value y of the SLA to a new number via x = y · (x_max − x_min)/(y_max − y_min) + y_min · (1 +…”
Section: B. Call Center Workforce Management With Reinforcement Learning
confidence: 99%
“…RL methods are based on a reward signal coming from the environment as feedback to evaluate the performed action. However, defining a proper reward function in complex and dynamic environments is a major challenge [5]. Active Inference (AIn) [6] can overcome this challenge by replacing reward functions with prior beliefs about desired sensory signals received from the environment.…”
Section: Introduction
confidence: 99%
“…Hu [3] shows the effects of shaping the reward function: increasing positive rewards (or using rewards only) and decreasing negative rewards (or penalties) makes the agent converge more quickly.…”
Section: Introduction
confidence: 99%
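The effect described in the snippet above — that scaling positive rewards up and penalties down can speed convergence — can be sketched with tabular Q-learning on a toy chain environment. The environment, the `positive_scale`/`negative_scale` shaping knobs, and all hyperparameters are illustrative assumptions, not taken from the cited work:

```python
import random

def train_chain(positive_scale=1.0, negative_scale=1.0,
                n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1D chain: reaching the rightmost state pays
    +1 and terminates; every other step costs -0.01. The hypothetical
    shaping parameters scale positive rewards and penalties separately."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else -0.01
            # Reward shaping: magnify rewards, shrink penalties.
            r = r * positive_scale if r > 0 else r * negative_scale
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

Running `train_chain()` with defaults learns the rightward policy; running it with `positive_scale=2.0, negative_scale=0.5` yields larger Q-values along the optimal path, which is one informal way to see why the shaped variant can separate good from bad actions sooner.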
“…In this study, the dividing approach proposed in [2] is used for the rules of the environment, and different reward functions are set according to the approach proposed in [3]. This examines how to determine the reward function, which is an important factor in reinforcement learning.…”
Section: Introduction
confidence: 99%