2023
DOI: 10.1016/j.eswa.2023.120495

Reinforcement learning algorithms: A brief survey

Cited by 106 publications (16 citation statements)
References 179 publications
“…The actions taken by the agent are guided by either trial-and-error, or based on the state of the environment and rewards, or a combination of both. By performing these actions iteratively, the agent learns an optimal behavioral strategy based on the rewards received from previous interactions [49].…”
Section: Reinforcement Learning (mentioning)
confidence: 99%
“…Reinforcement Learning (RL) is a machine learning technique to learn sequential decision-making in complex scenarios [49]. RL is inspired by trial-and-error based human/animal learning, wherein the learning is driven by a scalar quantity (the ‘reinforcement’ or the ‘reward’) and the goal of the algorithm is to maximize the expected future cumulative reward [50].…”
Section: Introduction (mentioning)
confidence: 99%
“…The key feature of Q-learning is that it estimates the action-value function Q which leads to directly approximating q* (the optimal action-value function), regardless of the policy being executed [58,59]. This technique is defined in Equation (24) as follows:…”
Section: Q-learning (mentioning)
confidence: 99%
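The citing paper's Equation (24) is not reproduced in the statement above. As a rough illustration only, the standard tabular Q-learning update can be sketched as follows. This is the textbook form, not necessarily the citing paper's notation; the function name `q_learning_update`, the table layout (a list of per-state action-value lists), and the default values of `alpha` and `gamma` are assumptions made for this example.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a list of lists: q[s][a] is the current estimate for taking
    action a in state s (hypothetical layout chosen for this sketch).
    """
    # Off-policy target: bootstrap from the greedy (max) action value in the
    # next state, regardless of which action the behavior policy takes next.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Example: two states, two actions each.
q_table = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0,
                              next_state=1, alpha=0.5, gamma=0.9)
```

Because the target uses the maximum over next-state action values, the estimate moves toward the optimal action-value function independently of the policy generating the data, which is the property the statement describes.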
“…As mentioned in the previous section, temporal difference learning is a combination of the ideas of both Monte Carlo and dynamic programming. Therefore, the TD algorithm learns from experience where there is an unknown model or no model of the environment’s dynamics, similar to MC [58]. On the other hand, TD, like DP algorithms, updates estimates depending on the other learned estimates without waiting for a whole episode to be finished.…”
Section: Reinforcement Learning (RL) (mentioning)
confidence: 99%
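The two properties described in the statement above (learning from sampled experience without a model, and bootstrapping from other learned estimates before the episode ends) can be illustrated with a minimal TD(0) state-value update. This is a sketch of the standard textbook rule, not code from the citing paper; the function name `td0_update`, the list-based value table, and the default step size and discount are assumptions.

```python
def td0_update(v, state, reward, next_state,
               alpha=0.1, gamma=0.9, done=False):
    """Apply one TD(0) update to the value table v in place.

    Returns the TD error so the caller can inspect it.
    """
    # Like Monte Carlo: only a sampled transition (state, reward, next_state)
    # is needed, with no model of the environment.
    # Like dynamic programming: the target bootstraps from the current
    # estimate v[next_state] instead of waiting for the episode's full return.
    bootstrap = 0.0 if done else v[next_state]
    td_error = reward + gamma * bootstrap - v[state]
    v[state] += alpha * td_error
    return td_error


# Example: a two-state value table updated after a single transition.
values = [0.0, 1.0]
error = td0_update(values, state=0, reward=1.0, next_state=1,
                   alpha=0.5, gamma=0.9)
```

The update runs after every step, so estimates improve during an episode rather than only at its end.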