2015
DOI: 10.1016/j.neucom.2015.05.075
Nonlinear neuro-optimal tracking control via stable iterative Q-learning algorithm

Cited by 30 publications (2 citation statements)
References: 50 publications
“…Q-learning, proposed by Watkins [44,45], is a representative data-based adaptive dynamic programming algorithm. In the QL algorithm, the Q function depends on both system state and control, and updates policy through continuous observation of rewards of all state-action pairs [37]. The value of an action at any state can be defined using a Q-value, which is the sum of the immediate reward after executing action "a" at state "s" and the discounted reward from subsequent actions according to the best strategy.…”
Section: Q-learning Algorithm (citation type: mentioning)
confidence: 99%
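The Q-value described in the excerpt above, the immediate reward plus the discounted reward obtained by following the best subsequent actions, corresponds to the standard tabular Q-learning update. The following is a minimal sketch in Python; the state/action space sizes, learning rate `alpha`, and discount factor `gamma` are illustrative assumptions, not values from the cited paper.

```python
import numpy as np

# Minimal tabular Q-learning sketch (assumed small discrete state/action spaces).
# The update implements:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)),
# i.e. the immediate reward after taking action a in state s, plus the
# discounted value of the best action available in the next state s'.
n_states, n_actions = 10, 4          # illustrative sizes, not from the paper
alpha, gamma = 0.1, 0.95             # assumed learning rate and discount factor
Q = np.zeros((n_states, n_actions))  # Q-table over all state-action pairs

def q_update(Q, s, a, r, s_next):
    """One Q-learning step for the observed transition (s, a, r, s_next)."""
    td_target = r + gamma * np.max(Q[s_next])   # immediate reward + discounted best future value
    Q[s, a] += alpha * (td_target - Q[s, a])    # move Q(s, a) toward the target
    return Q
```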
“…Among ML methods, Q-learning (QL) is one of the reinforcement learning (RL) methods and a provably convergent direct optimal adaptive control algorithm [35]. Because it offers the significant advantage of a learning mechanism that ensures inherent adaptability to a dynamic environment, QL can be used to find an optimal action-selection policy based on historical and/or present states and control actions [35][36][37], even for completely uncertain or unknown dynamics [38]. Figuratively speaking, as in a real human environment, the QL algorithm does not necessarily rely on a single agent searching the complete state-action space to obtain the optimal policy; agents can exchange information and learn from one another [39].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
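The action-selection policy mentioned in this excerpt is commonly realized as an epsilon-greedy rule over the learned Q-table: explore a random action with small probability, otherwise exploit the action with the highest Q-value for the current state. A brief sketch follows; the function name, `epsilon`, and the integer state encoding are assumptions for illustration, not details taken from the cited works.

```python
import numpy as np

def select_action(Q, state, epsilon=0.1, rng=np.random.default_rng()):
    """Epsilon-greedy action selection from a learned Q-table.

    With probability epsilon a random action is explored; otherwise the
    greedy action (highest Q-value for the current state) is exploited.
    """
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # explore: uniform random action
    return int(np.argmax(Q[state]))            # exploit: current greedy policy
```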