1993
DOI: 10.21236/ada276517
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Cited by 152 publications (132 citation statements). References 5 publications.
“…Before stating our convergence theorem, we must introduce the following standard assumption concerning the stepsize sequence: […] A proof of Theorem 1 is provided in Appendix C. We prove the theorem by showing that the algorithm corresponds to a stochastic approximation involving a maximum norm contraction, and then appeal to a theorem concerning asynchronous stochastic approximation due to Tsitsiklis (1994) (see also Jaakkola, Jordan, and Singh, 1994), which is discussed in Appendix B, and a theorem concerning multi-representation contractions presented and proven in Appendix A.…”
Section: Convergence Theorem
confidence: 99%
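The stepsize assumption itself was lost in extraction. As a hedged reconstruction, the "standard assumption" in this literature is almost certainly the Robbins-Monro conditions; the exact statement should be checked against the citing paper:

```latex
% Standard Robbins-Monro stepsize conditions (our assumption about
% the dropped text; the precise statement is in the cited paper):
\sum_{t=0}^{\infty} \alpha_i(t) = \infty,
\qquad
\sum_{t=0}^{\infty} \alpha_i(t)^2 < \infty
\quad \text{for every component } i.
```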
“…We then have the following result (Tsitsiklis, 1994) (related results are obtained in Jaakkola, Jordan, and Singh, 1994): THEOREM 4 Let Assumption 6 and Assumption 1 of Section 5 on the stepsizes α_i(t) hold, and suppose that the mapping T is a contraction with respect to the maximum norm. Then V(t) converges to the unique fixed point V* of T, with probability 1.…”
Section: Assumption 6
confidence: 99%
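For readers unfamiliar with the terminology in Theorem 4, the standard definition of a maximum norm contraction is sketched below; in the discounted Q-Learning setting the modulus β is the discount factor:

```latex
% Standard definition of a maximum-norm contraction with modulus beta:
\| T V - T V' \|_{\infty} \le \beta \, \| V - V' \|_{\infty},
\qquad 0 \le \beta < 1,
\quad \text{where } \| V \|_{\infty} = \max_i |V_i|.
```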
“…This algorithm is based on the concepts of the temporal-difference method, and its convergence to the optimal Q values (Q*(s, a)) is independent of the policy being followed. The update expression for the Q value in the Q-Learning algorithm is the following: […] Q-Learning was the first reinforcement learning method to have strong convergence proofs [28]. It is a very simple technique that computes actions directly, without intermediate evaluations and without the use of a model. Watkins showed that if each state-action pair is visited an infinite number of times and a suitable value of α is used, the Q value function will converge with probability 1 to Q*.…”
Section: Q-Learning
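The update expression referenced in the excerpt did not survive extraction; the standard Q-Learning rule it refers to (Watkins, 1989) is:

```latex
% Standard Watkins Q-Learning update, with stepsize alpha and
% discount factor gamma, after observing transition (s, a, r, s'):
Q(s, a) \leftarrow Q(s, a)
  + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```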
“…The convergence is generally established with probability 1 (w.p.1) because of the Robbins-Monro (RM) basis of RL. The first results for asynchronous convergence of discounted Q-Learning (Jaakkola et al., 1994; Tsitsiklis, 1994) were based on norm contractions. The idea of ordinary differential equations (ODEs) for proving convergence under asynchronous conditions was proposed in Borkar (1998), where it was shown that the iterate tracks an ODE, which is much slower than that shown to exist under synchronous conditions (Kushner and Clark, 1978).…”
Section: Semi-Markov Decision Problems
confidence: 99%
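Pulling the cited conditions together, here is a minimal Python sketch of asynchronous (one state-action pair per step) tabular Q-Learning under Robbins-Monro stepsizes, the regime in which the results above (Jaakkola et al., 1994; Tsitsiklis, 1994) guarantee convergence w.p. 1. The `env` interface (`reset`/`step`) is a hypothetical stand-in, not taken from any of the cited papers:

```python
import numpy as np

def q_learning(env, n_states, n_actions, gamma=0.9, episodes=5000, eps=0.1):
    """Tabular Q-Learning sketch. `env` is assumed to expose
    reset() -> state and step(a) -> (next_state, reward, done);
    this interface is illustrative, not from the cited papers."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))  # per-pair visit counts
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration keeps every (s, a) pair visited
            # infinitely often, the condition Watkins' result requires.
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s2, r, done = env.step(a)
            visits[s, a] += 1
            # alpha_i(t) = 1 / (visit count) satisfies the Robbins-Monro
            # conditions: sum alpha = infinity, sum alpha^2 < infinity.
            alpha = 1.0 / visits[s, a]
            target = r if done else r + gamma * Q[s2].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

Only the visited (s, a) entry is updated at each step, which is exactly the asynchronous setting the norm-contraction and ODE arguments above are designed to handle.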