2015
DOI: 10.1007/s10489-015-0670-1
A formal proof of the ε-optimality of discretized pursuit algorithms

Abstract: Learning Automata (LA) can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of discretized algorithms is proven to converge even faster than their continuous counterparts. However, it has recently been reported that the previous proofs for ε-optimality for all the reported algorithms for the past three decades have been flawed [1]. We applaud the researc…


Cited by 11 publications (17 citation statements)
References 27 publications
“…The proof of Conjecture 1 is beyond the aim of this article, and we allude to the proofs reported in [39] as a possible way to justify it.…”
Section: Theorem 3, Consider the Scenario When
confidence: 99%
“…Inspired by the family of pursuit LA algorithms [23,2,39], we design a novel pursuit LA for the S-Model that pursues the action with the highest "average reward" among the two actions. Please note that the classical pursuit LA found in the literature [23,2,39] operate only with binary feedback, while our scheme uses continuous feedback. Now, we shall provide the details of our Pursuit S-LA algorithm, which is discretized.…”
Section: Update Rules for Algorithm 3: Pursuit S-LA
confidence: 99%
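The classical binary-feedback discretized pursuit automaton that this family of schemes builds on can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name, parameters, and the Bernoulli environment used in the usage note are all assumptions.

```python
import random

def discretized_pursuit(env, num_actions, resolution=100, steps=10000):
    """Minimal sketch of a discretized pursuit automaton (DPA).

    Action probabilities move in discrete steps of
    delta = 1 / (num_actions * resolution), always "pursuing" the action
    with the current highest maximum-likelihood reward estimate.
    `env(a)` returns binary feedback: 1 (reward) or 0 (penalty).
    """
    delta = 1.0 / (num_actions * resolution)
    p = [1.0 / num_actions] * num_actions   # action-selection probabilities
    wins = [0] * num_actions                # rewards observed per action
    counts = [0] * num_actions              # times each action was selected

    for _ in range(steps):
        # Sample an action according to the current probability vector p.
        r, acc, a = random.random(), 0.0, num_actions - 1
        for i, pi in enumerate(p):
            acc += pi
            if r < acc:
                a = i
                break

        beta = env(a)                       # binary feedback from the environment
        counts[a] += 1
        wins[a] += beta

        # Maximum-likelihood reward estimates for every action.
        est = [wins[i] / counts[i] if counts[i] else 0.0
               for i in range(num_actions)]
        best = max(range(num_actions), key=lambda i: est[i])

        # Discretized pursuit step: shave delta off every non-best action
        # and give the freed probability mass to the current best estimate.
        for i in range(num_actions):
            if i != best:
                p[i] = max(p[i] - delta, 0.0)
        p[best] = 1.0 - sum(p[i] for i in range(num_actions) if i != best)

    return p
```

For example, against a two-action Bernoulli environment with reward probabilities 0.9 and 0.2, the returned vector concentrates nearly all probability mass on the first action.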
“…Though the work in [24] was innovative and encompassed both the CPA and the DPA, there was a flaw in its reasoning, which rendered the analysis incorrect. To correct this flaw, new proofs for the CPA's and the DPA's convergence were proposed in [36] and [35] respectively, where the authors investigated the submartingale property of the probability of selecting the optimal action and invoked the theory of regular functions to prove that the PAs converge to the optimal action in probability. They also claimed that the new proof methodology could be extended to prove the convergence of other Pursuit-based LAs.…”
Section: Prior Flawed "Proofs" for EAs
confidence: 99%
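The submartingale argument mentioned in this quote can be summarized as follows; this is a paraphrase of the cited reasoning, not a new result, with $p_m(t)$ denoting the probability of selecting the optimal action $\alpha_m$ at time $t$ and $P(t)$ the action-probability vector:

```latex
% Paraphrase of the cited submartingale argument for pursuit algorithms.
\[
  \mathbb{E}\bigl[\, p_m(t+1) \mid P(t) \,\bigr] \;\ge\; p_m(t)
  \quad \text{for all sufficiently large } t,
\]
% so {p_m(t)} is a bounded submartingale and converges almost surely by
% the submartingale convergence theorem; the theory of regular functions
% is then used to show that, for suitable internal parameters,
\[
  \Pr\!\Bigl[\lim_{t\to\infty} p_m(t) = 1\Bigr]
\]
% can be made arbitrarily close to 1, which is the epsilon-optimality
% property the paper establishes.
```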
“…Our algorithm is inspired by the family of pursuit LA algorithms [1,15,26]. However, instead of pursuing the action with the highest reward among the offered actions, we pursue the action that leads to an increase in the reward compared to the previously visited state at time instant t − 1.…”
Section: Remark
confidence: 99%