Abstract: The fastest learning automata (LA) algorithms currently available belong to the family of estimator algorithms introduced by Thathachar and Sastry [24]. The pioneering work of these authors was the pursuit algorithm, which pursues only the action currently estimated to be optimal. If this action is not the one with the minimum penalty probability, the algorithm pursues a wrong action. In this paper, we argue that a pursuit scheme that generalizes the traditional pursuit algorithm by pursuing all the actions with higher reward estimates than the chosen action minimizes the probability of pursuing a wrong action and converges faster. To support this claim, we present two new generalized pursuit algorithms (GPAs) together with a quantitative comparison of their performance against the existing pursuit algorithms. Empirically, the algorithms proposed here are among the fastest LA reported to date.
Index Terms: Estimator algorithms, learning automata (LA), pursuit algorithms.
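The difference between pursuing a single action and pursuing every action that looks better than the chosen one can be made concrete with a small simulation. The following is a simplified Python sketch under stated assumptions, not the GPA exactly as specified in the paper. The environment is a hypothetical set of Bernoulli reward probabilities. Reward estimates are running ratios of rewards to selections, initialized by sampling each action `init_samples` times. The probability mass `lam` is shared equally among all actions whose estimate exceeds that of the chosen action, and if no such action exists, the chosen action is pursued alone. The function name and all parameter names are illustrative.

```python
import random


def generalized_pursuit(reward_probs, lam=0.01, steps=5000, init_samples=10, seed=0):
    """Simplified generalized-pursuit sketch (illustrative, not the paper's exact rule).

    reward_probs: hypothetical environment; reward probability of each action.
    lam: learning rate (lambda) controlling how far P(t) moves per step.
    Returns the final action probability vector P.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r        # action probability vector P(t)
    rewards = [0] * r        # number of times each action was rewarded
    counts = [0] * r         # number of times each action was chosen

    def pull(i):
        # Bernoulli environment: reward (1) with probability reward_probs[i], else penalty (0).
        return 1 if rng.random() < reward_probs[i] else 0

    # Assumption: initialize the reward estimates by sampling every action a few times.
    for i in range(r):
        for _ in range(init_samples):
            rewards[i] += pull(i)
            counts[i] += 1

    for _ in range(steps):
        i = rng.choices(range(r), weights=p)[0]   # choose an action according to P(t)
        rewards[i] += pull(i)
        counts[i] += 1
        d_hat = [rewards[j] / counts[j] for j in range(r)]   # current reward estimates

        # All actions whose reward estimate is higher than that of the chosen action.
        better = [j for j in range(r) if d_hat[j] > d_hat[i]]
        p = [(1 - lam) * pj for pj in p]
        if better:
            # Pursue every better-looking action, splitting lam equally among them.
            for j in better:
                p[j] += lam / len(better)
        else:
            # The chosen action is (one of) the estimated best, so pursue it alone.
            p[i] += lam
    return p


if __name__ == "__main__":
    p = generalized_pursuit([0.9, 0.3, 0.2])
    print(p, max(range(len(p)), key=lambda j: p[j]))   # final P and its most probable action
```

Each update scales P(t) by (1 - lam) and then redistributes exactly lam, so the probabilities always sum to 1. When the set of better-looking actions contains only the estimated best action, the step coincides with the classical pursuit update.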
We consider the problem of a learning mechanism (for example, a robot) locating a point on a line when it interacts with a random environment that essentially informs it, possibly erroneously, which way it should move. In this paper, we present a novel scheme by which the point can be learnt using some recently devised learning principles. The heart of the strategy involves discretizing the space and performing a controlled random walk on this space. The scheme is shown to be ε-optimal and to converge with probability 1. Although the problem is solved in its generality, its application to nonlinear optimization is also suggested. Typically, an optimization process works its way toward the maximum (or minimum) using the local information that is available. However, the crucial issue in these strategies is determining the parameter to be used in the optimization itself. If the parameter is too small, convergence is sluggish; if it is too large, the system could erroneously converge or even oscillate. Our strategy can be used to determine the best parameter to be used in the optimization.
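The discretize-and-walk idea can be illustrated with a short simulation. This Python sketch assumes a hypothetical environment that knows the unknown point `target` and reports the correct direction with probability `p_correct` > 0.5. The interval [0, 1] is split into `resolution` steps, and the learner moves one step in whichever direction it is told. Boundary handling here is simple clamping, which is an assumption and may differ from the end-state rules of the actual scheme. All names are illustrative.

```python
import random


def stochastic_point_location(target, p_correct=0.8, resolution=100, steps=10000, seed=0):
    """Discretized controlled random walk searching for an unknown point on [0, 1].

    target: the unknown point; used only to simulate the (hypothetical) environment.
    p_correct: probability that the environment's advice is correct (assumed > 0.5).
    resolution: N, the number of subintervals; each move changes the estimate by 1/N.
    Returns the final estimate k / N.
    """
    rng = random.Random(seed)
    k = resolution // 2                      # integer grid index; start at the midpoint
    for _ in range(steps):
        should_increase = k / resolution < target
        # The environment reports the correct direction only with probability p_correct.
        says_increase = should_increase if rng.random() < p_correct else not should_increase
        if says_increase:
            k = min(k + 1, resolution)       # one step right, clamped at 1
        else:
            k = max(k - 1, 0)                # one step left, clamped at 0
    return k / resolution


if __name__ == "__main__":
    print(stochastic_point_location(0.423))  # final estimate of the unknown point
```

Because the advice is correct more often than not, the walk drifts toward the target and then stays within a few grid steps of it; a finer `resolution` shrinks that vicinity at the cost of more steps. In the optimization application described above, the unknown point plays the role of the best parameter value.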
A Learning Automaton (LA) is an automaton that interacts with a random environment, with the goal of learning the optimal action based on its acquired experience. Many learning automata have been proposed, and the class of Estimator Algorithms is among the fastest. Thathachar and Sastry [23], through the Pursuit Algorithm, introduced the concept of learning algorithms that pursue the currently estimated optimal action, following a Reward-Penalty learning philosophy. Later, Oommen and Lanctôt [16] extended the Pursuit Algorithm into the discretized world by presenting the Discretized Pursuit Algorithm, based on a Reward-Inaction learning philosophy. In this paper, we argue that the Reward-Penalty and Reward-Inaction learning paradigms, in conjunction with the continuous and discrete models of computation, lead to four versions of Pursuit Learning Automata. We contend that a scheme that merges the Pursuit concept with the most recent response of the Environment permits the algorithm to utilize the LA's long-term and short-term perspectives of the Environment. We present all four resultant Pursuit algorithms, together with a quantitative comparison between them. Although the present comparison is based solely on rigorous experimental results, we are currently investigating a formal convergence analysis of the various schemes.
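The two axes named above (continuous versus discretized probability updates, and Reward-Penalty versus Reward-Inaction) can be combined in a single routine, which makes the four resulting variants easy to compare side by side. This Python sketch assumes a hypothetical Bernoulli environment and follows the commonly described form of the pursuit update: the continuous variants move P(t) toward the unit vector of the estimated best action with rate `lam`, while the discretized variants lower every other action's probability by a fixed step delta = 1/(r * `resolution`) and give the remainder to the estimated best action. Under Reward-Inaction, P(t) is left unchanged on a penalty, although the reward estimates are still updated. Function and parameter names are illustrative, not taken from the paper.

```python
import random


def pursuit(reward_probs, continuous=True, reward_penalty=True,
            lam=0.01, resolution=50, steps=5000, init_samples=10, seed=0):
    """One routine covering the four pursuit variants (illustrative sketch).

    continuous: True -> multiplicative update with rate lam;
                False -> fixed decrements of delta = 1 / (r * resolution).
    reward_penalty: True -> update P on every response; False -> only on a reward.
    Returns the final action probability vector P.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r
    rewards = [0] * r
    counts = [0] * r
    delta = 1.0 / (r * resolution)           # smallest probability step (discretized case)

    def pull(i):
        # Bernoulli environment: 1 = reward, 0 = penalty.
        return 1 if rng.random() < reward_probs[i] else 0

    for i in range(r):                       # assumption: seed the estimates by sampling
        for _ in range(init_samples):
            rewards[i] += pull(i)
            counts[i] += 1

    for _ in range(steps):
        i = rng.choices(range(r), weights=p)[0]
        beta = pull(i)
        rewards[i] += beta                   # estimates are updated on every response
        counts[i] += 1
        if beta == 1 or reward_penalty:      # Reward-Inaction skips P updates on penalty
            d_hat = [rewards[j] / counts[j] for j in range(r)]
            m = max(range(r), key=lambda j: d_hat[j])   # currently estimated best action
            if continuous:
                p = [(1 - lam) * pj for pj in p]
                p[m] += lam
            else:
                for j in range(r):
                    if j != m:
                        p[j] = max(p[j] - delta, 0.0)
                p[m] = max(0.0, 1.0 - sum(p[j] for j in range(r) if j != m))
    return p


if __name__ == "__main__":
    for cont in (True, False):
        for rp in (True, False):
            print(cont, rp, pursuit([0.9, 0.3, 0.2], continuous=cont, reward_penalty=rp))
```

Setting `continuous=True, reward_penalty=True` mirrors the update style of the original Pursuit Algorithm, and `continuous=False, reward_penalty=False` mirrors the discretized Reward-Inaction scheme; the other two settings give the remaining versions.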