1999
DOI: 10.1287/mnsc.45.11.1570
Simulation-Based Optimization with Stochastic Approximation Using Common Random Numbers

Abstract: The method of Common Random Numbers is a technique used to reduce the variance of difference estimates in simulation optimization problems. These differences are commonly used to estimate gradients of objective functions as part of the process of determining optimal values for parameters of a simulated system. Asymptotic results exist which show that using the Common Random Numbers method in the iterative Finite Difference Stochastic Approximation optimization algorithm (FDSA) can increase the optimal rate of …
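To make the abstract's variance-reduction claim concrete, here is a minimal Python sketch (an illustration, not the paper's code; the quadratic loss and its noise model are assumptions). Feeding the same random-number stream into both runs of a finite-difference pair makes the simulation noise largely cancel in the difference:

```python
import numpy as np

def loss(theta, rng):
    # Hypothetical noisy simulation output: a quadratic objective plus
    # noise whose scale depends on theta (so CRN cancels it only partly).
    return (theta - 2.0) ** 2 + rng.normal() * theta

def diff_estimate(theta, c, seed_plus, seed_minus):
    # Central finite-difference gradient estimate from two simulation runs.
    y_plus = loss(theta + c, np.random.default_rng(seed_plus))
    y_minus = loss(theta - c, np.random.default_rng(seed_minus))
    return (y_plus - y_minus) / (2.0 * c)

theta, c, n = 0.5, 0.1, 10_000
seeds = np.random.default_rng(0).integers(0, 2**31, size=(n, 2))

indep = [diff_estimate(theta, c, s0, s1) for s0, s1 in seeds]  # independent streams
crn   = [diff_estimate(theta, c, s0, s0) for s0, _  in seeds]  # common random numbers

print("variance, independent runs:", np.var(indep))
print("variance, CRN:             ", np.var(crn))
```

Both estimators have the same mean, but the CRN version's variance is roughly an order of magnitude smaller in this toy setting, which is the effect the paper exploits inside FDSA.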

Cited by 82 publications (45 citation statements)
References 14 publications
“…On the task of tuning the parameters of the opponent model, RSPSA resulted in a significantly better performance as compared to that obtained by using RFDSA. This confirms some of the previous findings such as those of Spall (1992) and Kleinman, Spall and Neiman (1999), whilst it contradicts some expectations published elsewhere, such as in Kushner and Yin (1997) and Dippon (2003). In the case of policy optimisation, RSPSA was competitive with TD-learning, although the combination of supervised learning followed by TD-learning outperformed RSPSA.…”
Section: Discussion (supporting, confidence: 81%)
“…In fact, if this method is employed, the convergence rate is improved to O(t^{-1/2}). This was shown for FDSA by Glasserman and Yao (1992) and L'Ecuyer and Yin (1998), and later extended to SPSA by Kleinman, Spall and Neiman (1999).…”
Section: Efficiency (mentioning, confidence: 61%)
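As a rough illustration of where CRN enters the FDSA iteration discussed above, the sketch below reuses one seed for the +c_k and -c_k runs of each coordinate; the loss function and gain sequences are hypothetical choices for illustration, not the ones analyzed in the cited works.

```python
import numpy as np

def loss(theta, rng):
    # Hypothetical noisy quadratic loss with minimum at (1, ..., 1).
    return np.sum((theta - 1.0) ** 2) + rng.normal() * np.linalg.norm(theta)

def fdsa_crn(theta0, iters=2000, a=0.5, c=0.5, alpha=1.0, gamma=1.0 / 6.0):
    # FDSA with common random numbers: each coordinate's pair of
    # simulations shares a seed, so their noise cancels in the difference.
    # (With CRN, theory permits c_k to shrink faster than the classical
    # gamma = 1/6 used here; that tuning is not shown.)
    theta = np.asarray(theta0, dtype=float)
    master = np.random.default_rng(0)
    for k in range(1, iters + 1):
        a_k = a / k ** alpha
        c_k = c / k ** gamma
        grad = np.zeros(theta.size)
        for i in range(theta.size):
            e_i = np.zeros(theta.size)
            e_i[i] = 1.0
            seed = int(master.integers(2**31))  # one seed per +/- pair (CRN)
            y_plus = loss(theta + c_k * e_i, np.random.default_rng(seed))
            y_minus = loss(theta - c_k * e_i, np.random.default_rng(seed))
            grad[i] = (y_plus - y_minus) / (2.0 * c_k)
        theta -= a_k * grad
    return theta

print(fdsa_crn(np.zeros(2)))  # ends up near [1. 1.]
```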
“…If single parameters are perturbed, this method is known as the Kiefer-Wolfowitz procedure, and if multiple parameters are perturbed simultaneously, it is known as Simultaneous Perturbation Stochastic Approximation (SPSA); see Sadegh and Spall (1997) and Spall (2003) for in-depth treatment. This approach can be highly efficient in simulation optimization of deterministic systems (Spall, 2003) or when a common history of random numbers (Glynn, 1987; Kleinman, Spall, & Neiman, 1999) is being used (the latter trick is known as the PEGASUS method in reinforcement learning, see Ng and Jordan (2000)), and can get close to a convergence rate of O(I^{-1/2}) (Glynn, 1987). However, when used on a real system, the uncertainties degrade the performance, resulting in convergence rates ranging between O(I^{-1/4}) and O(I^{-2/5}) depending on the chosen reference value (Glynn, 1987).…”
Section: Finite-Difference Methods (mentioning, confidence: 99%)
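To contrast with the per-coordinate finite differences in the FDSA sketch above, here is a minimal SPSA sketch in the same spirit (again an illustration with a hypothetical loss and standard-looking gain constants, not code from the cited works). All coordinates are perturbed at once with random signs, so each iteration costs only two simulation runs, and a flag toggles CRN between the pair:

```python
import numpy as np

def loss(theta, rng):
    # Hypothetical noisy quadratic loss with minimum at (1, ..., 1).
    return np.sum((theta - 1.0) ** 2) + rng.normal() * np.linalg.norm(theta)

def spsa(theta0, iters=5000, a=0.1, c=0.2, alpha=0.602, gamma=0.101, crn=True):
    theta = np.asarray(theta0, dtype=float)
    master = np.random.default_rng(0)
    for k in range(1, iters + 1):
        a_k = a / k ** alpha
        c_k = c / k ** gamma
        # Rademacher (+/-1) simultaneous perturbation of every coordinate.
        delta = master.choice([-1.0, 1.0], size=theta.size)
        seed = int(master.integers(2**31))
        y_plus = loss(theta + c_k * delta, np.random.default_rng(seed))
        y_minus = loss(theta - c_k * delta,
                       np.random.default_rng(seed if crn else seed + 1))
        # Two runs yield the whole gradient estimate: one scalar difference
        # divided elementwise by the perturbation vector.
        grad = (y_plus - y_minus) / (2.0 * c_k * delta)
        theta -= a_k * grad
    return theta

print(spsa(np.zeros(3)))  # ends up near [1. 1. 1.]
```

Relative to FDSA, which needs 2d simulations per step in d dimensions, this design uses two regardless of d, which is why the quoted passage singles it out as highly efficient when combined with common random numbers.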