Abstract: This paper demonstrates the principal motivations for Dual Heuristic Dynamic Programming (DHP) learning methods in Adaptive Dynamic Programming and Reinforcement Learning over continuous state spaces: automatic local exploration, improved learning speed, and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, DHP is shown to learn around 1700 times faster than TD(0), and it solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration.
DHP requires knowledge of the environment's model functions, and requires those functions to be differentiable. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
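For concreteness, the sketch below contrasts the two updates the abstract compares: TD(0) learns the scalar value V(s), while DHP learns the costate lambda(s) = dJ/ds by back-propagating through the known, differentiable model and policy Jacobians. This is a minimal illustrative sketch, not the paper's experiment; the toy dynamics f, cost U, policy pi, feature maps phi/psi, and all hyperparameters are assumptions chosen for brevity.

```python
# Minimal sketch (assumed toy problem, not the paper's experiment):
# TD(0) value update vs. a DHP costate update on a 1-D deterministic task.
import numpy as np

gamma, alpha = 0.9, 0.05

f     = lambda s, a: s + a          # known model: s' = f(s, a)
dfds  = lambda s, a: 1.0            # model Jacobian df/ds (DHP needs these)
dfda  = lambda s, a: 1.0            # model Jacobian df/da
U     = lambda s, a: s*s + a*a      # instantaneous cost
dUds  = lambda s, a: 2*s
dUda  = lambda s, a: 2*a
pi    = lambda s: -0.5 * s          # fixed illustrative policy
dpids = lambda s: -0.5              # policy gradient dpi/ds

phi = lambda s: np.array([1.0, s, s*s])  # features: V(s)      = w . phi(s)
psi = lambda s: np.array([1.0, s])       # features: lambda(s) = v . psi(s)

def td0_update(w, s):
    """TD(0): move V(s) toward the scalar target U + gamma * V(s')."""
    a, s2 = pi(s), f(s, pi(s))
    delta = U(s, a) + gamma * (w @ phi(s2)) - w @ phi(s)
    return w + alpha * delta * phi(s)

def dhp_update(v, s):
    """DHP: move lambda(s) = dJ/ds toward the differentiated Bellman
    target, propagating lambda(s') back through the model and policy."""
    a, s2 = pi(s), f(s, pi(s))
    lam_next = v @ psi(s2)
    # total derivative of [U(s, pi(s)) + gamma * J(f(s, pi(s)))] w.r.t. s
    target = (dUds(s, a) + dUda(s, a) * dpids(s)
              + gamma * (dfds(s, a) + dfda(s, a) * dpids(s)) * lam_next)
    return v + alpha * (target - v @ psi(s)) * psi(s)

w, v = np.zeros(3), np.zeros(2)
for s in np.linspace(-1.0, 1.0, 200):    # sweep of training states
    w = td0_update(w, s)
    v = dhp_update(v, s)
```

Note how the DHP target carries gradient information (a vector signal per state in general), whereas TD(0) receives only a scalar error; this is the mechanism behind the speed and exploration advantages the abstract claims.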