Functional equations in dynamic programming

Bellman, Richard; Lee, E. Stanley

doi:10.1007/bf01818535

Cited by 172 publications

(74 citation statements)

References 6 publications


“…It is well known that equation of the type (4.1) provides useful tools for mathematical optimization, computer and dynamic programming (see, [9,12]). Let B(W) denote the space of all bounded real-valued functions defined on the set W, where B(W) is endowed with the metric d(h, k) = sup_{x∈W} |hx − kx| for all h, k ∈ B(W).…”

Section: An Application In Dynamic Programming (mentioning)

confidence: 99%

Common fixed point results of generalized almost rational contraction mappings with an application

Hussain, Işık, Abbas

2016

J. Nonlinear Sci. Appl.


In this paper, we introduce the notion of generalized almost rational contraction with respect to a pair of self mappings on a complete metric space. Several common fixed point results for such mappings are proved. Our results extend and unify various results in the existing literature. An example and application to obtain the existence of a common solution of the system of functional equations arising in dynamic programming are also given in order to illustrate the effectiveness of the presented results.
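A minimal sketch of the setting behind this application, under assumptions that are not taken from the cited paper: W is a small finite set, functions in B(W) are stored as dicts, and the operator is a hypothetical discounted one of the form (T h)(x) = max_y [reward(x, y) + gamma · h(transition(x, y))]. The names `sup_metric`, `apply_operator`, `solve_functional_equation`, and the toy problem are illustrative only. With gamma < 1 this T is a contraction in the sup metric d(h, k), so repeated application converges to the unique bounded solution.

```python
def sup_metric(h, k):
    """d(h, k) = sup_{x in W} |h(x) - k(x)| for functions stored as dicts over the same finite W."""
    return max(abs(h[x] - k[x]) for x in h)


def apply_operator(h, states, decisions, reward, transition, gamma):
    """(T h)(x) = max_y [ reward(x, y) + gamma * h(transition(x, y)) ] (assumed form)."""
    return {
        x: max(reward(x, y) + gamma * h[transition(x, y)] for y in decisions)
        for x in states
    }


def solve_functional_equation(states, decisions, reward, transition, gamma,
                              tol=1e-12, max_iter=10_000):
    """Iterate h <- T h until successive iterates are within tol in the sup metric."""
    h = {x: 0.0 for x in states}
    for _ in range(max_iter):
        h_next = apply_operator(h, states, decisions, reward, transition, gamma)
        if sup_metric(h, h_next) < tol:
            return h_next
        h = h_next
    return h


# Hypothetical toy problem: a walk on {0, 1, 2}, rewarded for landing on state 2.
STATES = [0, 1, 2]
DECISIONS = [-1, 1]


def transition(x, y):
    # Move by y and clamp to the state set.
    return min(max(x + y, 0), 2)


def reward(x, y):
    return 1.0 if transition(x, y) == 2 else 0.0


if __name__ == "__main__":
    print(solve_functional_equation(STATES, DECISIONS, reward, transition, gamma=0.5))
```

For this toy problem with gamma = 0.5 the fixed point is h(0) = 1 and h(1) = h(2) = 2, which can be checked by substituting into the equation by hand.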


“…The idea of using value function approximation goes back to the early days of dynamic programming (Samuel, 1959; Bellman and Dreyfus, 1959). With the recent growth of interest in reinforcement learning, work on value function approximation methods flourished (Bertsekas and Tsitsiklis, 1996; Sutton and Barto, 1998).…”

Section: Related Work (mentioning)

confidence: 99%

Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path

2006


To cite this version: Andras Antos, Csaba Szepesvari, Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning Journal, Springer, 2008, pp.71:89-129.

Abstract. We consider the problem of finding a near-optimal policy using value-function methods in continuous space, discounted Markovian Decision Problems (MDP) when only a single trajectory underlying some policy can be used as the input. Since the state-space is continuous, one must resort to the use of function approximation. In this paper we study a policy iteration algorithm iterating over action-value functions where the iterates are obtained by empirical risk minimization, where the loss function used penalizes high magnitudes of the Bellman-residual. It turns out that when a linear parameterization is used the algorithm is equivalent to least-squares policy iteration. Our main result is a finite-sample, high-probability bound on the performance of the computed policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept (the VC-crossing dimension), the approximation power of the function set and the controllability properties of the MDP. To the best of our knowledge this is the first theoretical result for off-policy control learning over continuous state-spaces using a single trajectory.
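To make "penalizes high magnitudes of the Bellman-residual" concrete, here is a rough sketch of a plain empirical squared Bellman residual for a linearly parameterized action-value function Q(s, a) = theta · phi(s, a), evaluated along one trajectory. This is an illustration under stated assumptions, not the loss analyzed in the paper, whose exact form the abstract does not give. The function names, the one-hot feature map, and the two-state chain are all hypothetical.

```python
def q_value(theta, phi):
    """Linear action-value estimate: the dot product theta . phi."""
    return sum(t * f for t, f in zip(theta, phi))


def empirical_bellman_residual(theta, transitions, features, policy, gamma):
    """Mean squared one-step Bellman residual of Q_theta for a fixed policy.

    transitions: list of (s, a, r, s_next) tuples from a single trajectory.
    features:    maps (state, action) to a feature vector (hypothetical helper).
    policy:      maps a state to the action whose value is being evaluated.
    """
    total = 0.0
    for s, a, r, s_next in transitions:
        target = r + gamma * q_value(theta, features(s_next, policy(s_next)))
        residual = q_value(theta, features(s, a)) - target
        total += residual ** 2
    return total / len(transitions)


# Hypothetical two-state deterministic chain with a single action 0:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
def one_hot_features(s, a):
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]


def only_action(s):
    return 0


TRAJECTORY = [(0, 0, 1.0, 1), (1, 0, 0.0, 0)]

if __name__ == "__main__":
    exact = [4.0 / 3.0, 2.0 / 3.0]  # solves Q(0) = 1 + 0.5 Q(1), Q(1) = 0.5 Q(0)
    print(empirical_bellman_residual(exact, TRAJECTORY, one_hot_features, only_action, 0.5))
```

With one-hot features the linear class contains the true action-value function of the chain, so the residual vanishes at the exact parameters and is positive elsewhere. In stochastic problems this plain sample average is not an unbiased estimate of the squared Bellman residual, so it should be read only as an illustration of the quantity being penalized.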


“…If suitable features and representative states are chosen, V(W(t)) may converge to a reasonable approximation of the optimal cost-to-go vector V*. Such an algorithm has been considered in the literature (Bellman (1959), Reetz (1977), Morin (1979)). Of these references, only (Reetz (1977)), establishes convergence and error bounds.…”

Section: Algorithmic Model (mentioning)

confidence: 99%

“…Bellman and Dreyfus (1959) explored the use of polynomials as compact representations for accelerating dynamic programming. Whitt (1978) and Reetz (1977) analyzed approaches of reducing state space sizes, which lead to compact representations.…”

Section: Introduction (mentioning)

confidence: 99%

Feature-based methods for large scale dynamic programming

1996


Abstract. We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counterexample illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
