2003
DOI: 10.1287/opre.51.6.850.24925

The Linear Programming Approach to Approximate Dynamic Programming

Abstract: The curse of dimensionality gives rise to prohibitive computational requirements that render infeasible the exact solution of large-scale stochastic control problems. We study an efficient method based on linear programming for approximating solutions to such problems. The approach "fits" a linear combination of pre-selected basis functions to the dynamic programming cost-to-go function. We develop error bounds that offer performance guarantees and also guide the selection of both basis functions and "state-relevance weights" …
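As a reference point, the approximate linear program described in the abstract can be written in the usual cost-to-go form. The notation below (basis matrix Φ, weight vector r, state-relevance weights c, discount factor γ) follows common usage for this method; it is a sketch, not the paper's exact statement.

```latex
% Sketch of the approximate LP (ALP) for a discounted-cost MDP, assuming the
% standard notation: \Phi = [\phi_1 \cdots \phi_K] stacks the pre-selected
% basis functions, r is the weight vector, c collects the state-relevance
% weights, and \gamma is the discount factor.
\begin{align*}
  \max_{r \in \mathbb{R}^K} \quad & c^{\top} \Phi r \\
  \text{s.t.} \quad & (\Phi r)(x) \;\le\; g(x,a) + \gamma \sum_{y} P_a(x,y)\,(\Phi r)(y)
      \qquad \forall\, x \in \mathcal{S},\; a \in \mathcal{A}_x .
\end{align*}
% Compactly, \Phi r \le T\Phi r with T the Bellman operator: the exact LP is
% recovered when \Phi is the identity, so the ALP keeps the Bellman constraints
% but searches only over the span of the basis functions.
```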

Cited by 550 publications (542 citation statements)
References 25 publications
“…Furthermore, there is, in general, no guarantee as to the quality of the greedy policy generated from the approximation Hw. However, the recent work of de Farias and Van Roy (2001a) provides some analysis of the error relative to that of the best possible approximation in the subspace, and some guidance as to selecting α so as to improve the quality of the approximation. In particular, their analysis shows that this LP provides the best approximation Hw* of the optimal value function V* in a weighted L1 sense subject to the constraint that Hw* ≥ T*Hw*, where the weights in the L1 norm are the state relevance weights α.…”
Section: Approximate Linear Programming (mentioning)
confidence: 99%
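To make the role of the state-relevance weights concrete, here is a minimal, self-contained sketch (not code from the paper) that solves the approximate LP for a small randomly generated discounted-cost MDP with scipy.optimize.linprog. The toy model and all names (n_states, Phi, and so on) are illustrative assumptions.

```python
# A minimal sketch of the approximate LP (ALP) on a toy discounted-cost MDP.
# Not the authors' code; the model, basis, and weights are illustrative.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(0)

# Random transition matrices P[a] (rows sum to 1) and per-action cost vectors g[a].
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((n_actions, n_states))

# Pre-selected basis functions: a constant feature and a normalized state index.
Phi = np.column_stack([np.ones(n_states), np.arange(n_states) / (n_states - 1)])

# State-relevance weights: a probability distribution over states.
c = np.full(n_states, 1.0 / n_states)

# ALP: maximize c^T Phi r  subject to  Phi r <= g_a + gamma * P_a Phi r  for all a,
# i.e. minimize -(Phi^T c)^T r  with  (I - gamma * P_a) Phi r <= g_a, stacked over a.
A_ub = np.vstack([(np.eye(n_states) - gamma * P[a]) @ Phi for a in range(n_actions)])
b_ub = np.concatenate([g[a] for a in range(n_actions)])
res = linprog(-(Phi.T @ c), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * Phi.shape[1])

r = res.x
J_tilde = Phi @ r  # approximate cost-to-go, one value per state
print("basis weights r:", r)
print("approximate cost-to-go:", J_tilde)
```

Because every feasible Φr lies below the true cost-to-go J*, maximizing c^TΦr over the feasible set is equivalent to minimizing the c-weighted L1 distance ||J* − Φr||_{1,c}, which is the characterization quoted in the excerpt above (stated there in the reward-maximization convention, hence the reversed inequality Hw ≥ T*Hw).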
“…In many applications of DP, the number of states and actions available in each state are large; consequently, the computational effort required to compute the optimal policy for a DP can be overwhelming: Bellman's "curse of dimensionality". For this reason, considerable recent research effort has focused on developing algorithms that compute an approximately optimal policy efficiently (Bertsekas and Tsitsiklis, 1996; de Farias and Van Roy, 2002).…”
Section: Introduction (mentioning)
confidence: 99%
“…Due to the "curse of dimensionality," Markov decision processes typically have a prohibitively large number of states, rendering exact dynamic programming methods intractable and calling for the development of approximation techniques. This paper represents a step in the development of a linear programming approach to approximate dynamic programming (de Farias and Van Roy 2003; Schweitzer and Seidmann 1985; Zin 1993, 1997). This approach relies on solving a linear program that generally has few variables but an intractable number of constraints.…”
(mentioning)
confidence: 99%
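The last excerpt highlights the practical obstacle: the ALP has one constraint per state-action pair. A common remedy in this line of work is to solve a reduced LP built from a sampled subset of constraints. The sketch below is my illustration of that idea on the same kind of toy model as above, with a uniform sampling distribution chosen purely for simplicity; it is not the procedure from the paper.

```python
# Sketch of constraint sampling for the ALP: keep only a random subset of the
# (state, action) Bellman constraints so the reduced LP stays small even when
# |S| x |A| is intractable. The uniform sampling distribution and all names
# here are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

def sampled_alp(P, g, Phi, c, gamma, n_samples, rng):
    """Solve a reduced ALP built from n_samples randomly chosen constraints.

    P   : (n_actions, n_states, n_states) transition matrices
    g   : (n_actions, n_states) per-action cost vectors
    Phi : (n_states, K) basis matrix
    c   : (n_states,) state-relevance weights
    """
    n_actions, n_states, _ = P.shape
    idx = rng.integers(0, n_states * n_actions, size=n_samples)
    states, actions = idx % n_states, idx // n_states
    # One sampled constraint per pair: (e_x - gamma * P_a[x, :]) Phi r <= g_a(x).
    A_ub = np.vstack([(np.eye(n_states)[x] - gamma * P[a, x]) @ Phi
                      for x, a in zip(states, actions)])
    b_ub = np.array([g[a, x] for x, a in zip(states, actions)])
    obj = -(Phi.T @ c)  # maximize c^T Phi r
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * Phi.shape[1])
    # With too few sampled constraints the reduced LP can be unbounded;
    # a nonzero status signals that more samples are needed.
    return res.x if res.status == 0 else None
```

With P, g, Phi, and c from the previous sketch, sampled_alp(P, g, Phi, c, gamma=0.9, n_samples=8, rng=np.random.default_rng(1)) builds and solves an 8-constraint LP. In the toy model this saves nothing, but when |S| x |A| is astronomically large the sampled LP is the only one that can be written down at all, which is the point of the constraint-reduction line of work the excerpt refers to.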