2011
DOI: 10.1002/9781118029176

Approximate Dynamic Programming

Abstract: This is my preface. I am going to explain why I wrote this book and who it is for.

Chapter 1: The challenges of dynamic programming. The optimization of problems over time arises in many settings, ranging from the control of heating systems to managing entire economies. In between are examples including landing aircraft, purchasing new equipment, managing blood inventories, scheduling fleets of vehicles, selling assets, investing money in portfolios, or just playing a game of tic-tac-toe or backgammon. These proble…
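For orientation (standard dynamic programming background, not part of the quoted abstract): problems of this kind are typically formalized through Bellman's equation, which approximate dynamic programming attacks by replacing the exact value function with a statistical approximation:

$$
V(s) \;=\; \max_{a \in \mathcal{A}} \Big( C(s, a) \;+\; \gamma \sum_{s'} \mathbb{P}(s' \mid s, a)\, V(s') \Big)
$$

Here $C(s,a)$ is the one-period contribution of taking action $a$ in state $s$, $\gamma$ is a discount factor, and $\mathbb{P}(s' \mid s, a)$ is the transition probability to the next state $s'$.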

Cited by 1,009 publications (205 citation statements)
References 121 publications
“…This type of scenario exploration would tell managers the permissible level of fishing given a target biomass and recent ocean conditions. This approach is analogous to the greedy heuristic strategies that are used in high-dimensional approximate optimization problems (26). Of course, considerable work remains to be done to develop and evaluate management plans based on these methods.…”
Section: Discussion
confidence: 99%
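As a point of reference (not from the cited paper), a greedy heuristic of the kind this statement alludes to simply maximizes the immediate estimated payoff at each decision point; the function and argument names below are hypothetical:

```python
# Minimal greedy-policy sketch (illustrative names): at every step, pick
# the action with the best one-step estimated reward, ignoring future value.
def greedy_action(state, actions, reward_estimate):
    """Return the action maximizing the immediate estimated reward."""
    return max(actions, key=lambda a: reward_estimate(state, a))

# Hypothetical usage: candidate harvest levels scored by a reward model
# that penalizes exceeding 10% of the current biomass.
harvest = greedy_action(
    state={"biomass": 1_000.0},
    actions=[0.0, 50.0, 100.0],
    reward_estimate=lambda s, a: a if a <= 0.1 * s["biomass"] else -a,
)
```

The appeal in high-dimensional problems is that each step only requires ranking the currently feasible actions, never enumerating full trajectories.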
“…This algorithm is particularly interesting when the number of states is huge. In this case, classical algorithms like Minimax and Alphabeta [9], for two-player games, and Dynamic Programming [13], for one-player games, are too time-consuming or not efficient. MCTS combines an exploration of the tree based on a compromise between exploration and exploitation, and an evaluation based on Monte-Carlo simulations.…”
Section: Introduction
confidence: 99%
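For context (a generic illustration, not code from the cited paper), the exploration-exploitation compromise in MCTS is commonly implemented with the UCB1 selection rule: each child's score adds a visit-count bonus, which shrinks as the child is sampled more, to its mean Monte-Carlo value:

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    visits: int = 0         # number of simulations through this node
    value_sum: float = 0.0  # sum of Monte-Carlo rewards backed up here

def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score: mean value (exploitation) plus a bonus that decays
    with the child's visit count (exploration)."""
    if child.visits == 0:
        return float("inf")  # unvisited children are expanded first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

# Selection step: descend to the child with the highest UCB1 score.
children = [Node(visits=10, value_sum=6.0), Node(visits=2, value_sum=1.5)]
best = max(children, key=lambda n: ucb1(n, parent_visits=12))
```

Because the tree grows only where simulations are directed, this avoids the exhaustive sweep over states that makes Minimax, Alpha-Beta, and exact dynamic programming impractical when the state space is huge.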
“…It is weaker because it does not have to assign the precise quantity "discounted future sum of rewards"; any number will do as long as it helps to grow the tree in roughly the right direction. It is for this reason that we believe it could be advantageous to try to learn a good scoring function instead of trying to directly learn/approximate the optimal value function: the former can be a rather simple function (as evidenced in the good results we get in Section 5 where we use for all domains exactly the same simple weighted sum of features), whereas the latter would require a far more expressive parametrization (and it is well-known that value function approximation scales badly when the dimensionality of the state space grows [41]). Figure 2 presents a simple algorithm based on a sorted list to implement policies as parameterized look-ahead trees.…”
Section: Connection With the Optimal Value Function
confidence: 99%
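A minimal sketch of what such a "simple weighted sum of features" scoring function could look like (the features and weights below are hypothetical, not taken from the cited work):

```python
# Hypothetical linear scoring function for ordering nodes in a look-ahead
# tree: any heuristic that ranks nodes sensibly will do, so a weighted
# feature sum suffices; no value-function approximation is required.
def score(features, weights):
    """Weighted sum of node features."""
    return sum(w * f for w, f in zip(weights, features))

# Usage: always expand the leaf with the highest heuristic score.
leaves = {"a": (0.2, 0.7), "b": (0.9, 0.1)}  # node -> feature vector
weights = (1.0, 0.5)                         # learned or hand-tuned
next_leaf = max(leaves, key=lambda k: score(leaves[k], weights))
```

The design point the statement makes is that only the ordering induced by the score matters, so a low-dimensional parametrization can work where approximating the optimal value function itself would demand a far richer model.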