The main lesson here is that offline planning can, in the worst case, scale exponentially with the dimensionality of the state space (Chow and Tsitsiklis, 1989), while online planning (i.e., planning for the "current state") can break the curse of dimensionality by amortizing the planning effort over multiple time steps (Rust, 1996; Szepesvári, 2001). Other topics of interest include linear programming-based approaches (de Farias and Van Roy, 2003, 2006), dual dynamic programming (Wang et al., 2008), techniques based on sample average approximation (Shapiro, 2003) such as PEGASUS (Ng and Jordan, 2000), online learning in MDPs with arbitrary reward processes (Even-Dar et al., 2005; Neu et al., 2010), and learning with (almost) no restrictions in a competitive framework (Hutter, 2004). Other important topics include learning and acting in partially observed MDPs (for recent developments, see, e.g., Littman et al., 2001; Toussaint et al., 2008), learning and acting in games or under other optimization criteria (Littman, 1994; Heger, 1994; Szepesvári and Littman, 1999; Borkar and Meyn, 2002), and the development of hierarchical and multi-time-scale methods (Dietterich, 1998; Sutton et al., 1999b).
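To make the offline/online contrast concrete, the following is a minimal sketch of online planning for the current state via depth-limited sampled lookahead with a generative model, in the spirit of sparse sampling. The `simulate` interface, all names, and the toy MDP are illustrative assumptions, not taken from the works cited above; the point is only that the per-decision cost depends on the lookahead depth and sampling width, not on the size of the state space.

```python
import random


def online_plan(state, actions, simulate, depth, width, gamma=0.95):
    """Pick an action for `state` by depth-limited sparse lookahead.

    `simulate(state, action)` is an assumed generative model returning
    (next_state, reward). The per-call cost is O((len(actions) * width) ** depth),
    independent of how many states the MDP has; this is the amortization
    argument for online planning.
    """
    def value(s, d):
        # Estimated optimal value of state s with d lookahead steps left.
        if d == 0:
            return 0.0
        best = float("-inf")
        for a in actions:
            total = 0.0
            for _ in range(width):  # sample `width` successors per action
                s2, r = simulate(s, a)
                total += r + gamma * value(s2, d - 1)
            best = max(best, total / width)
        return best

    # Choose the action with the highest sampled lookahead value.
    def q(a):
        total = 0.0
        for _ in range(width):
            s2, r = simulate(state, a)
            total += r + gamma * value(s2, depth - 1)
        return total / width

    return max(actions, key=q)


# Toy 1-D random-walk MDP, used only to exercise the planner: the agent
# is rewarded for staying near the origin.
def simulate(s, a):
    s2 = s + a + random.choice((-1, 0, 1))
    return s2, -abs(s2)


if __name__ == "__main__":
    print(online_plan(0, actions=(-1, 0, 1), simulate=simulate, depth=3, width=4))
```

An offline planner would instead have to tabulate or approximate a value function over the whole state space before acting; the sketch above spends its (exponential-in-depth, but state-space-independent) budget only on the state actually encountered, one decision at a time.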