“…With the inverse control policy from the last forward pass, SELQR computes $\hat{u}_t$ and $\hat{x}_t$ (line 15), around which the stochastic discrete dynamics can be linearized as in Eq. (11), where … denotes the $i$th column of matrix $M_t$, and $A_t$, $B_t$, …, …, $a_t$, and … are given matrices and vectors of appropriate dimensions. The cost function $c_t$ can likewise be quadratized as in Eq. (12). Substituting the linear stochastic dynamics and the quadratic local cost function into Eq. (8), expanding the expectation, and collecting terms yields a quadratic expression for the value function $v_t(x)$, Eq. (13), where … follow from a derivation similar to that in [21].…”
Sun et al Page 7
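The linearization step can be illustrated numerically. What follows is a minimal sketch, not SELQR's implementation: it assumes dynamics of the form $x_{t+1} = g(x_t, u_t) + M(x_t, u_t)\,n_t$ with standard normal noise $n_t$, uses central finite differences in place of analytic derivatives, and returns an affine model (Jacobian in $x$, Jacobian in $u$, offset) for $g$ and for each column of $M$ around $(\hat{x}_t, \hat{u}_t)$. The function names, the unicycle-style example, and the noise model are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Func = Callable[[Sequence[float], Sequence[float]], Vector]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def affine_model(f: Func, x_hat: Sequence[float], u_hat: Sequence[float],
                 h: float = 1e-6) -> Tuple[Matrix, Matrix, Vector]:
    """Return (Jx, Ju, offset) with f(x, u) ~= Jx x + Ju u + offset near (x_hat, u_hat)."""
    def diff(dx: Vector, du: Vector) -> Vector:
        # Central difference of f along the perturbation (dx, du) of size h.
        fp = f([a + b for a, b in zip(x_hat, dx)], [a + b for a, b in zip(u_hat, du)])
        fm = f([a - b for a, b in zip(x_hat, dx)], [a - b for a, b in zip(u_hat, du)])
        return [(p - m) / (2 * h) for p, m in zip(fp, fm)]

    def step(n: int, j: int) -> Vector:
        # Length-n vector with h in slot j and zeros elsewhere.
        return [h if k == j else 0.0 for k in range(n)]

    nx, nu = len(x_hat), len(u_hat)
    cols_x = [diff(step(nx, j), [0.0] * nu) for j in range(nx)]
    cols_u = [diff([0.0] * nx, step(nu, j)) for j in range(nu)]
    jx = [list(row) for row in zip(*cols_x)]  # list of columns -> row-major matrix
    ju = [list(row) for row in zip(*cols_u)]
    f0 = f(x_hat, u_hat)
    # Offset makes the affine model exact at the expansion point.
    offset = [fi - ax - bu for fi, ax, bu in zip(f0, matvec(jx, x_hat), matvec(ju, u_hat))]
    return jx, ju, offset


def linearize_stochastic(g: Func, noise_matrix: Callable[[Sequence[float], Sequence[float]], Matrix],
                         x_hat: Sequence[float], u_hat: Sequence[float], h: float = 1e-6):
    """Affine models of g and of every column of M(x, u) around (x_hat, u_hat)."""
    mean_model = affine_model(g, x_hat, u_hat, h)
    n_cols = len(noise_matrix(x_hat, u_hat)[0])
    column_models = []
    for i in range(n_cols):
        def column(x, u, i=i):
            # i-th column of the noise matrix, viewed as a vector-valued function.
            return [row[i] for row in noise_matrix(x, u)]
        column_models.append(affine_model(column, x_hat, u_hat, h))
    return mean_model, column_models


# Hypothetical example: unicycle-like model with speed-dependent noise.
DT = 0.1

def unicycle(x, u):
    return [x[0] + DT * u[0] * math.cos(x[1]), x[1] + DT * u[1]]

def noise(x, u):
    # 2x2 noise matrix; its first column scales with the speed u[0].
    return [[0.01 * u[0], 0.0], [0.0, 0.01]]

(A, B, a), noise_terms = linearize_stochastic(unicycle, noise, [0.0, 0.5], [1.0, 0.0])
```

Finite differences keep the sketch independent of any autodiff library; with analytic or automatic derivatives available, `affine_model` would be replaced by exact Jacobians while the structure (one affine model per noise column) stays the same.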
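A similar numerical sketch applies to the quadratization of $c_t$. Assuming a twice-differentiable cost $c_t(x, u)$ and using central finite differences (an illustrative choice, not necessarily the paper's), the function below expands $c_t$ to second order around $(\hat{x}_t, \hat{u}_t)$ and rewrites the result in absolute coordinates as $\tfrac12 x^\top Q x + u^\top P x + \tfrac12 u^\top R u + x^\top q + u^\top r + p$. The block names $Q, P, R, q, r, p$, the function `quadratize`, and the example cost are hypothetical labels, not necessarily the symbols of Eq. (12).

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Cost = Callable[[Sequence[float], Sequence[float]], float]


def quadratize(cost: Cost, x_hat: Sequence[float], u_hat: Sequence[float],
               h: float = 1e-4) -> Tuple[Matrix, Matrix, Matrix, List[float], List[float], float]:
    """Second-order expansion of cost(x, u) around (x_hat, u_hat) in absolute coordinates.

    Returns (Q, P, R, q, r, p) such that, near the expansion point,
    cost(x, u) ~= 0.5 x'Qx + u'Px + 0.5 u'Ru + x'q + u'r + p.
    """
    z_hat = list(x_hat) + list(u_hat)
    nx, n = len(x_hat), len(z_hat)

    def c(*steps: Tuple[int, float]) -> float:
        # Evaluate the cost at z_hat shifted by the given (index, delta) steps.
        z = list(z_hat)
        for j, d in steps:
            z[j] += d
        return cost(z[:nx], z[nx:])

    # Central-difference gradient and Hessian of the stacked variable z = [x; u].
    grad = [(c((j, h)) - c((j, -h))) / (2 * h) for j in range(n)]
    hess = [[(c((j, h), (k, h)) - c((j, h), (k, -h))
              - c((j, -h), (k, h)) + c((j, -h), (k, -h))) / (4 * h * h)
             for k in range(n)] for j in range(n)]

    # Shift from (z - z_hat) coordinates to absolute coordinates z.
    h_zhat = [sum(hjk * zk for hjk, zk in zip(row, z_hat)) for row in hess]
    linear = [gj - hz for gj, hz in zip(grad, h_zhat)]
    const = (c() - sum(gj * zj for gj, zj in zip(grad, z_hat))
             + 0.5 * sum(zj * hz for zj, hz in zip(z_hat, h_zhat)))

    Q = [row[:nx] for row in hess[:nx]]   # x-x block
    P = [row[:nx] for row in hess[nx:]]   # u-x block
    R = [row[nx:] for row in hess[nx:]]   # u-u block
    return Q, P, R, linear[:nx], linear[nx:], const


# Hypothetical quadratic example, so the expansion should recover it exactly.
def example_cost(x, u):
    return (x[0] - 1.0) ** 2 + x[0] * u[0] + 2.0 * u[0] ** 2

Q, P, R, q, r, p = quadratize(example_cost, [0.3], [-0.7])
```

Using a quadratic example cost is a deliberate check: the recovered blocks ($Q = 2$, $P = 1$, $R = 4$, $q = -2$, $r = 0$, $p = 1$) should not depend on the expansion point, up to rounding error.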
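The "expanding the expectation" step is where the noise enters the value function. Assuming, purely for illustration, that the linearized dynamics split into a noise-free mean $\mu$ plus terms $c^i$ that are affine in $x$ and $u$ and multiply the components $n^{(i)}$ of $n \sim \mathcal{N}(0, I)$, and that $v_{t+1}$ is quadratic with symmetric $S$ (all of these symbols are hypothetical), the expectation reduces as follows:

```latex
% Hypothetical notation (not necessarily the symbols of Eqs. (11)-(13)):
%   \mu = A_t x + B_t u + a_t        (noise-free part of the linearized dynamics)
%   c^{i}: affine in x and u, multiplying the i-th noise component n^{(i)}
%   E[n^{(i)}] = 0 and E[n^{(i)} n^{(j)}] = \delta_{ij}, so terms linear in n vanish
%   v_{t+1}(y) = \tfrac12 y^\top S y + y^\top s + \sigma, with S symmetric
\begin{aligned}
x_{t+1} &= \mu + \textstyle\sum_{i} c^{i}\, n^{(i)}, \\
\mathbb{E}\big[v_{t+1}(x_{t+1})\big]
  &= \tfrac{1}{2}\,\mu^{\top} S\,\mu + \mu^{\top} s + \sigma
   + \tfrac{1}{2} \textstyle\sum_{i}\sum_{j} (c^{i})^{\top} S\, c^{j}\,
     \mathbb{E}\big[n^{(i)} n^{(j)}\big] \\
  &= \tfrac{1}{2}\,\mu^{\top} S\,\mu + \mu^{\top} s + \sigma
   + \tfrac{1}{2} \textstyle\sum_{i} (c^{i})^{\top} S\, c^{i}.
\end{aligned}
```

Because each $c^i$ is affine in $x$ and $u$, the noise contribution $\tfrac12 \sum_i (c^i)^\top S\, c^i$ is itself quadratic in $(x, u)$. Adding the quadratic local cost therefore keeps the expression quadratic, and, assuming Eq. (8) minimizes over $u$ with a positive definite $u$-block, the minimized result is again quadratic in $x$, consistent with the quadratic form of $v_t(x)$ stated in the text.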