“…$\mu_{a|s}\, p^{a}_{ss'} \left( c^{a,\mu}_{ss'} + \gamma V^{\mu}_{\beta\Upsilon}(s') \right) + c_0(s)$, (7) where, for notational simplicity, $\mu_{a|s} = \mu(a|s)$, $p^{a}_{ss'} = p(s'|s,a)$, and $c^{a,\mu}_{ss'} = c(s,a,s') + \frac{\gamma}{\beta}\log p(s'|s,a) + \frac{\gamma}{\beta}\log \mu_{a|s}$. The term $c_0(s)$ depends on $\gamma$ and $\beta$ but is independent of the policy $\mu$ and the parameters $\Upsilon$. Without loss of generality, we ignore $c_0(s)$ in the subsequent calculations (see [23]). For a proof of the above Bellman equation, see Theorem 1 in [12] (or the detailed proof in [23]).…”
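
As a rough illustration of how the modified cost and the backup in (7) might be computed for a small tabular problem, the following Python sketch may help. It rests on several assumptions not confirmed by the excerpt: that the elided left-hand side is the value function itself, summed over actions and next states; that the coefficient of the log terms is the ratio gamma/beta; and that c_0(s) is dropped, as the text says. The function names `modified_cost` and `evaluate_policy` and the array layout are hypothetical.

```python
import numpy as np

def modified_cost(c, p, mu, gamma, beta):
    """Sketch of c^{a,mu}_{ss'} = c + (gamma/beta) log p + (gamma/beta) log mu.

    Assumed layout: c[s, a, s2], p[s, a, s2] = p(s2|s,a), mu[s, a] = mu(a|s).
    """
    # Replace zero probabilities by 1 before the log (log 1 = 0); those
    # entries are multiplied by a zero weight in the backup anyway.
    log_p = np.log(np.where(p > 0, p, 1.0))
    log_mu = np.log(np.where(mu > 0, mu, 1.0))
    return c + (gamma / beta) * log_p + (gamma / beta) * log_mu[:, :, None]

def evaluate_policy(c, p, mu, gamma, beta, iters=500):
    """Fixed-point iteration of the backup, ignoring c_0(s) (assumed form)."""
    c_mod = modified_cost(c, p, mu, gamma, beta)
    w = mu[:, :, None] * p  # w[s, a, s2] = mu(a|s) * p(s2|s,a)
    V = np.zeros(c.shape[0])
    for _ in range(iters):
        # Sum over actions a and next states s2 of w * (c_mod + gamma * V(s2)).
        V = np.sum(w * (c_mod + gamma * V[None, None, :]), axis=(1, 2))
    return V
```

When both the policy and the transitions are deterministic, the two log terms vanish and the sketch reduces to ordinary discounted policy evaluation, which gives a simple sanity check.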