“…In particular, GTD2, proposed in [4], can be interpreted as stochastic primal-dual gradient dynamics (PDGD) for a convex-concave saddle-point problem; hence, its convergence can be analyzed from a different angle using optimization theory [5], [6]. These interpretations were subsequently applied to distributed RL problems in [8], [9], [10], and [11].…”
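To make the primal-dual reading concrete, the sketch below checks numerically that, for linear features, the expected GTD2 increments equal the primal descent direction and the dual ascent direction of one common saddle function, L(θ, w) = wᵀ(b − Aθ) − ½ wᵀCw. Here A = E[φ(φ − γφ′)ᵀ], b = E[rφ], and C = E[φφᵀ]. Maximizing L over w recovers ½‖b − Aθ‖²_{C⁻¹}, the projected Bellman error objective. L is linear (hence convex) in θ and concave in w. The transition set, discount factor, and all helper names (`expectations`, `gtd2_increments`, `saddle_gradients`, and so on) are hypothetical illustrations, not taken from [4]–[6], and the step sizes are omitted.

```python
"""Check that GTD2's sampled updates are unbiased stochastic gradients of the
saddle function L(theta, w) = w^T (b - A theta) - 0.5 * w^T C w.
Assumptions (hypothetical): linear features and a small finite transition set."""

GAMMA = 0.9  # hypothetical discount factor

# Hypothetical transitions: (probability, phi, reward, phi_next).
TRANSITIONS = [
    (0.5, [1.0, 0.0], 1.0, [0.0, 1.0]),
    (0.3, [0.0, 1.0], 0.0, [1.0, 1.0]),
    (0.2, [1.0, 1.0], -1.0, [1.0, 0.0]),
]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def expectations(transitions, gamma):
    """Return A = E[phi (phi - gamma phi')^T], b = E[r phi], C = E[phi phi^T]."""
    d = len(transitions[0][1])
    A = [[0.0] * d for _ in range(d)]
    C = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for p, phi, r, phi_next in transitions:
        diff = [x - gamma * y for x, y in zip(phi, phi_next)]
        for i in range(d):
            b[i] += p * r * phi[i]
            for j in range(d):
                A[i][j] += p * phi[i] * diff[j]
                C[i][j] += p * phi[i] * phi[j]
    return A, b, C


def gtd2_increments(phi, r, phi_next, theta, w, gamma):
    """One GTD2 sample: update directions for theta and w (step sizes omitted)."""
    delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    phi_w = dot(phi, w)
    d_theta = [(x - gamma * y) * phi_w for x, y in zip(phi, phi_next)]
    d_w = [(delta - phi_w) * x for x in phi]
    return d_theta, d_w


def expected_increments(transitions, theta, w, gamma):
    """Average the sampled GTD2 directions under the transition distribution."""
    d = len(theta)
    e_theta, e_w = [0.0] * d, [0.0] * d
    for p, phi, r, phi_next in transitions:
        d_theta, d_w = gtd2_increments(phi, r, phi_next, theta, w, gamma)
        for i in range(d):
            e_theta[i] += p * d_theta[i]
            e_w[i] += p * d_w[i]
    return e_theta, e_w


def saddle_gradients(A, b, C, theta, w):
    """Primal descent direction -grad_theta L = A^T w and
    dual ascent direction grad_w L = b - A theta - C w."""
    neg_grad_theta = matvec(transpose(A), w)
    a_theta, c_w = matvec(A, theta), matvec(C, w)
    grad_w = [bi - ai - ci for bi, ai, ci in zip(b, a_theta, c_w)]
    return neg_grad_theta, grad_w


theta, w = [0.3, -0.2], [0.1, 0.4]
A, b, C = expectations(TRANSITIONS, GAMMA)
e_theta, e_w = expected_increments(TRANSITIONS, theta, w, GAMMA)
g_theta, g_w = saddle_gradients(A, b, C, theta, w)
print("E[GTD2 theta step]:", e_theta, "  A^T w:", g_theta)
print("E[GTD2 w step]:    ", e_w, "  b - A theta - C w:", g_w)
```

The two printed pairs should agree up to floating-point rounding. This agreement is what allows GTD2 with step sizes α and β to be read as stochastic gradient descent in θ and ascent in w on L. The identity holds for any θ and w, not only at the saddle point.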