This article develops distributed optimal control policies for multi‐agent systems (MASs) via Q‐learning by solving dual games. Drawing on game theory, the distributed consensus problem is first formulated as a multi‐player non‐zero‐sum game, in which each agent is a player that optimizes only its local performance and the MAS as a whole reaches a Nash equilibrium. Second, for each agent, the anti‐disturbance problem is formulated as a two‐player zero‐sum game, in which the control input and the external disturbance act as opponents. Specifically, (1) an offline, data‐driven, off‐policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which achieves consensus of the MAS with a guaranteed ‐bounded synchronization error; and (2) an actor‐critic‐disturbance neural network is employed to implement the MPG algorithm and obtain the optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm.
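The abstract does not spell out the MPG update itself. Purely as intuition for what "momentum policy gradient" means, the sketch below applies a generic heavy‐ball momentum gradient step to the feedback gain of a toy scalar linear‐quadratic regulator and checks the result against the Riccati solution. Everything here is a hypothetical assumption: the plant (A = 1.2, B = 1), the cost weights, the step size `alpha`, the momentum `beta`, and all function names. The sketch is also model‐based and single‐agent, unlike the data‐driven, multi‐agent, actor‐critic‐disturbance scheme described above, so it should not be read as the authors' algorithm.

```python
import math

# Hypothetical scalar plant x_{k+1} = A*x_k + B*u_k under state feedback u_k = -K*x_k.
A, B = 1.2, 1.0   # open-loop unstable (|A| > 1)
Q, R = 1.0, 1.0   # stage cost Q*x^2 + R*u^2
X0 = 1.0          # fixed initial state used to evaluate J(K)


def cost(K: float) -> float:
    """Infinite-horizon cost J(K) = (Q + R K^2) X0^2 / (1 - (A - B K)^2)."""
    rho = A - B * K  # closed-loop pole
    if abs(rho) >= 1.0:
        raise ValueError("K must be stabilizing (|A - B*K| < 1)")
    return (Q + R * K**2) * X0**2 / (1.0 - rho**2)


def grad(K: float) -> float:
    """Analytic dJ/dK via the quotient rule."""
    rho = A - B * K
    if abs(rho) >= 1.0:
        raise ValueError("K must be stabilizing (|A - B*K| < 1)")
    num = Q + R * K**2        # numerator of J / X0^2
    den = 1.0 - rho**2        # denominator of J / X0^2
    d_num = 2.0 * R * K
    d_den = 2.0 * B * rho     # d/dK [1 - (A - B K)^2]
    return (d_num * den - num * d_den) * X0**2 / den**2


def momentum_policy_gradient(K0: float, alpha: float = 0.05,
                             beta: float = 0.5, iters: int = 300) -> float:
    """Heavy-ball momentum: v <- beta*v - alpha*grad(K); K <- K + v."""
    K, v = K0, 0.0
    for _ in range(iters):
        v = beta * v - alpha * grad(K)
        K = K + v
    return K


def riccati_gain() -> float:
    """Reference LQR gain from the scalar discrete-time Riccati equation,
    rewritten as B^2 P^2 + (R - Q B^2 - A^2 R) P - Q R = 0 (positive root)."""
    c2 = R - Q * B**2 - A**2 * R
    P = (-c2 + math.sqrt(c2**2 + 4.0 * B**2 * Q * R)) / (2.0 * B**2)
    return A * B * P / (R + B**2 * P)


if __name__ == "__main__":
    K_mpg = momentum_policy_gradient(K0=1.5)  # K0 = 1.5 is stabilizing
    print(f"momentum PG gain: {K_mpg:.6f}")
    print(f"Riccati gain:     {riccati_gain():.6f}")
    print(f"cost J(K):        {cost(K_mpg):.6f}")
```

The momentum term reuses part of the previous step, which damps oscillation and can speed convergence compared with plain gradient descent; `beta = 0.5` and `alpha = 0.05` are arbitrary illustrative choices, and the initial gain must be stabilizing for the cost to be finite.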