This study applies deep reinforcement learning to reactive power regulation in power grids. The grid optimization problem is formulated as a Markov game and solved with the heterogeneous-agent proximal policy optimization (HAPPO) algorithm, yielding a real-time grid optimization strategy based on multi-agent reinforcement learning. Building on this strategy, a grid power management method is developed using a Markov decision process and an improved deep deterministic policy gradient method, and a grid operation optimization model based on deep reinforcement learning is constructed. The model is then evaluated through case studies. The maximum error of the proposed model is less than 5%, indicating high fitting accuracy. The maximum node voltage offset is 0.0025, indicating high voltage quality. Compared with long-time-scale reactive power optimization, the real-time optimization yields an average voltage offset that is 97.9% lower and a maximum voltage offset that is 75.4% lower. The average and the standard deviation of the model's running cost both increase as communication impairment becomes more severe. The proposed approach achieves the lowest optimization cost, reducing it by 1.12%, 6.67%, 10.93%, and 0.94%, respectively, compared with the other four approaches.