Modern power systems integrate a growing share of new energy sources and use large numbers of power electronic devices, which makes online optimization and real-time control more challenging. Deep reinforcement learning (DRL) can process big data and high-dimensional features, and can learn and optimize decisions autonomously in complex environments. In this paper, we explore a DRL-based online combinatorial optimization method for grid sections in large, complex power systems. To improve the convergence speed of the model, we discretize the output actions of the generating units, which simplifies the action space. We also design a reinforcement learning loss function with strong constraints to further accelerate convergence and help the algorithm reach a stable solution. Moreover, to avoid the local optima caused by discretizing the output actions, we use an annealing optimization algorithm to refine the granularity of the unit outputs. We verify our method on the IEEE 118-bus system. The experimental results show that our model converges quickly, achieves better performance, and obtains stable solutions.
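The two-stage idea of coarse discretization followed by annealing refinement can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names `discretize` and `anneal_refine`, the quadratic toy cost, and all parameter values are assumptions chosen for illustration, and the DRL agent and its constrained loss are not modeled here.

```python
import math
import random


def discretize(p_min, p_max, n_levels):
    """Return n_levels evenly spaced output set-points in [p_min, p_max]."""
    step = (p_max - p_min) / (n_levels - 1)
    return [p_min + i * step for i in range(n_levels)]


def anneal_refine(outputs, bounds, cost, step,
                  t0=1.0, cooling=0.95, iters=500, seed=0):
    """Refine coarse unit outputs with simulated annealing.

    outputs: starting outputs, e.g. chosen from a discrete grid
    bounds:  list of (p_min, p_max) limits, one pair per unit
    cost:    callable mapping a list of outputs to a scalar (lower is better)
    step:    maximum perturbation applied to one unit per iteration
    """
    rng = random.Random(seed)
    current = list(outputs)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(iters):
        # Perturb one randomly chosen unit and clip it to its limits.
        i = rng.randrange(len(current))
        lo, hi = bounds[i]
        candidate = list(current)
        candidate[i] = min(hi, max(lo, current[i] + rng.uniform(-step, step)))
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost


# Toy usage: two units, a hypothetical quadratic cost with off-grid optimum.
targets = [37.0, 62.0]                      # assumed values, not from the paper


def toy_cost(p):
    return sum((x - t) ** 2 for x, t in zip(p, targets))


levels = discretize(0.0, 100.0, 5)          # coarse action grid
start = [min(levels, key=lambda v: abs(v - t)) for t in targets]
refined, refined_cost = anneal_refine(start, [(0.0, 100.0)] * 2, toy_cost, step=5.0)
print(start, toy_cost(start), refined, refined_cost)
```

Because the best solution seen so far is tracked separately from the current one, the refined cost in this sketch can never be worse than the cost of the discrete starting point.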