CO2 flooding, with its superior displacement efficiency and high injectivity, is an efficient enhanced oil recovery method. However, because sweep efficiency is unfavorable, particularly in strongly heterogeneous reservoirs and under immiscible flooding, oil recovery in the field is not always satisfactory. Multi-well rate optimization is a common measure for improving sweep efficiency that is easy to implement and low in cost, and many rate optimization methods have been proposed to date. In this research, we introduced the multi-agent deep deterministic policy gradient (MADDPG) algorithm to the multi-well rate optimization of CO2 flooding for the first time and built a new rate optimization method on it. MADDPG adopts a centralized-training, decentralized-execution framework; it overcomes the inability of single-agent reinforcement learning to handle multi-well rate optimization well and also avoids the curse of dimensionality. We treated each well as an agent with its own reward, state, and action. We chose the net present value (NPV) as the reward, the range of change in the injection-production rate as the action element, and the production time, bottom-hole pressure, oil production rate, and gas-oil ratio as the state elements. The simulation results show that the optimal case clearly improves the NPV compared with the base case, and that the simulation case with strong heterogeneity and immiscible flooding also converges to the optimal target; these results demonstrate the effectiveness and the robustness of the rate optimization method, respectively. This research indicates that the optimal NPV can be achieved by improving oil recovery through higher sweep efficiency, which increases income, and by reducing ineffective CO2 injection, which decreases cost. 
Reservoir heterogeneity seriously impairs the performance of rate optimization, and rate optimization makes little difference under extremely strong interlayer heterogeneity because the interlayer influence is severe.
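
As a rough illustration of the per-well agent formulation described above, the following Python sketch shows one way the state elements, the bounded rate-change action, and an NPV-style reward could be represented. It is not the authors' implementation. All names (`WellState`, `apply_rate_change`, `discounted_cash_flow`), the bound on the rate change, and the economic inputs (oil price, CO2 cost, discount rate) are hypothetical assumptions; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class WellState:
    """State elements named in the abstract; units are assumed here."""
    production_time: float       # elapsed production time
    bottom_hole_pressure: float  # bottom-hole pressure
    oil_rate: float              # oil production rate
    gas_oil_ratio: float         # gas-oil ratio

    def as_vector(self) -> list:
        # Flat observation vector that a per-well actor network could consume.
        return [self.production_time, self.bottom_hole_pressure,
                self.oil_rate, self.gas_oil_ratio]


def apply_rate_change(rate: float, change: float, max_change: float = 0.2,
                      min_rate: float = 0.0, max_rate: float = 1.0e6) -> float:
    """Apply a fractional rate change (the action), clipped to an assumed range."""
    # Clip the action to [-max_change, +max_change] (hypothetical bound).
    change = max(-max_change, min(max_change, change))
    new_rate = rate * (1.0 + change)
    # Keep the resulting rate inside assumed operating limits.
    return max(min_rate, min(max_rate, new_rate))


def discounted_cash_flow(oil_volume: float, co2_volume: float,
                         oil_price: float, co2_cost: float,
                         discount_rate: float, years: float) -> float:
    """One NPV term: oil income minus CO2 injection cost, discounted to present."""
    cash_flow = oil_volume * oil_price - co2_volume * co2_cost
    return cash_flow / (1.0 + discount_rate) ** years


if __name__ == "__main__":
    state = WellState(production_time=30.0, bottom_hole_pressure=15.0,
                      oil_rate=100.0, gas_oil_ratio=200.0)
    print(state.as_vector())
    print(apply_rate_change(rate=100.0, change=0.5))
    print(discounted_cash_flow(100.0, 50.0, 60.0, 1.0, 0.1, 1.0))
```

In this simplified form the reward makes the trade-off stated in the abstract explicit: producing more oil raises the income term, while injecting CO2 that displaces no oil only raises the cost term.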