This paper proposes a distributed scheduling model for multiple virtual power plants (VPPs). To realize optimal scheduling across them, it develops a reinforcement learning framework that defines the states, actions, and rewards generated when the grid dispatch center interacts with the power grid. An environment model containing the operation models of multiple VPPs is constructed, and the control center's objective function is set to allocate dispatch commands among the VPPs. The algorithmic analysis is based on the actual network topology of a regional power grid and differentiates the tariffs at which each VPP purchases or sells power in the electricity market. Centralized optimization, distributed optimization, and reinforcement learning are applied in turn to solve the coordinated optimal dispatch model for multiple VPPs (MVPP). The optimization objective, the deviation-control strategy, and the carbon-trading elements are illustrated, and different scenarios are set up to analyze the convergence of the reinforcement learning model (DDPG) and the resulting schedules. Real-time optimal scheduling via reinforcement learning tracks the measured values of wind power and load, and avoids the cost increases or revenue losses that would arise if the grid itself smoothed the fluctuations, by coordinating internal resources for smoothing or by complementary consumption through interconnection among the VPPs. Because it accounts for the impact of the current decision on future periods, the method achieves scheduling optimization over multiple time intervals.
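To make the reinforcement learning formulation concrete, the sketch below models the MDP elements named above — measured wind power and load as the state, the power exchanged with the market as the action, and the negative of trading cost plus deviation penalty as the reward. All class names, tariff values, and distributions here are illustrative assumptions, not the paper's actual model or the DDPG agent itself.

```python
import random

class ToyVPPEnv:
    """Hypothetical single-VPP environment: state = (wind, load) measurements,
    action = power bought (+) from / sold (-) to the market, reward = -cost."""

    def __init__(self, buy_price=0.6, sell_price=0.4, penalty=1.0, seed=0):
        self.buy_price = buy_price    # assumed tariff for purchased power
        self.sell_price = sell_price  # assumed tariff for sold power
        self.penalty = penalty        # assumed cost per unit of uncovered deviation
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        wind = self.rng.uniform(0.0, 1.0)  # measured wind output (p.u.)
        load = self.rng.uniform(0.5, 1.5)  # measured load (p.u.)
        self.state = (wind, load)
        return self.state

    def step(self, action):
        wind, load = self.state
        imbalance = wind + action - load              # residual deviation
        trade_cost = (self.buy_price * action if action >= 0
                      else self.sell_price * action)  # negative value = revenue
        reward = -(trade_cost + self.penalty * abs(imbalance))
        next_state = self.reset()                     # next interval's measurements
        return next_state, reward

env = ToyVPPEnv()
wind, load = env.reset()
# An action that exactly covers the shortfall incurs no deviation penalty,
# so the reward reduces to the pure trading cost.
_, reward_exact = env.step(load - wind)
```

A DDPG agent would replace the hand-picked action with a learned actor network, which is how the paper's method can weigh the current decision against future intervals.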