As the penetration of new energy in the power system continues to rise, spot-market electricity prices fluctuate frequently and sharply, eroding the income of many thermal power enterprises. To lock in profits, thermal power enterprises should shift their main profit focus to the medium- and long-term power market. As the reform of China's power system advances, medium- and long-term power transactions have changed substantially, including the transaction target, organization method, and clearing method, so there is an urgent need to explore the quotation strategies of thermal power enterprises under these market changes. Based on game equilibrium theory, this paper establishes non-cooperative and cooperative game models among thermal power companies. Because traditional reinforcement learning methods have difficulty solving multi-agent incomplete-information game models, this paper uses the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to solve these models. Finally, the validity of the proposed model is demonstrated with a numerical example. The results show that, compared with other reinforcement learning algorithms, MADDPG yields more accurate quotations when solving the multi-agent incomplete-information game model, with revenue increased by 5.2% and convergence time reduced by 50%. In addition, this paper finds that in the medium- and long-term power market, thermal power companies are more inclined to profit through physical withholding (retaining part of their available capacity rather than offering it). The greater a thermal power company's market power, the more likely it is to withhold capacity. When low-cost thermal power companies withhold more power, they raise the market clearing price and harm market efficiency. Regulators should therefore focus on the market behavior of such companies.
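
The claim that withholding (retaining) capacity at a low-cost unit raises the clearing price can be illustrated with a minimal sketch. It assumes a uniform-price, merit-order clearing rule, which may differ from the clearing method modeled in the paper. The function `clear_uniform_price`, the three units, and all quantities, prices, and costs are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (unit name, offered quantity in MWh, offer price in yuan/MWh); all values are hypothetical
Offer = Tuple[str, float, float]


def clear_uniform_price(offers: List[Offer], demand: float) -> float:
    """Return the uniform clearing price: the offer price of the marginal accepted unit.

    Assumes merit-order dispatch (cheapest offers accepted first).
    """
    remaining = demand
    for _name, quantity, price in sorted(offers, key=lambda o: o[2]):
        remaining -= quantity
        if remaining <= 0:
            return price
    raise ValueError("offered quantity is insufficient to cover demand")


demand = 700.0

# Baseline: the low-cost unit offers its full 500 MWh.
baseline = [("low_cost", 500.0, 300.0), ("mid_cost", 300.0, 350.0), ("high_cost", 300.0, 420.0)]
# Withholding: the low-cost unit offers only 350 MWh and retains the other 150 MWh.
withheld = [("low_cost", 350.0, 300.0), ("mid_cost", 300.0, 350.0), ("high_cost", 300.0, 420.0)]

price_baseline = clear_uniform_price(baseline, demand)
price_withheld = clear_uniform_price(withheld, demand)

# The low-cost unit is inframarginal in both cases, so its full offer is dispatched.
low_cost_marginal_cost = 280.0  # hypothetical, yuan/MWh
profit_baseline = 500.0 * (price_baseline - low_cost_marginal_cost)
profit_withheld = 350.0 * (price_withheld - low_cost_marginal_cost)

print(price_baseline, price_withheld)    # clearing price before and after withholding
print(profit_baseline, profit_withheld)  # low-cost unit's profit before and after
```

With these assumed numbers, the retained 150 MWh forces the high-cost unit to become marginal, so the clearing price rises and the low-cost unit earns more on a smaller volume. This is a static toy case, not the game-theoretic model or the MADDPG solution described above.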