Abstract. To address the problems of multi-agent Q-learning algorithms in micro-grid control systems, this paper proposes a probability-based multi-agent Q-learning algorithm for micro-grid control. The algorithm introduces probability theory into the greedy strategy so that the optimal action is selected. To account for the influence that the agents have on one another, and that each agent's history has on action selection, historical information is added as a parameter when constructing the function used to find the optimal value. The algorithm is tested in a simulated micro-grid control system. The results indicate that it rapidly restores power to a stable state when the power of the micro-grid changes frequently.