Applying model-based learning to optimal decision-making in multi-agent systems is not trivial, because obtaining the ground truth needed to train a model of a complex environment is expensive or even impossible. For example, when learning the optimal actions of hydraulic supports in top-coal caving, the optimal action for a given state cannot be obtained as ground truth in such intricate processes. Treating the latent ground truth as a hidden variable is an effective approach in the hidden Markov model. This paper extends this hidden-variable treatment of the ground truth to multi-agent systems and proposes a hidden Markov random field (HMRF) model with reinforcement learning for optimizing the action decisions of multiple agents. In the HMRF model, the input states and the output actions of the agents are treated as an observable random field and a latent Markov random field, respectively. Based on the HMRF model, the optimal decision is inferred by maximum a posteriori (MAP) estimation, with the prior probability obtained by Q-learning. Meanwhile, the parameters of the HMRF model are estimated by the expectation-maximization (EM) algorithm. In the top-coal caving experiment, the proposed method proves effective: the recall of top-coal improves markedly at the cost of only a slight increase in the rock rate. Furthermore, the proposed method is applied to the predator-prey problem in Gym. The experimental results show that communication between agents through the HMRF increases the reward of the prey.

INDEX TERMS Hidden Markov random field, optimal decision, multi-agent, top-coal caving.
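To make the inference and estimation steps concrete, the sketch below shows one way an HMRF decision layer over a line of agents could look. It is a minimal illustration under stated assumptions, not the paper's implementation. Assumptions (all hypothetical): each agent has a scalar observed state, the hidden field is the discrete action of each agent, agents are neighbours only along a 1-D chain (as adjacent hydraulic supports are), the likelihood p(state | action) is Gaussian per action, the Q-learning prior is a softmax exp(Q/tau) (its normalizer is constant per agent and is dropped), and neighbour coupling is a Potts term with weight beta. MAP inference uses iterated conditional modes (ICM), a common approximate MAP method for HMRFs, and the parameter update is a hard-assignment (classification) variant of EM; the paper's exact inference and E-step may differ. The names `local_energy`, `icm`, `m_step`, and `fit` are illustrative only.

```python
import math


def log_gauss(y, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def local_energy(i, k, labels, obs, q, params, beta, tau):
    """Negative log-posterior (up to a constant) of giving agent i action k."""
    mu, sigma = params[k]
    # Likelihood term (assumed Gaussian): how well action k explains obs[i].
    e = -log_gauss(obs[i], mu, sigma)
    # Q-learning prior (assumed softmax exp(Q/tau)); the normalizer is dropped.
    e -= q[i][k] / tau
    # Potts pairwise term: favour agreement with left/right chain neighbours.
    for j in (i - 1, i + 1):
        if 0 <= j < len(labels) and labels[j] == k:
            e -= beta
    return e


def icm(obs, q, params, beta=1.0, tau=1.0, iters=10, init=None):
    """Approximate MAP labelling by iterated conditional modes."""
    n, n_actions = len(obs), len(params)
    if init is None:
        # Start from each agent's greedy Q-learning action.
        labels = [max(range(n_actions), key=lambda k: q[i][k]) for i in range(n)]
    else:
        labels = list(init)
    for _ in range(iters):
        changed = False
        for i in range(n):
            best = min(range(n_actions),
                       key=lambda k: local_energy(i, k, labels, obs, q, params, beta, tau))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # a local minimum of the energy has been reached
            break
    return labels


def m_step(obs, labels, params, min_sigma=1e-3):
    """Re-estimate each action's Gaussian (mean, std) from the agents assigned to it."""
    new_params = []
    for k, (mu, sigma) in enumerate(params):
        ys = [y for y, x in zip(obs, labels) if x == k]
        if not ys:
            new_params.append((mu, sigma))  # keep old values for unused actions
            continue
        m = sum(ys) / len(ys)
        var = sum((y - m) ** 2 for y in ys) / len(ys)
        new_params.append((m, max(math.sqrt(var), min_sigma)))
    return new_params


def fit(obs, q, params, beta=1.0, tau=1.0, rounds=5):
    """Alternate ICM labelling (hard E-step) and parameter updates (M-step)."""
    labels = icm(obs, q, params, beta, tau)
    for _ in range(rounds):
        params = m_step(obs, labels, params)
        labels = icm(obs, q, params, beta, tau, init=labels)
    return labels, params


if __name__ == "__main__":
    # Hypothetical data: 6 supports in a row, actions 0 = keep closed, 1 = open.
    obs = [0.1, 0.2, 0.15, 0.9, 0.85, 0.95]
    q = [[0.0, 0.0] for _ in obs]  # uninformative Q-values for this toy run
    labels, params = fit(obs, q, [(0.0, 0.3), (1.0, 0.3)], beta=0.5)
    print(labels, params)
```

With uninformative Q-values, the toy run separates the supports into two contiguous groups, and the re-estimated action means move to the group averages. In practice, non-zero Q-values shift ambiguous agents toward the action Q-learning prefers, while beta controls how strongly neighbouring agents are pushed to agree.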