Multi-agent cooperation requires agents to reason about beliefs in partially observable environments without communication, but traditional Multi-Agent Deep Reinforcement Learning (MADRL) algorithms struggle to handle this uncertainty. Multi-agent Epistemic Planning (MEP) lets each agent search for an optimal plan to complete the cooperative task, and thus handles such uncertainty more effectively. However, naively adding MEP to MADRL leads to inconsistent plans across agents. We propose a MADRL-based policy network architecture called SMM-MEPP (Shared Mental Model Multi-agent Epistemic Planning Policy). First, multi-agent epistemic planning and MADRL are combined to build a "Perception-Planning-Action" multi-agent epistemic planning framework. Second, the mental model from psychology is introduced and represented as a neural network. Third, a parameter-sharing mechanism is used to realize a shared mental model and maintain the consistency of epistemic planning. Finally, we apply the SMM-MEPP architecture to three advanced MADRL algorithms (i.e., MAAC, MADDPG, and MAPPO) and conduct comparative experiments on multi-agent cooperation tasks. The experiments show that the proposed method yields consistent planning across agents and improves convergence speed or training performance in partially observable environments without communication.
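To make the parameter-sharing idea concrete, the sketch below shows one way a shared mental model could be realized in PyTorch: a single network instance is referenced by every agent's policy, so all agents compute beliefs with identical parameters. This is a minimal illustration under assumed names and dimensions (`SharedMentalModel`, `AgentPolicy`, `obs_dim`, etc.); it is not the paper's actual implementation.

```python
# Illustrative sketch of parameter sharing for a shared mental model.
# All class names and dimensions here are assumptions for illustration;
# the SMM-MEPP paper's actual architecture may differ.
import torch
import torch.nn as nn

class SharedMentalModel(nn.Module):
    """One network instance whose parameters are shared by all agents,
    so every agent maintains the same belief representation."""
    def __init__(self, obs_dim: int, belief_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, belief_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class AgentPolicy(nn.Module):
    """Per-agent policy head on top of the shared mental model."""
    def __init__(self, shared_mm: SharedMentalModel, belief_dim: int, act_dim: int):
        super().__init__()
        self.mental_model = shared_mm        # same object => shared parameters
        self.head = nn.Linear(belief_dim, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        belief = self.mental_model(obs)      # identical mapping for every agent
        return torch.softmax(self.head(belief), dim=-1)

obs_dim, belief_dim, act_dim, n_agents = 8, 16, 4, 3
shared_mm = SharedMentalModel(obs_dim, belief_dim)
agents = [AgentPolicy(shared_mm, belief_dim, act_dim) for _ in range(n_agents)]

# Because every agent holds a reference to the same mental-model module,
# a gradient step taken through any agent's loss updates the belief
# network of all agents, which is what keeps their epistemic planning
# consistent in this sketch.
obs = torch.randn(n_agents, obs_dim)
action_probs = [agent(obs[i]) for i, agent in enumerate(agents)]
```

In this reading of the mechanism, consistency follows directly from the shared parameters: given the same observation, all agents would produce the same belief embedding, while the per-agent heads still allow heterogeneous actions.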