Machine-to-Machine (M2M) communication is a promising technology that may realize the Internet of Things (IoTs) in future networks. However, due to the features of massive devices and concurrent access requirement, it will cause performance degradation and enormous energy consumption. Energy Harvesting-Powered Cognitive M2M Networks (EH-CMNs) as an attractive solution is capable of alleviating the escalating spectrum deficient to guarantee the Quality of Service (QoS) meanwhile decreasing the energy consumption to achieve Green Communication (GC) became an important research topic. In this paper, we investigate the resource allocation problem for EH-CMNs underlaying cellular uplinks. We aim to maximize the energy efficiency of EH-CMNs with consideration of the QoS of Human-to-Human (H2H) networks and the available energy in EHdevices. In view of the characteristic of EH-CMNs, we formulate the problem to be a decentralized Discrete-time and Finite-state Markov Decision Process (DFMDP), in which each device acts as agent and effectively learns from the environment to make allocation decision without the complete and global network information. Owing to the complexity of the problem, we propose a Deep Reinforcement Learning (DRL)based algorithm to solve the problem. Numerical results validate that the proposed scheme outperforms other schemes in terms of average energy efficiency with an acceptable convergence speed.