Energy harvesting technologies offer a promising solution for sustainably powering the ever-growing number of Internet of Things (IoT) devices. However, because harvested energy is weak and transient, IoT devices must operate intermittently, rendering conventional routing policies and energy allocation strategies impractical. To this end, this paper develops, for the first time, a distributed multi-agent reinforcement learning algorithm, termed global actor-critic policy (GAP), that jointly addresses routing and energy allocation for energy-harvesting-powered IoT systems. At the training stage, each IoT device is treated as an agent, and a single universal model is trained for all agents to save computing resources. At the inference stage, the shared policy is deployed on each device to maximize the packet delivery rate. Experimental results show that the proposed GAP algorithm achieves ∼1.28× and ∼1.24× the data transmission rate of the Q-table and ESDSRAA algorithms, respectively.
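To make the shared-model idea concrete, the following is a minimal, illustrative PyTorch sketch of a parameter-shared actor-critic: every agent queries one universal network with its own local observation and receives a per-agent action distribution and value estimate. The observation features, action space, and all names here (e.g., `SharedActorCritic`, `obs_dim`, `n_actions`) are assumptions for illustration, not the paper's exact GAP architecture.

```python
# Illustrative sketch only: a single actor-critic model whose parameters are
# shared by all IoT agents, assuming each agent sees a fixed-size local state
# (e.g., battery level, queue length, neighbor link quality) and chooses a
# discrete action encoding next-hop routing and an energy-allocation level.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class SharedActorCritic(nn.Module):
    """One universal model queried by every agent with its own observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # policy head: action logits
        self.critic = nn.Linear(hidden, 1)         # value head: state value

    def forward(self, obs: torch.Tensor):
        h = self.backbone(obs)
        return Categorical(logits=self.actor(h)), self.critic(h).squeeze(-1)

# Example: three agents share the same weights; each acts on its local state.
model = SharedActorCritic(obs_dim=8, n_actions=5)
obs = torch.randn(3, 8)            # one local observation per agent
dist, value = model(obs)
actions = dist.sample()            # per-agent routing/energy actions
# During training, an actor-critic loss (policy gradient weighted by an
# advantage estimate, plus a value regression term) would update the shared
# parameters from all agents' experience.
```

Sharing one set of parameters across agents is what lets training scale to many devices without a per-device model, consistent with the paper's stated goal of saving computing resources.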