Device-to-device (D2D) communication is an emerging technology in the evolution of the 5G network enabled vehicle-to-vehicle (V2V) communications. It is a core technique for the next generation of many platforms and applications, e.g. real-time high-quality video streaming, virtual reality game, and smart city operation. However, the rapid proliferation of user devices and sensors leads to the need for more efficient resource allocation algorithms to enhance network performance while still capable of guaranteeing the quality-of-service. Currently, deep reinforcement learning is rising as a powerful tool to enable each node in the network to have a real-time self-organising ability. In this paper, we present two novel approaches based on deep deterministic policy gradient algorithm, namely ''distributed deep deterministic policy gradient'' and ''sharing deep deterministic policy gradient'', for the multi-agent power allocation problem in D2D-based V2V communications. Numerical results show that our proposed models outperform other deep reinforcement learning approaches in terms of the network's energy efficiency and flexibility. INDEX TERMS Non-cooperative D2D communication, D2D-based V2V communications, power allocation, multi-agent deep reinforcement learning, and deep deterministic policy gradient (DDPG).