Summary
Over the past few decades, the number of users and services in mobile communications systems has increased considerably. Because essential resources such as spectrum and energy are limited, optimizing their use has drawn particular interest. Concomitantly, artificial intelligence (AI) techniques have advanced and their applications have expanded to classification, regression, and optimization tasks in mobile communications systems. In the fifth and sixth generations of such systems, AI is expected to play a role in allocating the available resources. The present study applied two recently proposed techniques based on deep reinforcement learning, namely deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), to power control and spectrum allocation in a mobile communications system with device-to-device (D2D) underlay communications. The results show that both algorithms outperform the three algorithms used for comparison: a random algorithm, a greedy algorithm, and REINFORCE, a classical reinforcement learning algorithm. Furthermore, the results show that the proposed algorithms generalize well and allocate resources intelligently, taking into account the relationship between the distances separating devices and the interference between communications. The results also proved robust to small variations in input data and noise.