Device-to-device (D2D) communication is a promising feature of 5G networks, offering benefits such as higher data rates, reduced cost and latency, and improved energy efficiency (EE). This study analyzes the operation of millimeter-wave (mmWave) D2D-enabled cellular networks, in which a user's device can connect either to a base station or directly to another device, enabling D2D communication subject to a distance threshold and accounting for interference. A deep reinforcement learning (DRL)-based resource allocation (RA) scheme is proposed for D2D-enabled mmWave communications underlaying cellular networks, and its performance is evaluated in terms of coverage probability, area spectral efficiency, and network EE. In noise-limited networks, the proposed strategy achieves the highest coverage probability. To account for the stochastic nature of wireless channels, the paper further proposes a firefly-algorithm-based optimization approach for RA, for which an asynchronous advantage actor-critic (A3C) DRL algorithm is employed. The proposed scheme is compared with two existing DRL algorithms, soft actor-critic and proximal policy optimization, and the numerical results show that the proposed firefly-algorithm-optimized A3C method outperforms these benchmark algorithms.
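To make the firefly-based optimization step concrete, the following is a minimal sketch of the standard firefly update rule, assuming it is used to tune A3C training parameters (e.g., a learning rate and an entropy weight); the surrogate objective, parameter names, and bounds below are illustrative stand-ins, not the paper's actual evaluation function, which would instead score a candidate configuration by the resulting D2D resource-allocation performance (e.g., negative network EE).

```python
import numpy as np

def firefly_optimize(objective, dim, n_fireflies=20, n_iter=100,
                     alpha=0.2, beta0=1.0, gamma=1.0, bounds=(0.0, 1.0), seed=0):
    """Minimal firefly algorithm: fireflies move toward brighter (lower-cost) ones."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_fireflies, dim))
    fitness = np.array([objective(p) for p in pos])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fitness[j] < fitness[i]:  # firefly j is brighter than i
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)  # attractiveness decays with distance
                    # move toward the brighter firefly plus a small random perturbation
                    pos[i] += beta * (pos[j] - pos[i]) + alpha * (rng.random(dim) - 0.5)
                    pos[i] = np.clip(pos[i], lo, hi)
                    fitness[i] = objective(pos[i])
    best = int(np.argmin(fitness))
    return pos[best], fitness[best]

if __name__ == "__main__":
    # Hypothetical surrogate cost over two A3C hyperparameters (not the paper's objective).
    surrogate = lambda x: (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2
    best_params, best_cost = firefly_optimize(surrogate, dim=2)
    print("best hyperparameters:", best_params, "cost:", best_cost)
```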