The growing demand for emerging applications envisioned in sixth-generation (6G) wireless networks has brought the Internet of Things (IoT) to the forefront. Device-to-device (D2D) communication has emerged as a key enabler of 6G-based IoT networks. Meanwhile, the intelligent reflecting surface (IRS) has been recognized as a hardware-efficient and innovative technology for future wireless networks, owing to its ability to mitigate propagation-induced impairments and to realize a smart radio environment. This paper investigates an IRS-assisted D2D underlay cellular network. Our aim is to maximize the network's spectrum efficiency (SE) by jointly optimizing the transmit powers of both the cellular users (CUs) and the D2D pairs, the resource-reuse indicators, and the IRS reflection coefficients. Rather than applying traditional optimization techniques to this mixed-integer nonlinear optimization problem, we adopt a reinforcement learning (RL) approach, formulating the IRS-assisted D2D communication network as a Markov decision process (MDP). First, a Q-learning-based solution is studied. Then, to obtain a solution that scales to high-dimensional state and action spaces, a deep Q-learning scheme with experience replay is proposed. Finally, an actor-critic framework based on the deep deterministic policy gradient (DDPG) algorithm is proposed to learn the optimal policy over continuous-valued state and action spaces. Simulation results show that the proposed RL-based schemes achieve significant SE enhancements compared to existing optimization schemes.
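To make the MDP formulation concrete, the following minimal Python sketch illustrates the first of the three solution schemes, a tabular Q-learning loop over a toy discretization of the problem. The state quantization, the joint (transmit power, IRS phase shift) action encoding, and the reward function are illustrative placeholders, not the paper's actual system model; a real implementation would compute the reward as the sum SE from the SINRs of the CUs and D2D pairs under the IRS-adjusted channels.

```python
import numpy as np

# Toy discretization (hypothetical placeholders, not the paper's setup).
N_POWER_LEVELS = 4   # discrete transmit-power levels for a D2D pair
N_PHASE_LEVELS = 8   # discrete IRS phase-shift values
N_STATES = 16        # quantized channel-quality states

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# One action = a (power level, IRS phase level) pair, flattened to an index.
n_actions = N_POWER_LEVELS * N_PHASE_LEVELS
Q = np.zeros((N_STATES, n_actions))

rng = np.random.default_rng(0)

def step(state, action):
    """Placeholder environment transition.

    A full implementation would return the next quantized channel state
    and the achieved sum SE (bits/s/Hz) as the reward; here both are
    randomized stand-ins so the sketch stays self-contained."""
    next_state = rng.integers(N_STATES)
    reward = rng.random()
    return next_state, reward

state = rng.integers(N_STATES)
for _ in range(10_000):
    # Epsilon-greedy action selection.
    if rng.random() < EPSILON:
        action = int(rng.integers(n_actions))
    else:
        action = int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Standard Q-learning temporal-difference update.
    td_target = reward + GAMMA * Q[next_state].max()
    Q[state, action] += ALPHA * (td_target - Q[state, action])
    state = next_state
```

The deep Q-learning scheme replaces the table `Q` with a neural network trained on minibatches drawn from an experience-replay buffer, and the DDPG scheme further replaces the discrete action index with continuous-valued power and reflection-coefficient outputs from an actor network.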