Underlay Cognitive Radio (CR) systems were introduced to address spectrum scarcity in wireless communication. In a CR system, an unlicensed Secondary Transmitter (ST) shares the channel with a licensed Primary Transmitter (PT), and spectral efficiency can be increased further if multiple STs share the same channel. In underlay CR systems, the STs must keep their interference low enough to avoid outage at the primary system. This interference constraint prevents some STs from transmitting while others achieve high data rates, making the underlay CR network unfair. In this work, we consider the problem of achieving fairness among the rates of the STs. The resulting optimization problem is non-convex, and conventional iteration-based optimizers are time-consuming and may fail to converge on non-convex problems. To address this, we propose a deep-Q reinforcement learning (DQ-RL) framework that employs two separate deep neural networks for the computation and estimation of the Q-values, yielding a fast solution that is robust to channel dynamics. The proposed technique achieves near-optimal fairness while keeping the primary outage probability below 4%. Moreover, the computational complexity of the proposed framework grows only linearly with the number of STs. We also compare several variants of the proposed scheme against the optimal solution. Finally, we present a novel cumulative reward framework and discuss how the combined-reward approach improves the performance of the communication system.
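The two-network idea mentioned above, one network computing Q-values for action selection and a second, slowly updated network estimating the bootstrap target, can be illustrated with a minimal sketch. This is not the paper's implementation: the state/action dimensions, reward, and soft-update rate below are hypothetical, and tabular approximators stand in for the deep neural networks to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: states could index quantized channel gains of the STs,
# actions could index discrete ST transmit-power levels.
N_STATES, N_ACTIONS = 8, 4
GAMMA, LR, TAU = 0.9, 0.1, 0.05  # discount, learning rate, soft-update rate

# Two separate Q approximators, standing in for the two deep networks:
# the online table computes Q-values for the current policy; the target
# table estimates the bootstrap value, which stabilizes learning.
q_online = np.zeros((N_STATES, N_ACTIONS))
q_target = np.zeros((N_STATES, N_ACTIONS))

def dq_update(s, a, r, s_next):
    """One deep-Q-style update with a separate target estimator."""
    # The online approximator selects the best next action...
    a_star = int(np.argmax(q_online[s_next]))
    # ...but the target approximator evaluates it (the decoupling of
    # computation and estimation of Q-values).
    td_target = r + GAMMA * q_target[s_next, a_star]
    q_online[s, a] += LR * (td_target - q_online[s, a])

def soft_update():
    """Let the target approximator slowly track the online one."""
    global q_target
    q_target = TAU * q_online + (1 - TAU) * q_target

# Toy training loop: random transitions with a reward in [0, 1),
# standing in for a fairness-based reward signal.
for _ in range(500):
    s = int(rng.integers(N_STATES))
    a = int(rng.integers(N_ACTIONS))
    s_next = int(rng.integers(N_STATES))
    r = float(rng.random())
    dq_update(s, a, r, s_next)
    soft_update()
```

With rewards bounded by 1 and discount 0.9, the learned Q-values stay below 1/(1-0.9) = 10; in a real system the tables would be replaced by neural networks trained on replayed transitions.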