In this paper, we study the resource slicing problem in a dynamic multiplexing scenario of two distinct 5G services, namely Ultra-Reliable Low Latency Communications (URLLC) and enhanced Mobile Broadband (eMBB). While eMBB services focus on high data rates, URLLC imposes strict latency and reliability requirements. In view of this, the resource slicing problem is formulated as an optimization problem that maximizes the eMBB data rate subject to a URLLC reliability constraint, while accounting for the variance of the eMBB data rate to reduce the impact of immediately scheduled URLLC traffic on the eMBB reliability. To solve the formulated problem, an optimization-aided Deep Reinforcement Learning (DRL) based framework is proposed, comprising two phases: 1) eMBB resource allocation, and 2) URLLC scheduling. In the first phase, the optimization problem is decomposed into three subproblems, and each subproblem is then transformed into a convex form to obtain an approximate resource allocation solution. In the second phase, a DRL-based algorithm is proposed to intelligently distribute the incoming URLLC traffic among eMBB users. Simulation results show that the proposed approach can satisfy the stringent URLLC reliability requirement while keeping the eMBB reliability above 90%.