Open Radio Access Network (O-RAN) is a novel architecture that aims to disaggregate network components to reduce capital and operational costs and to open interfaces to ensure interoperability. In this work, we consider the problem of allocating computing resources to process the data of enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communication (URLLC) users. Assuming that the processing of users' frames from different base stations is performed in a shared O-Cloud, we formulate the computing resource allocation problem as an Integer Linear Programming (ILP) problem that allocates computing resources fairly between eMBB and URLLC users, optimizing the Quality of Service (QoS) of URLLC users without neglecting eMBB users. Because solving the ILP problem is computationally expensive, we also model the problem using Reinforcement Learning (RL). Our results show that the RL-based solution performs close to the ILP solver while having much lower computational complexity: for varying numbers of Open Radio Units (O-RUs), the objective value achieved by the RL agent deviates from the ILP objective by no more than 6%.
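For illustration only (this is a simplified sketch, not the exact formulation in the paper; the binary variables $x_{u,c}$, per-user demands $d_u$, priority weights $w_u$, and capacities $C_c$ are assumptions introduced here), an ILP of this kind could assign each user's frame processing to an O-Cloud computing resource as follows:
\begin{align}
\max_{x}\ & \sum_{u \in \mathcal{U}} w_u \sum_{c \in \mathcal{C}} x_{u,c} && \text{weighted number of served users, } w_u \text{ larger for URLLC} \nonumber\\
\text{s.t.}\ & \sum_{u \in \mathcal{U}} d_u\, x_{u,c} \le C_c, \quad \forall c \in \mathcal{C} && \text{capacity of each O-Cloud computing resource} \nonumber\\
& \sum_{c \in \mathcal{C}} x_{u,c} \le 1, \quad \forall u \in \mathcal{U} && \text{each user's frame processed on at most one resource} \nonumber\\
& x_{u,c} \in \{0,1\}, \quad \forall u \in \mathcal{U},\, c \in \mathcal{C}. \nonumber
\end{align}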