Creating reinforcement learning (RL) agents that can perform tasks on real-world robotic systems remains challenging due to inconsistencies between the virtual and the real world. This discrepancy, known as the "reality gap," hinders the performance of an RL agent trained in a virtual environment. This research describes the techniques used to train the models, generate randomized environments, and design the reward function, as well as the methods used to transfer the trained models to the physical environment for evaluation. For this investigation, a low-cost 3-degrees-of-freedom (DOF) Stewart platform was 3D modeled and built both virtually and physically. The goal of the Stewart platform was to guide and balance a marble toward its center. Custom end-to-end APIs were developed to interact with the Godot game engine, manipulate physics and dynamics, control the in-game lighting, and perform environment randomization. Two RL algorithms, Q-learning and Actor-Critic, were implemented to evaluate how well domain randomization and induced noise bridge the reality gap. The Q-learning agent made predictions from raw frames, while the Actor-Critic agent used the marble's position, velocity vector, and relative position, extracted by preprocessing the captured frames. The experimental results demonstrate the effectiveness of domain randomization and noise injection during training.
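As a rough illustration of the two transfer techniques named above, the sketch below shows what per-episode domain randomization and observation-noise injection might look like on the training side. All function and attribute names here (randomize_environment, set_light_energy, and so on) are assumptions made for illustration; they are not the project's actual Godot-facing API.

```python
import random

import numpy as np


def randomize_environment(env):
    """Draw a new random set of visual and physical parameters for one episode.

    `env` is a hypothetical wrapper around the Godot-facing API; the attribute
    and method names below are illustrative placeholders only.
    """
    env.set_light_energy(random.uniform(0.5, 1.5))        # in-game lighting intensity
    env.set_light_color(tuple(random.uniform(0.7, 1.0) for _ in range(3)))
    env.set_marble_friction(random.uniform(0.05, 0.3))    # physics/dynamics randomization
    env.set_platform_texture(random.choice(env.available_textures))


def noisy_observation(position, velocity, sigma_pos=0.01, sigma_vel=0.05):
    """Add Gaussian noise to the preprocessed state (marble position and velocity)."""
    position = np.asarray(position, dtype=np.float32)
    velocity = np.asarray(velocity, dtype=np.float32)
    noisy_pos = position + np.random.normal(0.0, sigma_pos, size=position.shape)
    noisy_vel = velocity + np.random.normal(0.0, sigma_vel, size=velocity.shape)
    return np.concatenate([noisy_pos, noisy_vel])
```

In a training loop, randomize_environment would typically be called at the start of each episode and noisy_observation applied to every state handed to the agent, so the policy never sees two episodes with identical visuals, dynamics, or sensor readings.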