The integration of mobile robots into material handling for flexible manufacturing systems has been enabled by recent advances in Industry 4.0 and industrial artificial intelligence. However, scheduling these robots effectively in real time remains challenging because of the constantly changing, complex, and uncertain nature of the shop-floor environment. This paper therefore studies the robot scheduling problem for a multiproduct flexible production line in which a mobile robot loads and unloads parts among machines and buffers. The problem is formulated as a Markov Decision Process, and a Q-learning algorithm is used to learn an optimal policy for the robot's movements when handling different product types. System performance is evaluated through a reward function based on permanent production loss and the cost of demand dissatisfaction. The proposed approach is validated in a numerical case study that compares the learned policy against a first-come-first-served policy, showing an improvement in production throughput of approximately 23%.
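As a rough illustration of the learning mechanism summarized above, the sketch below shows a tabular Q-learning update with epsilon-greedy exploration. The action names, hyperparameter values, and reward handling are hypothetical placeholders; they do not reproduce the paper's exact MDP formulation or reward terms.

```python
# Illustrative tabular Q-learning sketch (assumed names and values, not the
# paper's implementation). The reward would, in the paper's setting, encode
# permanent production loss and demand-dissatisfaction cost.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount factor, exploration rate
ACTIONS = ["move_to_M1", "move_to_M2", "move_to_buffer", "idle"]  # hypothetical action set

# Q-table: maps each state to a dict of action values, initialized to zero.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def q_update(state, action, reward, next_state):
    """One-step Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

In use, an environment model of the production line would supply the state transitions and rewards at each decision epoch, and `choose_action` / `q_update` would be called inside a training loop until the policy converges.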