This study applies an effective methodology based on Reinforcement Learning (RL) to a control system. Using the Pound-Drever-Hall locking scheme, we match the wavelength of a controlled laser to the length of a Fabry-Pérot cavity such that the cavity length is an exact integer multiple of the laser wavelength. Typically, long-term drift of the cavity length and laser wavelength exceeds the dynamic range of this control if only the laser's piezoelectric transducer is actuated, so the same error signal also controls the temperature of the laser crystal. In this work, we instead implement this feedback control grounded on Q-Learning. Our system learns in real-time, eschewing reliance on historical data, and exhibits adaptability to system variations post-training. This adaptive quality ensures continuous updates to the learning agent. This innovative approach maintains lock for eight days on average.