Experience-based methods such as reinforcement learning (RL) are often considered ill-suited to safety-critical domains because of concerns that the learned behavior may itself introduce safety risks. To bridge this gap, we introduce STPA-RL, a methodology that integrates RL with System-Theoretic Process Analysis (STPA). STPA is a safety analysis technique that uses loss scenarios to identify the causal factors behind unsafe control actions and system hazards. In STPA-RL, we formalize a Markov Decision Process from the STPA results so that the control algorithm is incorporated into the system environment. The agent learns safe actions through reward-based learning, and the potential hazard paths it encounters are tracked to validate system safety. We assess the applicability of the approach on a Platform Screen Door system by analyzing various loss scenarios and evaluating hazard trajectory graphs and hazard frequencies. The paper streamlines RL-based loss scenario identification by grounding it in STPA, which supports self-guided discovery of loss scenarios and the modeling of diverse systems. It also offers an effective simulation approach for proactive development, giving practical support for improving system safety.
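To make the idea concrete, the following is a minimal sketch of how a hazard identified by a safety analysis could be encoded in a Markov Decision Process and how hazard paths could be logged during learning. It is a toy and not the model used in this paper: the three-phase train cycle, the reward values, the hyperparameters, and every function name (`is_hazard`, `step`, `train`, and so on) are assumptions made purely for illustration, and plain tabular Q-learning stands in for whatever RL algorithm is actually used.

```python
import random
from collections import defaultdict

# Hypothetical toy Platform Screen Door model (illustrative only).
# State = (phase, door_open); phase 0: no train, 1: train docked,
# 2: train docked and about to depart. Phases cycle 0 -> 1 -> 2 -> 0.
ACTIONS = ("open", "close")
N_PHASES = 3

def is_hazard(state):
    phase, door_open = state
    return phase == 0 and door_open  # door open with no train at the platform

def step(state, action):
    """The door command takes effect as the train moves to its next phase."""
    phase, _ = state
    next_state = ((phase + 1) % N_PHASES, action == "open")
    if is_hazard(next_state):
        return next_state, -10.0, True      # penalize the hazard and end the episode
    reward = 1.0 if next_state[1] else 0.0  # door open while a train is docked
    return next_state, reward, False

def greedy(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])

def train(episodes=500, horizon=30, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # tabular action values, default 0.0
    hazard_paths = []        # last few (state, action) pairs before each hazard
    for _ in range(episodes):
        state, path = (0, False), []
        for _ in range(horizon):
            explore = rng.random() < epsilon
            action = rng.choice(ACTIONS) if explore else greedy(q, state)
            next_state, reward, done = step(state, action)
            path.append((state, action))
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            if done:
                hazard_paths.append(path[-3:])
                break
            state = next_state
    return q, hazard_paths

def greedy_rollout_hits_hazard(q, horizon=30):
    """Follow the learned greedy policy and report whether it reaches the hazard."""
    state = (0, False)
    for _ in range(horizon):
        state, _, done = step(state, greedy(q, state))
        if done:
            return True
    return False

q, hazard_paths = train()
print("hazards logged during training:", len(hazard_paths))
print("greedy policy reaches hazard:", greedy_rollout_hits_hazard(q))
```

In this sketch the logged `hazard_paths` play the role of hazard trajectories, and the number of logged paths is a crude hazard frequency; a real study would derive the states, actions, and penalties from the STPA results rather than from hand-picked values.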