The development of new technologies and the improvement of systems have made solar photovoltaic (PV) energy generation grow exponentially in recent years. Research into failure and degradation mechanisms has become necessary for obtaining efficient and reliable systems. However, unsolved challenges remain concerning safety, unforeseen outages, and high operation and maintenance (O&M) costs. Early detection of problems is essential to ensure reliability and avoid production losses over time. Preventive maintenance, which is scheduled from failure history and failure probability, can anticipate faults and limit unplanned downtime. This work aims to increase the operational performance of PV plants by improving O&M methodologies for PV systems. A Markov Decision Problem model is adapted to a Reinforcement Learning approach to recommend preventive maintenance actions in PV systems, considering the degradation of equipment such as cables and switches in the inverter. The methodology makes it possible to explore the economic and energy production benefits of detection, prevention, and mitigation techniques applied to PV power production. Case studies consider large-scale scenarios and show that the approach can be used to create a long-term maintenance planning policy.
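To make the idea of casting preventive maintenance as a Markov decision problem solved by reinforcement learning more concrete, the following is a minimal sketch using tabular Q-learning. It is not the model used in this work: the four degradation levels, the revenue and cost values, the degradation probability, and the function names `step` and `q_learning` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy model: every state, cost, and probability below is an
# illustrative assumption, not a value taken from this work.
N_STATES = 4                 # degradation levels: 0 = as new ... 3 = failed
WAIT, MAINTAIN = 0, 1        # available actions
REVENUE = [10.0, 8.0, 5.0]   # energy revenue per step for levels 0..2
COST_PM = 5.0                # cost of a preventive maintenance action
COST_FAIL = 50.0             # cost of a corrective repair after a failure
P_DEGRADE = 0.3              # chance of degrading one level when waiting
GAMMA = 0.9                  # discount factor


def step(state, action, rng):
    """Return (reward, next_state) for one decision epoch."""
    if state == N_STATES - 1:
        # Failed equipment: a corrective repair is forced for any action.
        return -COST_FAIL, 0
    if action == MAINTAIN:
        # Preventive maintenance restores the equipment to level 0.
        return -COST_PM, 0
    # Waiting earns revenue but the equipment may degrade by one level.
    next_state = state + 1 if rng.random() < P_DEGRADE else state
    return REVENUE[state], next_state


def q_learning(n_steps=50_000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.choice((WAIT, MAINTAIN))   # explore
        else:
            action = WAIT if q[state][WAIT] >= q[state][MAINTAIN] else MAINTAIN
        reward, next_state = step(state, action, rng)
        # Standard Q-learning update towards the one-step bootstrapped target.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = q_learning()
# Greedy policy: recommended action for each degradation level.
policy = [WAIT if q[WAIT] >= q[MAINTAIN] else MAINTAIN for q in q_table]
print(policy)
```

In this toy setting the learned policy recommends waiting while the equipment is as new, because maintenance there only incurs cost without improving the state. A realistic application would replace the hand-written transition and reward functions with ones estimated from plant failure history and production data.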