Photovoltaic (PV) fault detection approaches can generally be divided into two groups: end-to-end methods and threshold methods. End-to-end methods typically use a deep neural network (DNN) to learn fault patterns from labeled datasets and directly detect whether a fault has occurred. Threshold methods first estimate power generation and then use thresholds to detect atypical deviations of the measured values from the estimated ones. The former rely heavily on fault-labeled data and therefore require the collection of abnormal event records, which is usually difficult because such events are rare. The latter typically use statistical approaches, such as the 3-sigma rule, to find thresholds, and they can be applied in practice without fault labels. However, setting a threshold with a proper confidence interval remains challenging, as PV power generation is sensitive to variations in environmental conditions such as irradiance, ambient temperature, wind speed, and humidity. In this paper, we propose a novel deep reinforcement learning (DRL)-based label-free fault detection scheme in which thresholds are dynamically assigned with suitable confidence intervals under varying environmental conditions. Various weather properties were used as input features (i.e., states) to a DRL agent, and proper thresholds were estimated in real time from the actions of the agent. To this end, a reward function was designed for learning proper thresholds without fault labels under different weather conditions. To evaluate the performance of the proposed scheme, the PV dataset of the National Institute of Standards and Technology (NIST) was used, as it includes paired records of local weather and PV generation. The DRL-based scheme was compared with static and conventional dynamic threshold methods based on statistical approaches. The results showed that the proposed scheme outperformed the existing methods, achieving a 5.67% higher F1-score on the NIST dataset.
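
To make the static threshold baseline concrete, the following is a minimal sketch of a 3-sigma detector applied to the residual between measured and estimated power. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names `fit_residual_stats` and `flag_faults`, the multiplier `k`, and the sample values are all hypothetical, and the power estimator is assumed to exist elsewhere.

```python
import statistics


def fit_residual_stats(reference_residuals):
    """Estimate mean and standard deviation of residuals (measured - estimated)
    from a reference window that is assumed to be mostly fault-free."""
    mu = statistics.fmean(reference_residuals)
    sigma = statistics.pstdev(reference_residuals)
    return mu, sigma


def flag_faults(measured, estimated, mu, sigma, k=3.0):
    """Flag each sample whose residual lies more than k standard deviations
    from the reference mean. k = 3.0 corresponds to the 3-sigma rule."""
    flags = []
    for m, e in zip(measured, estimated):
        residual = m - e
        flags.append(abs(residual - mu) > k * sigma)
    return flags


# Hypothetical residuals (kW) from a reference period.
reference = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.0]
mu, sigma = fit_residual_stats(reference)

# Hypothetical new samples: the third one falls far below the estimate.
flags = flag_faults(
    measured=[5.1, 4.9, 3.0],
    estimated=[5.0, 5.0, 5.0],
    mu=mu,
    sigma=sigma,
)
print(flags)
```

Here `k` is fixed for all samples, which is the limitation described above: a single multiplier cannot track how the spread of residuals changes with weather. A dynamic scheme would instead vary the threshold per sample, which this sketch does not attempt to reproduce.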