Data integrity is crucial for the performance and reliability analysis of photovoltaic (PV) systems, since in-field measurements commonly contain invalid data caused by outages and component failures. The aim of this paper is to present a complete methodology for PV data processing and quality verification that improves the accuracy of PV performance and reliability analyses. Data quality routines (DQRs) were developed to ensure data fidelity by detecting and reconstructing invalid data through a sequence of filtering stages and inference techniques. The results confirmed that PV performance and reliability analyses are sensitive to data fidelity and, therefore, that time series reconstruction should be handled appropriately. When invalid data accounted for 10% or less of a dataset, the listwise deletion technique mitigated the resulting bias and provided accurate results for performance analytics, with a maximum absolute percentage error of 0.92%. When missing data rates exceeded 10%, data inference techniques yielded more accurate results. For missing power measurements, time series reconstruction using the Sandia PV Array Performance Model yielded the lowest error among the investigated data inference techniques for PV performance analysis, with an absolute percentage error below 0.71% even at a 40% missing data rate. The routines were verified on historical datasets from two locations with different climates (desert and steppe). The proposed methodology provides a set of standardized analytical procedures to ensure the validity of performance and reliability evaluations performed over the lifetime of PV systems.
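The trade-off between listwise deletion and model-based reconstruction can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the routines from the study: it assumes the performance ratio (PR) as the performance metric, synthetic hourly data, independently missing power samples, and a hypothetical linear model P = k·G fitted through the origin as a simplified stand-in for the Sandia PV Array Performance Model. The names `performance_ratio`, `impute_power_linear`, `ape`, and `simulate` are illustrative only.

```python
import random


def performance_ratio(power_kw, irradiance_wm2, rated_kw):
    """PR = sum(P_ac) / sum(P_rated * G_poa / 1000 W/m^2).

    Listwise deletion: a timestamp is used only if BOTH power and
    irradiance are present (None marks an invalid/missing sample).
    """
    pairs = [(p, g) for p, g in zip(power_kw, irradiance_wm2)
             if p is not None and g is not None]
    measured = sum(p for p, _ in pairs)
    reference = sum(rated_kw * g / 1000.0 for _, g in pairs)
    return measured / reference


def impute_power_linear(power_kw, irradiance_wm2):
    """Fill missing power with P_hat = k * G (least squares through origin).

    Hypothetical stand-in for a physical model such as the Sandia PV Array
    Performance Model; it only illustrates model-based reconstruction.
    """
    valid = [(p, g) for p, g in zip(power_kw, irradiance_wm2)
             if p is not None and g is not None]
    k = sum(p * g for p, g in valid) / sum(g * g for _, g in valid)
    return [k * g if p is None and g is not None else p
            for p, g in zip(power_kw, irradiance_wm2)]


def ape(estimate, reference):
    """Absolute percentage error, in percent."""
    return abs(estimate - reference) / abs(reference) * 100.0


def simulate(missing_rate, seed=0, rated_kw=10.0, days=30):
    """Return (true PR, APE of listwise deletion, APE of linear imputation)."""
    rng = random.Random(seed)
    irradiance, power = [], []
    for _ in range(days):
        cloud = rng.uniform(0.5, 1.0)  # synthetic daily attenuation factor
        for hour in range(6, 19):      # daylight hours only
            g = 1000.0 * cloud * (1 - ((hour - 12) / 6.5) ** 2)
            p = rated_kw * g / 1000.0 * 0.85 * (1 + rng.uniform(-0.02, 0.02))
            irradiance.append(g)
            power.append(p)
    truth = performance_ratio(power, irradiance, rated_kw)
    # Mark a random fraction of power samples as invalid (None).
    masked = [None if rng.random() < missing_rate else p for p in power]
    pr_deleted = performance_ratio(masked, irradiance, rated_kw)
    pr_imputed = performance_ratio(impute_power_linear(masked, irradiance),
                                   irradiance, rated_kw)
    return truth, ape(pr_deleted, truth), ape(pr_imputed, truth)


if __name__ == "__main__":
    for rate in (0.1, 0.4):
        truth, err_del, err_imp = simulate(rate)
        print(f"missing={rate:.0%} PR={truth:.3f} "
              f"APE deletion={err_del:.2f}% APE imputation={err_imp:.2f}%")
```

The simulated gaps are independent random samples, so this sketch does not represent contiguous outages; modelling those would require a different masking scheme.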
A main challenge in integrating intermittent photovoltaic (PV) power generation remains the accuracy of day-ahead forecasts and the establishment of methods with robust performance. This work addresses these technological challenges by evaluating the day-ahead PV production forecasting performance of different machine learning models under different supervised learning regimes and with minimal input features. Specifically, the day-ahead forecasting capability of Bayesian neural network (BNN), support vector regression (SVR), and regression tree (RT) models was investigated using the same dataset for training and performance verification, thus enabling a valid comparison. The training regime analysis demonstrated that the performance of the investigated models depended strongly on the timeframe of the train set, the training data sequence, and the application of irradiance condition filters. Furthermore, accurate results were obtained using only the measured power output and other calculated parameters as training inputs. Consequently, this work provides useful information for establishing a robust day-ahead forecasting methodology that uses calculated input parameters and an optimal supervised learning approach. Finally, the optimally constructed BNN outperformed all other machine learning models, achieving forecasting errors lower than 5%.
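The training-regime factors named above (timeframe of the train set, training data sequence, and irradiance condition filters) can be made concrete as a data-selection step that runs before any model is fitted. The sketch below is hypothetical: the `Day` record, the daily clear-sky index used as the irradiance filter, the function names, and the choice of capacity-normalized RMSE as the error metric are assumptions for illustration. The abstract does not specify how the study implemented these regimes or which metric its 5% figure refers to, and the BNN, SVR, and RT models themselves are out of scope here.

```python
import random
from dataclasses import dataclass


@dataclass
class Day:
    index: int               # day number within the dataset
    clear_sky_index: float   # hypothetical daily ratio of measured to clear-sky energy
    hourly_power_kw: list    # measured power profile for that day


def select_training_days(days, target_index, window_days,
                         min_clear_sky_index=None, shuffle=False, seed=0):
    """Build the training set used to forecast day `target_index`.

    window_days         -> timeframe of the train set (days strictly before the target)
    min_clear_sky_index -> optional irradiance-condition filter
    shuffle             -> randomize the training data sequence instead of
                           keeping chronological order
    """
    history = [d for d in days
               if target_index - window_days <= d.index < target_index]
    history.sort(key=lambda d: d.index)  # chronological order by default
    if min_clear_sky_index is not None:
        history = [d for d in history
                   if d.clear_sky_index >= min_clear_sky_index]
    if shuffle:
        random.Random(seed).shuffle(history)
    return history


def nrmse_percent(forecast_kw, measured_kw, capacity_kw):
    """RMSE normalized by installed capacity, in percent (one common choice)."""
    n = len(measured_kw)
    mse = sum((f - m) ** 2 for f, m in zip(forecast_kw, measured_kw)) / n
    return mse ** 0.5 / capacity_kw * 100.0


if __name__ == "__main__":
    rng = random.Random(1)
    days = [Day(i, rng.uniform(0.3, 1.0), [0.0] * 24) for i in range(60)]
    regimes = ({}, {"min_clear_sky_index": 0.7}, {"shuffle": True})
    for regime in regimes:
        train = select_training_days(days, target_index=59, window_days=30,
                                     **regime)
        print(regime, len(train), [d.index for d in train][:5])
    print(nrmse_percent([1.0, 3.0], [2.0, 2.0], capacity_kw=10.0))
```

Keeping selection separate from fitting lets each regime be compared against the same downstream model, which mirrors the abstract's point that model performance depended on how the training set was built.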