The installation of large-scale solar (LSS) photovoltaic (PV) power plants continues to rise, both globally and in Malaysia. The data provided by an LSS PV plant come from five weather stations with seven parameters each, 22 inverter units, and one PQM Meter Grid unit, and together form a big dataset. These data change every minute, lose quality when values are missing, and must be analyzed over a long period to deliver their benefits and avoid misleading information. This paper proposes forecasting LSS PV power using decision tree regression with three types of input data. Case 1 uses all 35 parameters from the five weather stations. Case 2 uses only seven parameters, each calculated as the mean across the five weather stations. Case 3 uses only the parameters with a correlation index greater than 0.8. Historical data from June 2019 to December 2020 were analyzed. The mean absolute error (MAE) was calculated, and a reliability test using the Pearson correlation coefficient (r) and the coefficient of determination (R²) was performed by comparing the forecasts with the actual historical data. As a result, Case 2 is proposed as the best input dataset for the forecasting algorithm.
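To make the input cases and the evaluation metrics concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It shows one way to build the Case 2 inputs (station-averaged parameters), to select Case 3 inputs by a correlation threshold, and to compute MAE, r, and R². The parameter names `irradiance` and `humidity`, the toy values, and the choice to correlate each parameter against measured power are assumptions made for illustration only; the decision tree regressor itself is omitted.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def case2_features(stations):
    """Average each parameter across stations at every time step.

    `stations` is a list with one dict per weather station, mapping a
    parameter name to its time series (hypothetical layout).
    """
    return {
        name: [mean(values) for values in zip(*(s[name] for s in stations))]
        for name in stations[0]
    }


def case3_features(features, target, threshold=0.8):
    """Keep parameters whose correlation with the target exceeds the threshold.

    Assumption: the correlation is taken against measured power.
    """
    return {
        name: series
        for name, series in features.items()
        if pearson_r(series, target) > threshold
    }


# Toy data: two stations, two hypothetical parameters, four time steps.
stations = [
    {"irradiance": [100.0, 300.0, 500.0, 700.0], "humidity": [80.0, 70.0, 75.0, 72.0]},
    {"irradiance": [120.0, 320.0, 520.0, 720.0], "humidity": [82.0, 74.0, 71.0, 76.0]},
]
power = [1.0, 3.1, 4.9, 7.2]  # made-up power readings

case2 = case2_features(stations)
case3 = case3_features(case2, power)
print(sorted(case3))
```

In this toy example the selection step keeps only the parameter that tracks power closely. With real plant data, the same three metric functions could be applied to the forecasts of each input case to compare them.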