Time interval and time length are two important indexes in analyzing the active power output data of photovoltaic (PV) power stations. For a fixed time interval, a time length that is too short contains too little information, leading to missing and distorted information; a time length that is too long contains redundant and complicated information, leading to unnecessary increases in data storage and computation. It is therefore important to determine an appropriate data length for analyzing PV output data. In this paper, the output data of a PV power station are first analyzed statistically, and preliminary conclusions on time length selection are obtained by autocorrelation analysis. Then, based on weather characteristics, clustering analysis is used to classify the PV output data into different types, and statistical principles are used to estimate the optimal sample capacity for each type, which determines the data time length required for analyzing PV power plant output. The relationship between energy storage capacity demand and data length is then investigated to verify the rationality of the selected time length. Finally, an energy storage system capacity configuration based on the optimal data time length is given. The results show that a PV output data length of 23 days satisfies the data volume requirement of energy storage system capacity configuration.
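To illustrate what "optimal sample capacity estimation" can look like, the sketch below uses the textbook sample-size formula for estimating a mean, n = (z·s / E)², where s is the sample standard deviation of a daily PV metric and E is the allowed absolute error. This is an assumption for illustration only; the paper's exact statistical procedure may differ. The function name `required_days` and the parameters `rel_error` and `z` are hypothetical.

```python
import math
import statistics


def required_days(daily_values, rel_error=0.1, z=1.96):
    """Hypothetical helper: estimate how many days of data are needed so that
    the sample mean of a daily PV metric lies within +/- rel_error of the true
    mean, at the confidence level implied by z (1.96 ~ 95%, normal approximation).

    Assumes the standard sample-size formula n = (z * s / E)^2, not
    necessarily the method used in the paper.
    """
    mean = statistics.mean(daily_values)
    std = statistics.stdev(daily_values)  # sample standard deviation s
    margin = rel_error * mean  # allowed absolute error E
    if margin <= 0:
        raise ValueError("mean of daily values must be positive")
    return math.ceil((z * std / margin) ** 2)
```

Following the abstract's idea of first grouping days by weather type, such a function would be applied separately to each cluster (for example, sunny, cloudy, and rainy days), since each type has its own variability and therefore its own required sample size.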