As the requirements for the optimal control of building systems increase, load predictions must become both more accurate and faster. However, prediction accuracy depends not only on the prediction algorithm but also on how the feature set is constructed. Therefore, this study develops a short-term building cooling load prediction model based on feature set construction. The impacts of four feature set construction methods (feature extraction, correlation analysis, K-means clustering, and discrete wavelet transform (DWT)) on the prediction accuracy are compared. To check that the effect of each feature set construction method holds across algorithms, three different prediction algorithms are used. The influences of the sample dimension and the prediction time horizon on the prediction accuracy are also analysed. The prediction model is developed based on an ensemble learning algorithm utilising the Cubist algorithm, and its performance improves when DWT is used to construct the feature set. Compared with other commonly used prediction models, the proposed model exhibits the best performance, with R-squared and CV-RMSE values of 99.8% and 1.5%, respectively.
INDEX TERMS Cooling load prediction, feature extraction, ensemble learning algorithms, discrete wavelet transform

Nomenclature
RF: Random forest
GBM: Gradient boosting machine
DWT: Discrete wavelet transform
CA: Correlation analysis
PCA: Principal component analysis
t-SNE: t-distributed stochastic neighbour embedding
g: Principal component contribution ratio
rspearman: Spearman rank correlation coefficient
Dk: Sum of intra-cluster distances
SSE: Sum of the squared error
MSE: Mean square error
CV-RMSE: Root-mean-square error coefficient of variation
R-squared: Square correlation coefficient
FOG: First order 5-h gradient
MSG: Mean value of the square of the 1-h gradients in the past 5 h
FS: Feature sets
Large sample: FS1
Small sample: FS10
ARE: Absolute value of relative error between predicted and actual values
IR: Degree of improvement in the prediction accuracy
At: Actual value
Ft: Predicted value
n: Sample size
Reduction ratio: Reduction values between two time points
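The accuracy metrics listed in the nomenclature can be illustrated with a minimal Python sketch. The formulas below are the commonly used definitions and are an assumption here, since this section does not state the exact expressions: CV-RMSE is taken as the root-mean-square error divided by the mean actual value, R-squared as the squared Pearson correlation between actual and predicted values (following the nomenclature entry "square correlation coefficient"), and ARE as |Ft - At| / |At|. The function names are hypothetical.

```python
import math


def cv_rmse(actual, predicted):
    """Assumed definition: sqrt(mean((At - Ft)^2)) / mean(At)."""
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, predicted)) / n
    return math.sqrt(mse) / (sum(actual) / n)


def r_squared(actual, predicted):
    """Assumed definition: squared Pearson correlation of At and Ft."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_f = sum(predicted) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in predicted)
    return cov ** 2 / (var_a * var_f)


def absolute_relative_error(actual, predicted):
    """Assumed definition: |Ft - At| / |At| for each time step t."""
    return [abs(f - a) / abs(a) for a, f in zip(actual, predicted)]


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    loads_actual = [100.0, 200.0, 300.0, 400.0]
    loads_predicted = [110.0, 190.0, 310.0, 390.0]
    print("CV-RMSE:", cv_rmse(loads_actual, loads_predicted))
    print("R-squared:", r_squared(loads_actual, loads_predicted))
    print("ARE:", absolute_relative_error(loads_actual, loads_predicted))
```

Multiplying CV-RMSE and R-squared by 100 gives percentages comparable in form to the values reported in the abstract. Note that R-squared defined as a squared correlation can differ from the coefficient of determination (1 - SSE/SST) when predictions are biased.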