Because healthy people far outnumber ill people, imbalanced data are highly likely to be a problem when predicting depression among community-dwelling elderly people from big data. When a dataset with an imbalanced class ratio is analyzed directly, without a supplementary technique such as a sampling algorithm, the resulting prediction errors can degrade the performance of machine learning. It is therefore necessary to apply a data sampling technique to overcome the imbalance. This study aimed to identify an effective way of processing imbalanced data for developing ensemble-based machine learning models by comparing the performance of sampling methods on depression data from elderly people living in South Korean communities, which had a markedly imbalanced class ratio. Models for predicting depression in the community-dwelling elderly were developed using logistic regression, a gradient boosting machine (GBM), and random forest, and their prediction performance was evaluated by comparing accuracy, sensitivity, and specificity. The study analyzed 4,085 elderly people (≥60 years old) living in the community. The depression data were imbalanced: the depression screening test showed that 87.5% of subjects did not have depression, while 12.5% did. Oversampling, undersampling, and SMOTE were used to address the imbalance of the binary dataset, and the prediction performance (accuracy, sensitivity, and specificity) obtained with each sampling method was compared. The results confirmed that the SMOTE-based random forest algorithm showed the highest accuracy (with a sensitivity ≥ 0.6 and a specificity ≥ 0.6) and thus the best prediction performance among the random forest, GBM, and logistic regression models.
Further studies are needed to compare the accuracy of SMOTE, undersampling, and oversampling on imbalanced data with high-dimensional y-variables.
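The three sampling strategies and the three evaluation metrics named above can be illustrated with a minimal sketch. It is not the implementation used in this study: the toy data are randomly generated (only the 87.5%/12.5% class ratio mirrors the reported one), the function names (`random_oversample`, `random_undersample`, `smote_like`, `confusion_metrics`) are hypothetical, and `smote_like` is a simplified version of the SMOTE idea (interpolating between a minority sample and one of its k nearest minority neighbours, with k = 3 assumed here).

```python
import math
import random


def split_classes(X, y):
    """Split rows into (minority, majority); label 1 is assumed to be the minority."""
    mino = [x for x, label in zip(X, y) if label == 1]
    majo = [x for x, label in zip(X, y) if label == 0]
    return mino, majo


def combine(mino, majo):
    """Rebuild (X, y) from the two class lists."""
    return majo + mino, [0] * len(majo) + [1] * len(mino)


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    mino, majo = split_classes(X, y)
    extra = [rng.choice(mino) for _ in range(len(majo) - len(mino))]
    return combine(mino + extra, majo)


def random_undersample(X, y, rng):
    """Keep a random subset of majority rows the same size as the minority class."""
    mino, majo = split_classes(X, y)
    return combine(mino, rng.sample(majo, len(mino)))


def smote_like(X, y, rng, k=3):
    """Create synthetic minority rows by interpolating towards a near minority neighbour."""
    mino, majo = split_classes(X, y)
    synthetic = []
    for _ in range(len(majo) - len(mino)):
        base = rng.choice(mino)
        # k nearest minority neighbours of the base row (the base itself excluded)
        others = [p for p in mino if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return combine(mino + synthetic, majo)


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up toy data: 35 majority rows and 5 minority rows (12.5% minority).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(35)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(5)]
y = [0] * 35 + [1] * 5

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
X_smote, y_smote = smote_like(X, y, rng)
metrics = confusion_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In practice, resampling would be applied to the training split only, so that the test split keeps the original class ratio and the reported sensitivity and specificity reflect the real-world imbalance.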