Wafer yield prediction, as the basis of quality control, is dedicated to predicting quality indices of the wafer manufacturing process. In recent years, data-driven machine learning methods have received a lot of attention due to their accuracy, robustness, and convenience for the prediction of quality indices. However, the existing studies mainly focus on the model level to improve the accuracy of yield prediction does not consider the impact of data characteristics on yield prediction. To tackle the above issues, a novel wafer yield prediction method is proposed, in which the improved genetic algorithm (IGA) is an under-sampling method, which is used to solve the problem of data overlap between finished products and defective products caused by the similarity of manufacturing processes between finished products and defective products in the wafer manufacturing process, and the problem of data imbalance caused by too few defective samples, that is, the problem of uneven distribution of data. In addition, the high-dimensional alternating feature selection method (HAFS) is used to select key influencing processes, that is, key parameters to avoid overfitting in the prediction model caused by many input parameters. Finally, SVM is used to predict the yield. Furthermore, experiments are conducted on a public wafer yield prediction dataset collected from an actual wafer manufacturing system. IGA-HAFS-SVM achieves state-of-art results on this dataset, which confirms the effectiveness of IGA-HAFS-SVM. Additionally, on this dataset, the proposed method improves the AUC score, G-Mean and F1-score by 21.6%, 34.6% and 0.6% respectively compared with the conventional method. Moreover, the experimental results prove the influence of data characteristics on wafer yield prediction.