Rare class imbalance problems, which involve classifying a minority (rare) class, are difficult because the rare class contains far fewer samples than the majority class. Because the majority class is easy to predict, overall accuracy tends to be high even when the minority class is predicted poorly. Consequently, when a prediction model is evaluated by accuracy alone, the result does not indicate whether the model can predict the minority class. A dedicated rare class prediction technique is therefore required. In this study, a rare class prediction model is proposed, and a dataset from a semiconductor manufacturing process with a class imbalance problem is used to build a fault detection model. The model uses data preprocessing to construct the features and dataset required for the rare class. To identify the features that distinguish the rare class, standard deviation and Euclidean distance are used for feature selection. In addition, a particle swarm optimization-deep belief network is applied to create the classifier. The proposed model shows strong performance and is suitable for highly imbalanced classification problems.
KEYWORDS
class imbalance problem, deep belief network, feature selection, particle swarm optimization, rare class classification
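The abstract's point that accuracy alone can hide poor minority class prediction can be illustrated with a small sketch. The labels, the 5% fault rate, and the always-predict-majority classifier below are illustrative assumptions, not data or methods from this study; the helper functions `accuracy` and `recall` are likewise hypothetical names introduced only for this example.

```python
# Hypothetical wafer labels: 1 = faulty (rare class), 0 = normal (majority class).
# The 5% fault rate is an illustrative assumption, not a figure from the study.
y_true = [1] * 5 + [0] * 95

# A trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that were predicted as `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


acc = accuracy(y_true, y_pred)
rare_recall = recall(y_true, y_pred, positive=1)
major_recall = recall(y_true, y_pred, positive=0)
# Balanced accuracy: the mean of the per-class recalls.
balanced_acc = (rare_recall + major_recall) / 2

print(f"accuracy          = {acc:.2f}")
print(f"rare-class recall = {rare_recall:.2f}")
print(f"balanced accuracy = {balanced_acc:.2f}")
```

Under these assumptions the classifier reaches high accuracy while detecting none of the rare class samples, which is why per-class measures such as recall or balanced accuracy are usually reported alongside accuracy for imbalanced data.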
INTRODUCTION
Because of the attention on big data and the development of deep learning techniques, methods for building prediction models are in the spotlight. 1,2 Many AI-based prediction models, which use machine learning, data mining, databases, and statistical methods, are being proposed. Prediction models based on these state-of-the-art techniques are being applied in many fields, and their industrial value is steadily increasing. 3,4 To implement prediction models accurately, it is necessary to analyze both domain knowledge and data. In addition, demand for extracting useful knowledge from collected data is growing, and therefore active research is being conducted on prediction models suited to specific domains. 5,6 Thus, classification techniques for class imbalance problems, in which the class distribution is skewed, are becoming increasingly important; this is one of the main issues in the field of data mining. 7-9 When the classes are balanced (balanced class), the ratios of the classes to be predicted are evenly distributed. Thus, by learning from the data, a balanced predictive model that can predict all the classes can be generated. In an imbalance problem, the ratios of the classes to be predicted differ. In this case, learning tends to produce a classification model that can predict only a specific class (rare class or majority class). For example, in the semiconductor manufacturing process, although most of the produced wafers are regular products, there is a small probability that irregular products are produced. Therefore, a rare class prediction method is required to pre...