Liver cancer data typically comprise large, high-dimensional datasets. When a dataset has a very large number of features and multiple classes, many of those features may be irrelevant to pattern classification in machine learning. Feature selection can therefore improve the performance of a classification model and help it reach its maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on a class-dependent technique (IGSFS-CD) for a liver cancer classification model. Two classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve classification performance. The IGSFS-CD method achieved an accuracy of 78.36% (sensitivity 0.7841, specificity 0.9159) on LC_dataset-I. In addition, LC_dataset-II delivered the best performance, with an accuracy of 84.82% (sensitivity 0.8481, specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, selecting the best feature subset could help reduce the complexity of the predictive model.
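The abstract gives no implementation details for IGSFS-CD, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that "class-dependent" means ranking features separately for each class (that class versus the rest) by information gain, and then running a greedy sequential forward selection over the ranked candidates. The toy data, the function names, and the `lookup_accuracy` evaluator (a stand-in for the decision tree and naïve Bayes classifiers used in the study) are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on a discrete feature."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def class_dependent_ranking(rows, labels):
    """For each class, rank feature indices by information gain on 'this class vs. rest'."""
    ranking = {}
    for cls in sorted(set(labels)):
        target = [y == cls for y in labels]
        gains = [information_gain([r[j] for r in rows], target) for j in range(len(rows[0]))]
        ranking[cls] = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
    return ranking


def lookup_accuracy(rows, labels, subset):
    """Stand-in evaluator (assumption): resubstitution accuracy of a majority-vote lookup table."""
    votes = {}
    for r, y in zip(rows, labels):
        votes.setdefault(tuple(r[j] for j in subset), Counter())[y] += 1
    return sum(max(c.values()) for c in votes.values()) / len(labels)


def forward_select(candidates, score):
    """Greedy forward selection: keep adding the best candidate while the score improves."""
    selected, best, remaining = [], float("-inf"), list(candidates)
    while remaining:
        # Ties go to the earlier (higher-ranked) candidate, because max() keeps the first maximum.
        pick = max(remaining, key=lambda f: score(selected + [f]))
        pick_score = score(selected + [pick])
        if pick_score <= best:
            break
        selected.append(pick)
        remaining.remove(pick)
        best = pick_score
    return selected


# Toy data (made up): feature 0 marks class "A", feature 1 marks class "B", feature 2 is noise.
rows = [(1, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 1)]
labels = ["A", "A", "B", "B", "C", "C"]

ranking = class_dependent_ranking(rows, labels)
subset = forward_select(ranking["A"], lambda s: lookup_accuracy(rows, labels, s))
print(ranking["A"], subset)
```

In practice the evaluator would be a cross-validated classifier rather than a resubstitution score, and continuous features would need discretization before information gain can be computed this way.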
Real-world classification problems often involve imbalanced, multiclass datasets. Class imbalance combined with multiple classes affects the patterns learned by a classification model and reduces classification accuracy. Oversampling balances the classes in a dataset and helps avoid the overfitting problem. The purposes of this study were to handle multiclass imbalanced datasets and to improve the effectiveness of the classification model. This study proposes a hybrid method for handling multiclass imbalanced data that combines the Synthetic Minority Oversampling Technique (SMOTE) and One-Versus-All (OVA) with deep learning and with ensemble classifiers (stacking and random forest algorithms). Datasets with different numbers of classes and degrees of imbalance were obtained from the UCI Machine Learning Repository. The results showed that the proposed method achieved its best accuracy, 98.51%, on the new-thyroid dataset when the deep learning classifier was used to evaluate classification performance. With the stacking algorithm, the proposed method achieved higher accuracy than the other methods on the car, pageblocks, and Ecoli datasets. In addition, it achieved its highest classification performance on the dermatology dataset, 98.47%, when random forest was used as the classifier.
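The two data-handling steps named in this abstract can be illustrated with a short standard-library sketch. It is not the authors' pipeline: it assumes that SMOTE is applied to every non-majority class before the One-Versus-All split (the abstract does not state the order), and it uses a simplified SMOTE that interpolates between a minority sample and one of its `k` nearest minority neighbours. The data and the names `smote`, `balance_with_smote`, and `one_versus_all` are hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


def balance_with_smote(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    target = max(len(points) for points in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, points in by_class.items():
        need = target - len(points)
        if need > 0:
            X_out += smote(points, need, k, seed)
            y_out += [label] * need
    return X_out, y_out


def one_versus_all(labels):
    """One binary target vector per class: 1 for that class, 0 for all the others."""
    return {cls: [1 if y == cls else 0 for y in labels] for cls in sorted(set(labels))}


# Toy imbalanced data (made up): 5 samples of "a", 2 of "b", 3 of "c".
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
     (2.0, 2.0), (2.2, 2.1),
     (4.0, 0.0), (4.1, 0.3), (4.3, 0.1)]
y = ["a"] * 5 + ["b"] * 2 + ["c"] * 3

X_bal, y_bal = balance_with_smote(X, y)
ova_targets = one_versus_all(y_bal)
print({cls: sum(t) for cls, t in ova_targets.items()})
```

Each entry of `ova_targets` would then be used to train one binary classifier (deep learning, stacking, or random forest in the study), with the final class taken from the most confident of those classifiers.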