<abstract><p>To overcome the two-class imbalance problem in breast cancer diagnosis, a hybrid method combining principal component analysis (PCA) and a boosted C5.0 decision tree algorithm with a penalty factor is proposed. PCA is used to reduce the dimension of the feature subset. The boosted C5.0 decision tree algorithm is used as an ensemble classifier for classification. The penalty factor is used to optimize the classification result. To demonstrate the efficiency of the proposed method, it is applied to biased-representative breast cancer datasets from the University of California, Irvine (UCI) machine learning repository. The experimental results and further analysis indicate that our proposal is a promising method for breast cancer diagnosis and can serve as an alternative method in class imbalance learning. Indeed, we observe that the feature extraction process has helped improve diagnostic accuracy. We also demonstrate that the extracted features relevant to breast cancer are essential for high diagnostic accuracy.</p></abstract>
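As a rough illustration of the PCA step, the sketch below extracts the leading principal component of a small feature matrix with power iteration in plain Python and projects the data onto it. The function names and the toy data are hypothetical, and the abstract does not say how many components are retained; a practical pipeline would normally use a linear-algebra library and keep several components (further components can be obtained by deflation).

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iters=500, seed=0):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in range(len(cov))]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data: no variance to explain
            break
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """1-D PCA scores: dot product of each centered row with the component."""
    return [sum(a * b for a, b in zip(r, component)) for r in centered]


# Toy data lying close to the line y = 2x (hypothetical, for illustration only).
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered = center(rows)
pc1 = leading_component(covariance(centered))
scores = project(centered, pc1)
```

Here `pc1` points roughly along (1, 2)/√5, so the two correlated input features collapse into a single score per row, which is the kind of dimension reduction the abstract describes before classification.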
To overcome the two-class imbalance problem in breast cancer diagnosis, a hybrid method combining the ROSE sampling approach with a Boosted C5.0 ensemble classifier (R-Boosted C5.0) is proposed. ROSE is used as the sampling method to balance the class distribution, and Boosted C5.0 is then used as the classifier. To this end, the Wisconsin Breast Cancer Dataset (WBCD), the Wisconsin Diagnosis Breast Cancer (WDBC) dataset and three additional imbalanced datasets have been studied. Measured by the Matthews Correlation Coefficient (MCC), the proposed method achieved 98.5% on WBCD and 93.0% on WDBC. The experimental results show that the proposed method outperforms the other classifiers compared. It can be used as a clinical decision support system to assist breast cancer prediction, and in practice the methodology can be further applied to class-imbalanced data classification.
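The balancing and evaluation steps can be sketched as follows. This is a simplified, hypothetical Python take on ROSE-style smoothed bootstrap sampling, not the ROSE package itself: each synthetic example is drawn by picking a class with equal probability, picking a random example of that class, and adding Gaussian noise around it. The per-class, per-feature bandwidth uses a Silverman-type rule of thumb, which is an assumption here. The Boosted C5.0 classifier is not reproduced; only the MCC used for evaluation is shown, computed as MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math
import random
import statistics


def rose_like_sample(X, y, n_new, seed=0):
    """Hypothetical ROSE-style smoothed bootstrap (simplified sketch)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    labels = sorted(by_class)
    d = len(X[0])
    # Per-class, per-feature Gaussian kernel widths (rule of thumb: an assumption).
    widths = {}
    for lab, rows in by_class.items():
        n = len(rows)
        factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        widths[lab] = [factor * statistics.stdev(col) if n > 1 else 0.0
                       for col in zip(*rows)]
    new_X, new_y = [], []
    for _ in range(n_new):
        lab = rng.choice(labels)  # classes drawn with equal probability
        base = rng.choice(by_class[lab])  # random seed example of that class
        new_X.append([v + rng.gauss(0.0, h) for v, h in zip(base, widths[lab])])
        new_y.append(lab)
    return new_X, new_y


def mcc(y_true, y_pred, positive=1):
    """Matthews Correlation Coefficient for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention


# Toy imbalanced data: 18 majority (0) rows vs 2 minority (1) rows.
X = [[5.0 + 0.1 * i, 1.0] for i in range(18)] + [[1.0, 7.0], [1.5, 6.5]]
y = [0] * 18 + [1] * 2
new_X, new_y = rose_like_sample(X, y, n_new=1000, seed=42)
score = mcc([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

The resampled set contains roughly equal numbers of both classes even though the input is 18:2, and `score` evaluates to 7/15 (about 0.467) for TP = 2, TN = 4, FP = 1, FN = 1. Unlike accuracy, MCC uses all four confusion-matrix cells, which is why it is a common choice for imbalanced data.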