Breast cancer is the most common cancer among women, driven by many factors such as heredity and unhealthy lifestyles. Early and accurate diagnosis of this cancer improves patients' quality of life and increases the survival rate. Microarray technology provides an effective means of early cancer diagnosis; however, the nature of its data, which is high-dimensional with far more genes than samples, complicates classification. To address this challenge, a hybrid approach combining mutual information (MI), the least absolute shrinkage and selection operator (LASSO), and a genetic algorithm (GA) is proposed. The proposed approach is assessed using logistic regression (LR), support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) classifiers. Compared with state-of-the-art models, the proposed approach can effectively diagnose breast cancer using a small number of genes. On the benchmark Van't Veer dataset, it achieved a classification accuracy of 96% with only 23 features.
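The abstract names the three stages of the hybrid selector but not how they are chained. The following is a minimal, stdlib-only sketch of one plausible MI → LASSO → GA pipeline, not the authors' implementation. Several choices are assumptions: binary 0/1 labels; MI estimated after equal-width discretization; a plain least-squares LASSO fitted to the centered labels (rather than a logistic LASSO); and a GA whose fitness is leave-one-out 1-nearest-neighbour accuracy minus a small penalty per selected gene. All function names and parameter values (`n_mi`, `alpha`, `pop`, `gens`, `penalty`) are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys, bins=3):
    """MI (nats) between one feature and the labels, after equal-width binning (assumed estimator)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    xb = [min(int((x - lo) / width), bins - 1) for x in xs]
    n, joint = len(xs), Counter(zip(xb, ys))
    px, py = Counter(xb), Counter(ys)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def standardize(X):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    stats = []
    for col in zip(*X):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        stats.append((m, s))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]

def lasso(X, y, alpha, iters=100):
    """Coordinate descent for (1/2n)||y - Xw||^2 + alpha * ||w||_1."""
    n, p = len(X), len(X[0])
    w, resid = [0.0] * p, list(y)  # resid tracks y - Xw
    for _ in range(iters):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new = math.copysign(max(abs(rho) - alpha, 0.0), rho) / norm  # soft-threshold
            resid = [r - c * (new - w[j]) for c, r in zip(col, resid)]
            w[j] = new
    return w

def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `feats`."""
    if not feats:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        dists = [(sum((xi[f] - xk[f]) ** 2 for f in feats), y[k])
                 for k, xk in enumerate(X) if k != i]
        hits += min(dists)[1] == y[i]
    return hits / len(X)

def genetic_select(X, y, candidates, pop=20, gens=30, penalty=0.01, seed=0):
    """GA over bit masks of `candidates`; fitness = LOO 1-NN accuracy - penalty * subset size."""
    if not candidates:
        return []
    rng, m, cache = random.Random(seed), len(candidates), {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            feats = [f for f, bit in zip(candidates, mask) if bit]
            cache[key] = loo_1nn_accuracy(X, y, feats) - penalty * len(feats)
        return cache[key]

    population = [[rng.random() < 0.5 for _ in range(m)] for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=fitness, reverse=True)[: pop // 2]  # elitist truncation
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, m) if m > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            children.append([not g if rng.random() < 1 / m else g for g in child])  # bit-flip mutation
        population = parents + children
    best = max(population, key=fitness)
    return [f for f, bit in zip(candidates, best) if bit]

def hybrid_select(X, y, n_mi=10, alpha=0.05):
    """MI filter -> LASSO -> GA wrapper; returns indices of the selected features."""
    p = len(X[0])
    mi = [mutual_information([row[j] for row in X], y) for j in range(p)]
    stage1 = sorted(range(p), key=lambda j: mi[j], reverse=True)[:n_mi]
    ybar = sum(y) / len(y)
    w = lasso(standardize([[row[j] for j in stage1] for row in X]), [v - ybar for v in y], alpha)
    stage2 = [j for j, wj in zip(stage1, w) if abs(wj) > 1e-8]  # keep non-zero coefficients
    return genetic_select(standardize(X), y, stage2)

if __name__ == "__main__":
    rng = random.Random(1)
    X, y = [], []
    for i in range(40):  # synthetic data: only feature 0 carries the class signal
        label = i % 2
        X.append([3 * label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(9)])
        y.append(label)
    print("selected features:", hybrid_select(X, y, n_mi=5))
```

The stages narrow the search progressively. The cheap MI filter discards most genes, LASSO removes redundant ones among the survivors, and only the small remaining set is searched by the comparatively expensive GA wrapper. In the study itself, the final subset would then be evaluated with the LR, SVM, KNN, and RF classifiers rather than the 1-NN fitness used here.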
Keywords: Cancer classification • Microarray data • Genetic algorithm • LASSO • Mutual information

1 Introduction

Breast cancer is the most commonly occurring cancer in women and the second most common cancer overall. Early detection and diagnosis have been shown to significantly improve patient survival rates [1] and quality of life, and to substantially reduce the cost and complexity of cancer treatment. Microarray technology has become a benchmark technique for early cancer diagnosis [2]. It monitors each gene multiple times under different conditions or, alternatively, evaluates each gene in

M. Abd-elnaby (B) • M.