Breast cancer is the most common cancer in women and has a high incidence rate. Advances in modern medical facilities have helped reduce mortality, yet incidence remains the highest among all cancers affecting women. Early diagnosis is a key factor in survival, so techniques that support current diagnostic modalities are essential. Machine learning techniques have been used to build prediction and classification models that aid earlier and more accurate disease diagnosis and classification. Random Forest is a supervised machine learning classifier that combines the votes of many decision trees. In this study, Random Forests are applied to the Wisconsin breast cancer dataset and the classifier's performance for breast cancer classification is evaluated. An improved random forest model that uses a cost-sensitive learning approach is proposed, and it is found to perform better than the traditional random forest approach. The model achieved an accuracy of 97.51%.
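The abstract does not say how misclassification costs enter the model. One common cost-sensitive variant, sketched here purely for illustration and not as the authors' method, keeps the trained forest unchanged and replaces plain majority voting with a minimum-expected-cost decision over the trees' votes. The cost matrix values, the class labels, and the tree votes below are all hypothetical.

```python
from collections import Counter

# Hypothetical cost matrix: COST[true][predicted]. Missing a malignant case
# (false negative) is assumed to be five times as costly as a false alarm.
COST = {
    "benign":    {"benign": 0.0, "malignant": 1.0},
    "malignant": {"benign": 5.0, "malignant": 0.0},
}

def vote_probabilities(tree_votes, cost=COST):
    """Turn the per-tree predictions of a forest into class probability estimates."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {label: counts.get(label, 0) / total for label in cost}

def cost_sensitive_predict(tree_votes, cost=COST):
    """Pick the label whose expected misclassification cost is lowest."""
    probs = vote_probabilities(tree_votes, cost)
    expected = {
        pred: sum(probs[true] * cost[true][pred] for true in cost)
        for pred in cost
    }
    return min(expected, key=expected.get), expected

def majority_predict(tree_votes):
    """Plain random-forest rule: the most common tree vote wins."""
    return Counter(tree_votes).most_common(1)[0][0]

# Ten hypothetical trees, four of which vote "malignant".
votes = ["malignant"] * 4 + ["benign"] * 6
label, costs = cost_sensitive_predict(votes)
print(majority_predict(votes))  # benign
print(label)                    # malignant
```

With a 40% malignant vote, predicting "benign" has an expected cost of 0.4 × 5 = 2.0, while predicting "malignant" costs 0.6 × 1 = 0.6, so the cost-sensitive rule flags the case even though the majority vote does not. An alternative design applies the costs during training instead (for example, by weighting classes when growing the trees).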
Breast cancer is one of the leading causes of cancer death in women. Accurate, precise, and early diagnosis is crucial to survival. Data mining techniques have been shown to produce good results in disease diagnosis. Feature search techniques help identify the features relevant to classification, reducing time and effort. Class imbalance is a significant challenge, and one way to overcome it is class balancing. In some datasets the negative class is the majority class: it has more instances than the positive class, so the overall classifier performance may appear high while the classifier's ability to correctly identify positive instances is overlooked. In this paper, a combination of two class balancing approaches is applied to balance the number of instances in each target class. The k-Nearest Neighbour classifier is simple, easy to implement, and robust, with few parameters to tune. We propose a k-Nearest Neighbour classifier model that uses Cuckoo search for feature search, together with class balancing, to classify breast cancer. The proposed model achieved an accuracy of 99.41%, an ROC area of 0.999, and an MCC of 0.988.
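The abstract does not name the two balancing approaches or give Cuckoo search details, so the sketch below uses random oversampling purely as an illustrative stand-in and omits feature search. It shows the other moving parts under that assumption: balancing the classes, a plain k-Nearest Neighbour vote with Euclidean distance, and the Matthews correlation coefficient (MCC), which exposes the "high accuracy, missed positives" problem described above. The function names, the toy data, and k = 3 are hypothetical.

```python
import math
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest class (illustrative stand-in only)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_bal, y_bal = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_bal.append(rng.choice(rows))
            y_bal.append(label)
    return X_bal, y_bal

def knn_predict(X_train, y_train, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    neighbours = sorted(
        zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy, made-up 2-D data: six negatives (0) and two positives (1).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (1, 2), (6, 6), (7, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal)[0], Counter(y_bal)[1])  # 6 6
print(knn_predict(X_bal, y_bal, (6, 7)))     # 1

# A classifier that always predicts "negative" on 95 negatives and 5 positives
# scores 95% accuracy, yet its MCC is 0.
print(mcc(tp=0, tn=95, fp=0, fn=5))          # 0.0
```

MCC uses all four cells of the confusion matrix, so, unlike accuracy, it cannot be inflated by a classifier that simply predicts the majority class.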