ABSTRACT: Neural networks (NN) have been used by many researchers to solve problems in several domains, including classification and pattern recognition, and Backpropagation (BP) is one of the most well-known artificial neural network models. Constructing effective NN applications depends on characteristics such as the network topology, the learning parameters, and the normalization approach applied to the input and output vectors. The input and output vectors for BP must be normalized properly in order to achieve the best performance of the network. This paper applies several normalization methods to several UCI datasets and compares them to find the normalization method that works best with BP. Norm, Decimal-scaling, Mean-Mad, Median-Mad, Min-Max, and Z-score normalization are considered in this study. The comparative study shows that Mean-Mad and Median-Mad perform better than all the remaining methods, while the worst results are produced by the Norm method.
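The six normalization methods named in the abstract can be sketched as follows. The abstract does not give the paper's exact formulas, so the definitions below are the standard textbook variants and should be treated as assumptions (in particular, "Norm" is assumed to mean division by the Euclidean norm, and the Mean-Mad/Median-Mad scalings use the mean/median absolute deviation, respectively):

```python
import numpy as np

def min_max(x):
    # Rescale values linearly into [0, 1].
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    # Center on the mean and scale by the standard deviation.
    return (x - x.mean()) / x.std()

def decimal_scaling(x):
    # Divide by 10^j, with j the smallest integer making max(|x|) / 10^j < 1.
    j = int(np.floor(np.log10(np.abs(x).max()))) + 1
    return x / (10 ** j)

def norm_scaling(x):
    # Divide each value by the Euclidean norm of the vector (assumed meaning
    # of the "Norm" method in the abstract).
    return x / np.linalg.norm(x)

def mean_mad(x):
    # Center on the mean, scale by the mean absolute deviation.
    mad = np.mean(np.abs(x - x.mean()))
    return (x - x.mean()) / mad

def median_mad(x):
    # Center on the median, scale by the median absolute deviation.
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / mad
```

Each function maps an input feature vector to a comparable numeric range before it is fed to the BP network, which is what makes the choice of method a tunable design decision rather than a fixed preprocessing step.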
Considering the high dimensionality of gene expression datasets, selecting informative genes is key to improving classification performance. The outcomes of data classification are also affected by the data splitting strategy used for the training-testing task. In light of these facts, this paper investigates the impact of three data splitting methods on the performance of eight well-known classifiers when paired with the Cuttlefish algorithm (CFA) for gene selection. The classification algorithms included in this study are K-Nearest Neighbors (KNN), Logistic Regression (LR), Gaussian Naive Bayes (GNB), Linear Support Vector Machine (SVM-L), Sigmoid Support Vector Machine (SVM-S), Random Forest (RF), Decision Tree (DT), and Linear Discriminant Analysis (LDA), while the tested data splitting methods are cross-validation (CV), train-test (TT), and train-validation-test (TVT). The efficacy of the investigated classifiers was evaluated on nine cancer gene expression datasets using various evaluation metrics, such as accuracy, F1-score, and the Friedman test. Experimental results revealed that LDA and SVM-L generally outperformed the other algorithms, whereas RF and DT provided the worst results. On most of the datasets, the results of all algorithms showed that the train-test splitting method is more accurate than the train-validation-test method, while cross-validation was superior to both. Furthermore, RF and GNB were affected by the data splitting techniques less than the other classifiers, whereas LDA was the most affected.
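The three data-splitting strategies compared in the abstract can be illustrated with scikit-learn. The paper's datasets and exact split ratios are not given here, so the synthetic data, the 70/30 (TT) and 60/20/20 (TVT) ratios, and the 5-fold CV below are illustrative assumptions; LDA is used as the example classifier since it was among the strongest performers:

```python
# Sketch of the TT, TVT, and CV splitting strategies on synthetic data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Train-test (TT): a single split, assumed 70/30 here.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tt_acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)

# Train-validation-test (TVT): assumed 60/20/20; the validation part is
# reserved for model selection before the final test evaluation.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te2, y_val, y_te2 = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
tvt_acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te2, y_te2)

# Cross-validation (CV): 5-fold accuracy averaged over the folds.
cv_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

The key practical difference is how much data each strategy leaves for fitting: TVT sacrifices a validation slice, TT keeps more training data but gives a single noisy estimate, and CV averages over folds, which is consistent with the abstract's finding that CV produced the most reliable results.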