Linear Discriminant Analysis (LDA) is a very common dimensionality reduction technique used as a preprocessing step in machine learning and pattern classification applications. It is often used as a black box, however, and is not always well understood. The aim of this paper is to build a solid intuition for what LDA is and how it works, so that readers of all levels can better understand LDA and know how to apply it in different applications. The paper first gives the basic definitions and steps of the LDA technique, supported by visual explanations of these steps. The two methods of computing the LDA space, i.e. the class-dependent and class-independent methods, are then explained in detail. Next, in a step-by-step approach, two numerical examples show how the LDA space can be calculated with the class-dependent and class-independent methods. Furthermore, two of the most common LDA problems (i.e. the Small Sample Size (SSS) and non-linearity problems) are highlighted and illustrated, and state-of-the-art solutions to these problems are investigated and explained. Finally, a number of experiments are conducted with different datasets to (1) investigate the effect of the eigenvectors used in the LDA space on the robustness of the extracted features for classification accuracy, and (2) show when the SSS problem occurs and how it can be addressed.
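As a rough illustration of how an LDA projection can be computed, the sketch below handles only the simplest case: two classes with two features each, using a single within-class scatter matrix S_W shared by both classes (the class-independent form). In this two-class case the leading eigenvector of S_W^{-1} S_B is proportional to S_W^{-1} (mu_a - mu_b), so no eigen-solver is needed. The function names and the restriction to 2-D data are assumptions made for this example, not the paper's own code; a practical implementation would use a linear algebra library and handle any number of classes and features.

```python
# Minimal two-class, two-feature LDA sketch (class-independent form).
# All names here are illustrative assumptions, not taken from the paper.

def mean(rows):
    """Column-wise mean of a list of equal-length samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter(rows, mu):
    """Sum of outer products (x - mu)(x - mu)^T for 2-D samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def lda_direction(class_a, class_b):
    """Unit vector w proportional to S_W^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter S_W: sum of the per-class scatter matrices.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        # A singular S_W cannot be inverted (e.g. too few samples).
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]


def project(rows, w):
    """Project each 2-D sample onto the direction w (one value per sample)."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]
```

Calling lda_direction on two lists of (x, y) samples returns the one-dimensional LDA axis, and project maps the samples onto it. The explicit singularity check marks the point where the SSS problem would surface: when S_W cannot be inverted, this direct computation fails and one of the remedies discussed in the paper is needed.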
Diagnosis is a critical preventive step in research on the coronavirus, whose manifestations resemble those of other types of pneumonia. CT scans and X-rays play an important role in that direction. However, processing chest CT images and using them to accurately diagnose COVID-19 is a computationally expensive task. Machine learning techniques have the potential to overcome this challenge. This paper proposes two optimization algorithms for feature selection and classification of COVID-19. The proposed framework has three cascaded phases. First, features are extracted from the CT scans using a Convolutional Neural Network (CNN) named AlexNet. Second, a proposed feature selection algorithm, the Guided Whale Optimization Algorithm (Guided WOA) based on Stochastic Fractal Search (SFS), is applied, followed by balancing of the selected features. Finally, a proposed voting classifier, Guided WOA based on Particle Swarm Optimization (PSO), aggregates the predictions of different classifiers to choose the most voted class. This increases the chance that individual classifiers, e.g. Support Vector Machine (SVM), Neural Networks (NN), k-Nearest Neighbor (KNN), and Decision Trees (DT), show significant discrepancies. Two datasets are used to test the proposed model: CT images containing clinical findings of positive COVID-19 and CT images negative for COVID-19. The proposed feature selection algorithm (SFS-Guided WOA) is compared with other optimization algorithms widely used in recent literature to validate its efficiency. The proposed voting classifier (PSO-Guided-WOA) achieved an AUC (area under the curve) of 0.995, outperforming other voting classifiers in terms of performance metrics. Wilcoxon rank-sum, ANOVA, and t-test statistical tests are also applied to assess the quality of the proposed algorithms.
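The idea of aggregating several classifiers' predictions and choosing the most voted class can be sketched as plain, unweighted majority voting. This is only the baseline idea: it does not implement the PSO-Guided-WOA scheme proposed in the paper, and the function name, the input layout (one list of labels per classifier), and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label per sample.

    predictions: list of lists, one inner list per classifier, all of the
    same length (one label per sample). Hypothetical layout for this sketch.
    """
    if not predictions:
        raise ValueError("need at least one classifier's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    voted = []
    # zip(*predictions) yields, for each sample, the labels from all classifiers.
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Tie-break (an assumption): keep the label from the earliest
        # classifier in the list among the labels with the most votes.
        winner = next(label for label in sample_votes if counts[label] == top)
        voted.append(winner)
    return voted
```

With four classifiers (for instance an SVM, a neural network, a KNN, and a decision tree, in that order), each inner list would hold that model's predicted labels for the same test samples. An optimized voting scheme would replace the equal weighting used here.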
Grey Wolf Optimizer (GWO) simulates the leadership and hunting behavior of grey wolves. GWO has shown good performance in the literature as a meta-heuristic algorithm for feature selection problems; however, it suffers from low precision and slow convergence. This paper proposes a Modified Binary GWO (MbGWO) based on Stochastic Fractal Search (SFS) to identify the main features by balancing exploration and exploitation. First, the modified GWO is developed by applying an exponential form for the number of iterations of the original GWO to enlarge the search space and, accordingly, exploitation, and by applying crossover/mutation operations to increase the diversity of the population and thereby enhance exploitation capability. Then, the diffusion procedure of SFS is applied to the best solution of the modified GWO, using a Gaussian distribution for the random walk in a growth process. The continuous values of the proposed algorithm are then converted into binary values so that it can be used for feature selection. To assess the stability and robustness of the proposed MbGWO-SFS algorithm, it is tested on nineteen datasets from the UCI machine learning repository. A K-Nearest Neighbor (KNN) classifier is used to measure the quality of the selected subset of features. The results are compared to binary versions of state-of-the-art optimization techniques such as the original GWO, SFS, Particle Swarm Optimization (PSO), a hybrid of PSO and GWO, Satin Bowerbird Optimizer (SBO), Whale Optimization Algorithm (WOA), Multiverse Optimization (MVO), Firefly Algorithm (FA), and Genetic Algorithm (GA), and show the superiority of the proposed algorithm. Wilcoxon's rank-sum test at the 0.05 significance level is used to verify that the proposed algorithm performs significantly better than its competitors.
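The step that converts continuous values into binary values for feature selection can be sketched as follows. The abstract does not say which conversion is used; the sigmoid transfer function with a 0.5 threshold shown here is a common choice in binary metaheuristics and is an assumption of this example, as are the function names.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_binary(position, threshold=0.5):
    """Map a continuous position vector to a 0/1 feature mask.

    Assumed rule for this sketch: feature i is kept when
    sigmoid(position[i]) >= threshold.
    """
    return [1 if sigmoid(x) >= threshold else 0 for x in position]


def selected_features(mask):
    """Indices of the features whose mask bit is 1."""
    return [i for i, bit in enumerate(mask) if bit]
```

Each candidate solution then corresponds to a subset of feature indices, whose quality can be scored by training a classifier such as KNN on only those columns.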