Stemming is an effective technique that has been adopted in many applications, such as machine learning, machine translation, document classification (DC), information retrieval, and natural language processing. Applied during document classification, stemming reduces the high dimensionality of the feature space, which in turn improves the performance of the classification system, particularly for highly inflected languages such as Arabic. This paper studies the impact of three stemming techniques, namely Information Science Research Institute (ISRI), Tashaphyne, and ARLStem, on Arabic DC. Three classification algorithms are used: Naïve Bayes (NB), support vector machine (SVM), and K-nearest neighbors (KNN). In addition, chi-square feature selection is used to select the most relevant features. Experiments are conducted on the CNN Arabic corpus, which was collected from Arabic websites, to assess the performance of the classification system. The classifiers are evaluated using K-fold cross-validation and Micro-F1. The findings indicate that ARLStem outperforms the ISRI and Tashaphyne stemmers. The results also show that SVM is more effective than the KNN and NB classifiers, achieving a Micro-F1 of 94.64% with the ARLStem stemmer.
INDEX TERMS: Arabic text classification, text preprocessing, stemming techniques, feature extraction, feature selection.
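The two scoring ideas named in this abstract, chi-square feature selection and Micro-F1, can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the function names `chi_square` and `micro_f1` are hypothetical, the chi-square score uses the textbook 2x2 term/class contingency form (which may differ from what the authors used), and the labels in the usage lines are made-up examples.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (a * d - c * b) ** 2 / denominator


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only; these are not data from the paper.
truth = ["sport", "sport", "business", "world"]
predicted = ["sport", "business", "business", "world"]
print(micro_f1(truth, predicted))
print(chi_square(10, 0, 0, 10))
```

A term that appears in every document of one class and in no other document gets a high chi-square score, while a term spread evenly across classes scores zero, which is why low-scoring terms can be dropped to shrink the feature space.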
Human motion detection and activity recognition are becoming vital for smart home applications. Traditional Human Activity Recognition (HAR) mechanisms use special devices to track human motion, such as cameras (vision-based) and various types of sensors (sensor-based). These mechanisms are applied in areas such as home security, Human–Computer Interaction (HCI), gaming, and healthcare. However, traditional HAR methods require heavy installation and only work under strict conditions. Recently, wireless signals have been used to track human motion and perform HAR in indoor environments. The motion of an object in the test environment changes the Wi-Fi signal reflections at the receiver, which results in variations in the received signal. These fluctuations can be used to track the motion of an object (i.e., a human) in indoor environments. This phenomenon could be refined and leveraged to improve Internet of Things (IoT) and smart home devices. The main Wi-Fi sensing methods can be broadly categorized as Received Signal Strength Indicator (RSSI), Wi-Fi radar (using Software Defined Radio (SDR)), and Channel State Information (CSI). CSI and RSSI can be considered device-free mechanisms because they do not require cumbersome installation, whereas the Wi-Fi radar mechanism requires special devices (i.e., a Universal Software Radio Peripheral (USRP)). Recent studies demonstrate that CSI outperforms RSSI in sensing accuracy due to its stability and rich information. This paper presents a comprehensive survey of recent advances in CSI-based sensing, outlines its drawbacks, discusses challenges, and offers suggestions for the future of device-free sensing technology.
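The idea that motion shows up as fluctuations in the received signal can be illustrated with a sliding-window variance check. This is a rough sketch under stated assumptions, not a method taken from the survey: the function name `detect_motion`, the window length, the threshold, and the amplitude values are all hypothetical, and real CSI processing would work on per-subcarrier data with filtering and calibration.

```python
from statistics import pvariance


def detect_motion(amplitudes, window=5, threshold=1.0):
    """Flag each sliding window whose amplitude variance exceeds a threshold.

    amplitudes: sequence of signal amplitude samples (arbitrary units)
    window:     number of consecutive samples per window (assumed value)
    threshold:  variance above which a window counts as motion (assumed value)
    """
    flags = []
    for start in range(len(amplitudes) - window + 1):
        segment = amplitudes[start:start + window]
        flags.append(pvariance(segment) > threshold)
    return flags


# Synthetic samples for illustration only.
still_room = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0]
person_walking = [10.0, 14.0, 6.0, 13.0, 7.0, 12.0]
print(detect_motion(still_room))
print(detect_motion(person_walking))
```

A quiet environment yields nearly constant amplitudes and therefore low variance in every window, while a moving body perturbs the reflections and pushes the variance above the threshold.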
We propose a novel text classification model that aims to improve the performance of Arabic text classification using machine learning techniques. One effective approach in Arabic text classification is to find a suitable feature selection method, with an optimal number of features, alongside the classifier. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, ensembles of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem given the huge search space. Therefore, we propose a method called Optimal Configuration Determination for Arabic text Classification (OCATC), which uses the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) in this space. The proposed OCATC method extracts features from the textual documents and converts them into numerical vectors using the Term Frequency-Inverse Document Frequency (TF–IDF) approach. The PSO then selects the best combination of classifier and feature selection method, together with an optimal number of features. Extensive experiments were carried out to evaluate the performance of the OCATC method on six datasets, comprising five publicly available datasets and our proposed dataset. The results demonstrate the superiority of OCATC over individual classifiers and other state-of-the-art methods.
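A search of this kind, where PSO picks a classifier, a feature selection method, and a feature count, might be sketched as follows. This is not the OCATC implementation: the candidate names in `CLASSIFIERS` and `SELECTORS`, the `decode` mapping, the PSO parameters, and especially `toy_fitness` are hypothetical placeholders. In a real setting the fitness would be a cross-validated score of the decoded configuration; here it is an arbitrary function with no relation to any reported result.

```python
import random

CLASSIFIERS = ["clf_a", "clf_b", "clf_c"]  # hypothetical candidate classifiers
SELECTORS = ["fs_a", "fs_b"]               # hypothetical feature selection methods
BOUNDS = [(0, len(CLASSIFIERS)), (0, len(SELECTORS)), (100, 2000)]


def decode(position):
    """Map a continuous particle position to a discrete configuration."""
    clf = CLASSIFIERS[min(int(position[0]), len(CLASSIFIERS) - 1)]
    sel = SELECTORS[min(int(position[1]), len(SELECTORS) - 1)]
    return clf, sel, int(round(position[2]))


def toy_fitness(position):
    """Placeholder for a cross-validated score; the numbers are arbitrary."""
    clf, sel, k = decode(position)
    return (-abs(k - 500) / 1000.0
            - 0.1 * CLASSIFIERS.index(clf)
            - 0.05 * SELECTORS.index(sel))


def pso(fitness, bounds, n_particles=20, n_iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that maximizes `fitness` inside box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    best_idx = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_idx][:], pbest_val[best_idx]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_position, best_value = pso(toy_fitness, BOUNDS)
print(decode(best_position), best_value)
```

Encoding the discrete choices as continuous coordinates that are truncated on decoding is one common way to apply standard PSO to a mixed search space; the paper may encode its configurations differently.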