Expressing feelings through social media has become an essential part of our lives and deserves careful study; besides sharing ideas and thoughts, we share moments and good memories. Social media platforms such as Facebook, Twitter, Weibo, and LinkedIn are rich sources of opinionated text data. Both organizations and individuals are interested in using social media to analyze people's opinions and extract sentiments and emotions. Most studies on social media analysis classify sentiment only as positive, negative, or neutral. Emotion analysis is harder because humans can express one or several emotions within a single expression. Human beings recognize these different emotions well, but the task remains difficult for an emotion analysis system. In most cases, the Arabic used on social media is slangy or colloquial, which makes it harder to preprocess and to filter noise, since most lemmatization and stemming tools are built for Modern Standard Arabic (MSA). An emotion analysis model has been implemented to categorize emotions; the task is treated as a multiclass, multilabel classification problem. However, few studies have addressed this emotion classification problem in Arabic social media. Nearly the only prior work is that of SemEval 2018 Task 1, subtask E-c. Several machine learning approaches have been applied to this task, but only a few studies were based on deep learning. Our model implements a novel multilayer bidirectional long short-term memory (BiLSTM) network trained on top of pre-trained word embedding vectors. The model achieved state-of-the-art performance. It has been compared with other models developed for the same task using Support Vector Machines (SVM), random forest (RF), and fully connected neural networks, and it improves on the best results previously obtained for this task.
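
To make the "multiclass and multilabel" framing concrete, the sketch below shows how a tweet's set of emotions might be encoded as a multi-hot target vector and how per-label scores might be decoded back into labels. The label list is the eleven-emotion set of the SemEval 2018 E-c subtask, which the abstract itself does not spell out; the function names and the 0.5 decision threshold are assumptions made for illustration, not details from the paper.

```python
# Eleven emotion labels of SemEval-2018 Task 1, subtask E-c
# (from the shared task definition, not stated in the abstract).
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]


def encode_labels(emotions):
    """Return a multi-hot vector with 1 for every emotion present."""
    present = set(emotions)
    unknown = present - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    return [1 if e in present else 0 for e in EMOTIONS]


def decode_scores(scores, threshold=0.5):
    """Turn per-label scores (e.g. sigmoid outputs) into a label list.

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion label")
    return [e for e, s in zip(EMOTIONS, scores) if s >= threshold]
```

Because each label is decided independently, a single tweet can carry zero, one, or several emotions, which is what distinguishes this setting from ordinary single-label classification.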
Expressing emotions through text and emojis has become widespread on social media such as Facebook, Instagram, Twitter, Weibo, and LinkedIn. Both organizations and individuals are now interested in using social media to analyze people's opinions and extract sentiments and emotions. We propose a model for multilabel emotion classification using a bidirectional long short-term memory (BiLSTM) deep network. It is evaluated on the Arabic tweets dataset provided by SemEval 2018 for the E-c task. Several preprocessing steps are applied, including a modified ARLSTEM stemmer, replacement of emojis with their textual meaning from a manually built lexicon, and feature vector representation using AraVec word embeddings. The novelty of our research is that it examines the effect of hyperparameter tuning on model performance and uses BiLSTM in all of its deep neural network layers. The proposed model achieves performance comparable with state-of-the-art models that use different machine learning and deep learning techniques. The system achieves about a 9% improvement in validation accuracy over the previous best model for the same task, which used a support vector classifier (SVC), and it outperforms the other deep neural network (UNCCTeam), which is based on fully connected layers, by about 4.4% in micro F1.
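
The emoji replacement step described in this abstract can be sketched as a simple lexicon lookup. The four entries below are hypothetical stand-ins for the authors' manually built lexicon, which is not reproduced in the abstract, and the function name is likewise an assumption.

```python
# Hypothetical emoji lexicon: the authors' real lexicon is not given here.
EMOJI_LEXICON = {
    "\U0001F602": "ضحك",  # face with tears of joy -> "laughter"
    "\U0001F622": "حزن",  # crying face -> "sadness"
    "\u2764": "حب",       # red heart -> "love"
    "\U0001F621": "غضب",  # pouting face -> "anger"
}

# Emoji are often followed by this invisible presentation selector.
VARIATION_SELECTOR = "\ufe0f"


def replace_emojis(text, lexicon=EMOJI_LEXICON):
    """Replace each known emoji with its textual meaning.

    Replacements are padded with spaces so they become separate tokens;
    repeated whitespace is collapsed afterwards.
    """
    pieces = []
    for ch in text:
        if ch == VARIATION_SELECTOR:
            continue  # drop the selector so it does not leak into tokens
        if ch in lexicon:
            pieces.append(f" {lexicon[ch]} ")
        else:
            pieces.append(ch)
    return " ".join("".join(pieces).split())
```

Running this before stemming and embedding lookup means the emotional signal carried by an emoji reaches the classifier as an ordinary word that the embedding model may already know.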
Multilabel emotion classification is a high priority because it mimics real-life scenarios in which people display a variety of emotions. A text can express a collection of emotions such as happiness, love, and optimism, or sadness, anger, and pessimism. In this framework, the Arabic tweet data provided by SemEval 2018 Task 1, subtask E-c, are first preprocessed through several normalization steps, including stemming, stop-word removal, and removal of special characters and digits. An emotion lexicon has been built to replace emoticons with their meaning in terms of the emotion classes. The pre-trained word embedding model AraVec is used for feature extraction because word embeddings performed better in this task than other features such as N-grams. In the classification stage of our framework, several machine learning techniques have been implemented: Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), ensemble Random Forest (RF), and ensemble Extra Tree. The best performance was achieved by MLP, whereas SVM performed best among the traditional machine learning techniques (SVM, KNN, RF, and Extra Tree). In multilabel Jaccard accuracy, Extra Tree achieved 26.2%, RF 29.1%, KNN 37.5%, SVM 46.3%, and the MLP neural network 48%. The proposed framework has been compared with previous machine learning models built for this task; its results outperform those models in most cases.
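
The multilabel Jaccard accuracy reported above is, per tweet, the size of the intersection of gold and predicted label sets divided by the size of their union, averaged over all tweets. The following is a minimal sketch of that metric; the function name is hypothetical, and scoring a tweet as 1.0 when both sets are empty is an assumption about the convention used.

```python
def jaccard_accuracy(gold, predicted):
    """Mean per-example Jaccard index between gold and predicted label sets.

    gold, predicted: equal-length sequences of iterables of labels.
    An example with no gold and no predicted labels scores 1.0
    (assumed convention for the empty case).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    total = 0.0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        union = g | p
        total += 1.0 if not union else len(g & p) / len(union)
    return total / len(gold)


if __name__ == "__main__":
    gold = [{"joy", "love"}, {"anger"}, set()]
    pred = [{"joy"}, {"anger", "disgust"}, set()]
    print(f"{jaccard_accuracy(gold, pred):.3f}")
```

Unlike exact-match accuracy, this measure gives partial credit when only some of a tweet's emotions are predicted, which is why it suits the multilabel setting.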