Sentiment analysis is a field of study for analyzing people's emotions, such as Nice, Happy, ਦੁ ਖੀ (sad), changa (Good), etc. towards the entities and attributes expressed in written text. It noticed that, on microblogging websites (Facebook, YouTube, Twitter ), most people used more than one language to express their emotions. The change of one language to another language within the same written text is called code-mixing.In this research, we gathered the English-Punjabi code-mixed corpus from micro-blogging websites. We have performed language identification of code-mix text, which includes Phonetic Typing, Abbreviation, Wordplay, Intentionally misspelled words and Slang words. Then we performed tokenization of English and Punjabi language words consisting of different spellings. Then we performed sentiment analysis based on the above text based on the lexicon approach. The dictionary created for English Punjabi code mixed consists of opinionated words. The opinionated words are then categorized into three categories i.e. positive words list, negative words list, and neutral words list. The rest of the words are being stored in an unsorted word list. By using the Ngram approach, a statistical technique is applied at sentence level sentiment polarity of the English-Punjabi codemixed dataset. Our results show an accuracy of 83% with an F-1 measure of 77%.
This paper proposes a framework for preparing and using corpora from online social networks and review sites for sentiment analysis task. The framework consists of three phases. The first phase is the preprocessing and cleaning of data collected, then data annotation. The second phase is applying various text processing techniques including: removing stopwords, replacing the negation words and the following negated words with the antonyms of the negated words, and using selective words of part-of-speech tags (adjectives and verbs) on the prepared corpora. The third phase is text classification using Naïve Bayes and Decision Tree classifiers and two feature selection approaches, unigrams and bigrams. The experiments show that the data is extremely unbalanced. The results show that applying text processing techniques improve the classification accuracy of the Naïve Bayes classifier and reduce the training time of both classifiers. The results also show that Decision tree classifier is more suitable for imbalance data.
Nowadays, an unprecedented number of users interact through social media platforms and generate a massive amount of content due to the explosion of online communication. However, because user-generated content is unregulated, it may contain offensive content such as fake news, insults, and harassment phrases. The identification of fake news and rumors and their dissemination on social media has become a critical requirement. They have adverse effects on users, businesses, enterprises, and even political regimes and governments. State of the art has tackled the English language for news and used feature-based algorithms. This paper proposes a model architecture to detect fake news in the Arabic language by using only textual features. Machine learning and deep learning algorithms were used. The deep learning models are used depending on conventional neural nets (CNN), long short-term memory (LSTM), bidirectional LSTM (BiLSTM), CNN+LSTM, and CNN + BiLSTM. Three datasets were used in the experiments, each containing the textual content of Arabic news articles; one of them is reallife data. The results indicate that the BiLSTM model outperforms the other models regarding accuracy rate when both simple data split and recursive training modes are used in the training process.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.