Opinion Mining (OM) is a field of Natural Language Processing (NLP) that aims to capture human sentiment in the given text. With the ever-spreading of online purchasing websites, micro-blogging sites, and social media platforms, OM in online social media platforms has picked the interest of thousands of scientific researchers. Because the reviews, tweets and blogs acquired from these social media networks, act as a significant source for enhancing the decision making process. The obtained textual data (reviews, tweets, or blogs) are classified into three different class labels which are negative, neutral and positive for analyzing and extracting relevant information from the given dataset. In this contribution, we introduce an innovative MapReduce improved weighted ID3 decision tree classification approach for OM, which consists mainly of three aspects: Firstly We have used several feature extractors to efficiently detect and capture the relevant data from the given tweets, including N-grams or character-level, Bag-Of-Words, word embedding (GloVe, Word2Vec), FastText, and TF-IDF. Secondly, we have applied a multiple feature selector to reduce the high feature's dimensionality, including Chi-square, Gain Ratio, Information Gain, and Gini Index. Finally, we have employed the obtained features to carry out the classification task using an improved ID3 decision tree classifier, which aims to calculate the weighted information gain instead of information gain used in traditional ID3. In other words, to measure the weighted information gain for the current conditioned feature, we follow two steps: First, we compute the weighted correlation function of the current conditioned feature. Second, we multiply the obtained weighted correlation function by the information gain of this current conditioned feature. This work is implemented in a distributed environment using the Hadoop framework, with its programming framework MapReduce and its distributed file system HDFS. Its primary goal is to enhance the performance of a well-known ID3 classifier in terms of accuracy, execution time, and ability to handle the massive datasets. We have carried out several experiences that aims to assess the effectiveness of our suggested classifier compared to some other contributions chosen from the literature. The experimental results demonstrated that our ID3 classifier works better on COVID-19_Sentiments dataset than other classifiers in terms of Recall (85.72 %), specificity (86.51 %), error rate (11.18 %), false-positive rate (13.49 %), execution time (15.95s), kappa statistic (87.69 %), F1-score (85.54 %), classification rate (88.82 %), false-negative rate (14.28 %), precision rate (86.67 %), convergence (it convergent towards the iteration 90), stability (it is more stable with mean deviation standard equal to 0.12 %), and complexity (it requires much lower time and space computational complexity).
At present, with the growing number of Web 2.0 platforms such as Instagram, Facebook, and Twitter, users honestly communicate their opinions and ideas about events, services, and products. Owing to this rise in the number of social platforms and their extensive use by people, enormous amounts of data are produced hourly. However, sentiment analysis or opinion mining is considered as a useful tool that aims to extract the emotion and attitude from the user-posted data on social media platforms by using different computational methods to linguistic terms and various Natural Language Processing (NLP). Therefore, enhancing text sentiment classification accuracy has become feasible, and an interesting research area for many community researchers. In this study, a new Fuzzy Deep Learning Classifier (FDLC) is suggested for improving the performance of data-sentiment classification. Our proposed FDLC integrates Convolutional Neural Network (CNN) to build an effective automatic process for extracting the features from collected unstructured data and Feedforward Neural Network (FFNN) to compute both positive and negative sentimental scores. Then, we used the Mamdani Fuzzy System (MFS) as a fuzzy classifier to classify the outcomes of the two used deep (CNN+FFNN) learning models in three classes, which are: Neutral, Negative, and Positive. Also, to prevent the long execution time taking by our hybrid proposed FDLC, we have implemented our proposal under the Hadoop cluster. An experimental comparative study between our FDLC and some other suggestions from the literature is performed to demonstrate our offered classifier's effectiveness. The empirical result proved that our FDLC performs better than other classifiers in terms of true positive rate, true negative rate, false positive rate, false negative rate, error rate, precision, classification rate, kappa statistic, F1-score and time consumption, complexity, convergence, and stability.
Decision tree is the most efficient and fast technology of data mining that is frequently used in data analysis and prediction. According to the development in science and technology in the last years, the data is growing faster, and the principle of the decision tree algorithms become not efficient in respect runtime and speed-up ratio. In view of the above problem, we propose a new method of classification based on framework Hadoop and Fuzzy logic. Our proposed hybrid approach is designed to propose a new C4.5 decision tree algorithm using fuzzy logic and fuzzy set theory to handle uncertainty and imprecision in data, and Hadoop framework (MapReduce + HDFS) to parallelize our work. This combination of big data technologies, fuzzy systems and C4.5 decision tree algorithm has produced a parallel fuzzy decision tree model, which takes advantage of these three techniques (hadoop + fuzzy logic + C4.5) to produce a decision tree with higher predictive accuracy. In this paper, an experiment is presented to compare our approach with other approaches from the literature. Experiments were carried out using three datasets, and the results show that our new method outperforms the other approaches in terms of accuracy and execution time.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.