A MapReduce Opinion Mining for COVID-19-Related Tweets Classification Using Enhanced ID3 Decision Tree Classifier

Es-sabery, Fatima; Es-sabery, Khadija; Qadir, Junaid; Abajo, Beatriz Sainz de; Haïr, Abdellatif; Garcia-Zapirain, Begonya; Díez, Isabel de la Torre

doi:10.1109/access.2021.3073215

Cited by 45 publications

(32 citation statements)

References 93 publications

“…When the DOM findings were compared to human annotations, the average accuracy for long and short text was determined to be 76.32 percent. Supervised sentiment analysis using MapReduce was also proposed in [14]. For identifying the sentiment of large corpora, it suggested employing WordMap, a lexicon dictionary, and natural language processing rules in MapReduce operations.…”

Section: Literature Study (mentioning)

confidence: 99%

A Parallel Approach for Sentiment Analysis on Social Networks Using Spark

Iqbal, Latha

2023

Intelligent Automation & Soft Computing

“…Most conventional research papers on sentiment analysis has employed supervised machine learning approaches as the primary module for classification or clustering [11]. These approaches typically exploit the Bag-Of-Words, Word2vec, GloVe, FastText, N-Gram and TF-IDF models to extract the essential features of the text containing user-generated sentiments [12].…”

Section: Related Work (mentioning)

confidence: 99%

“…P, R, F1 and A of our approach with other approaches selected from the existing literature. From the results shown in Table 12, we remark that our approach (CNN+FastText) obtained the strongest performances in terms of accuracy (91.32%), precision (93.43%), recall (90.89%), and F1 measures (92.14%) compared to other chosen classifiers from the literature which are Naresh et al. [14], Carvalho et al. [15], Avinash et al. [16], Kumar et al. [17] and Zainuddin et al. [18]. VII. CONCLUSION: Feature extraction is needed to get good performance in sentiment classification.…”

mentioning

confidence: 99%

Evaluation of different extractors of features at the level of sentiment analysis

Es-sabery, Es-sabery, Garmani et al. 2022

Infocommunications journal

Self Cite


Sentiment analysis is the process of recognizing and categorizing the emotions expressed in a textual source. Tweets, once analyzed, commonly yield a large amount of sentiment data. These data help reveal people's thoughts on a wide range of topics. People are typically interested in positive and negative reviews, which contain the likes and dislikes shared by consumers concerning the features of a certain service or product. Therefore, the aspects or features of the product or service play an important role in opinion mining. In addition to the considerable work being carried out in text mining, feature extraction in opinion mining is becoming a hot research field. In this paper, we focus on the study of feature extractors because of their importance to classification performance. Feature extraction is the most critical aspect of opinion classification, since classification efficiency can be degraded if features are not properly chosen. Only a few researchers have addressed the issue of feature extraction, and we found in the literature that almost every article deals with only one or two feature extractors. We therefore decided in this paper to cover the most popular feature extractors: BOW, N-grams, TF-IDF, Word2vec, GloVe and FastText. In general, this paper discusses the existing feature extractors in the opinion mining domain and presents the advantages and drawbacks of each. Moreover, a comparative study is performed to determine the most efficient CNN/extractor combination in terms of accuracy, precision, recall, and F1 measure.
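To make the three count-based extractors named in the abstract concrete, here is a minimal standard-library sketch of BOW, N-grams, and TF-IDF. It is an illustration only, not the authors' implementation: the function names, the toy corpus, and the specific TF-IDF variant (term frequency normalized by document length, `idf = log(N / df)` with no smoothing) are all assumptions. Word2vec, GloVe and FastText are learned embeddings and are not covered here.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """BOW: map each token to its raw count in the document."""
    return Counter(tokens)


def ngrams(tokens, n):
    """N-grams: contiguous runs of n tokens, returned as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """TF-IDF over a list of tokenized documents.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, invented for illustration only.
corpus = [s.split() for s in (
    "good phone good battery",
    "bad battery",
    "good screen",
)]

if __name__ == "__main__":
    print(bag_of_words(corpus[0]))
    print(ngrams(corpus[0], 2))
    print(tf_idf(corpus)[0])
```

Under this variant a term that occurs in every document gets weight zero, which is why practical libraries usually apply a smoothed IDF.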


“…Also, the structure of decision trees requires less execution time in data classification compared to other machine learning classification techniques [19]. There are several different approaches to decision trees, including the LMT, C4.5, C5.0, and CART trees, in a variety of research areas such as basic science studies [20], medicine [21], and classification images [22] have been utilized. The random forest is a conventional machine learning algorithm for solving complex problems which is one of the supervised learning methods and its structural model is based on the tree and is used in issues such as classification and regression.…”

Section: Introduction (mentioning)

confidence: 99%

Application of machine learning models based on decision trees in classifying the factors affecting mortality of COVID-19 patients in Hamadan, Iran

Moslehi, Rabiei, Soltanian et al. 2022

BMC Med Inform Decis Mak


Background: Due to the high mortality of COVID-19 patients, a high-precision and interpretable classification model of patient mortality could help reduce mortality and support urgent, appropriate action. In this study, the random forest method was used to select the effective features in COVID-19 mortality, and classification was performed using the logistic model tree (LMT), classification and regression tree (CART), C4.5, and C5.0 trees based on the important features.

Methods: In this retrospective study, the data of 2470 COVID-19 patients admitted to hospitals in Hamadan, west Iran, were used, of which 75.02% recovered and 24.98% died. First, among the 25 demographic, clinical, and laboratory findings, features with a relative importance of more than 6% were selected by random forest. Then LMT, C4.5, C5.0, and CART trees were developed, and classification performance was evaluated with recall, accuracy, and F1-score for the training, test, and total datasets. Finally, the best tree was developed, and the receiver operating characteristic curve and area under the curve (AUC) value were reported.

Results: Among demographic and clinical features, gender and age had more than 6% relative importance; among laboratory findings, blood urea nitrogen, partial thromboplastin time, serum glutamic-oxaloacetic transaminase, and erythrocyte sedimentation rate did. Developing the trees using these features revealed that the CART had the best performance, with F1-score, accuracy, and recall of 0.8681, 0.7824, and 0.955, respectively, for the test dataset and 0.8667, 0.7834, and 0.9385, respectively, for the total dataset. The AUC value obtained for the CART was 79.5%.

Conclusions: Finding a highly accurate and qualified model for interpreting the classification of a response that is considered clinically consequential is critical at all stages, including treatment and immediate decision making. In this study, the CART, with its high accuracy for diagnosing and classifying mortality of COVID-19 patients and its prioritization of important demographic, clinical, and laboratory findings in an interpretable format, identifies risk factors for the prognosis of COVID-19 patient mortality and enables immediate and appropriate decisions by health professionals and physicians.
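The two mechanical steps in this abstract, keeping features whose relative importance exceeds 6% and scoring a classifier by recall, accuracy, and F1-score, can be sketched as follows. This is a hedged illustration and not the study's code: the function names, the importance values, and the confusion-matrix counts are hypothetical, and the random forest and tree models themselves are not reproduced.

```python
def select_features(importances, threshold=0.06):
    """Keep features whose relative importance is strictly above threshold.

    `importances` maps feature name -> relative importance in [0, 1];
    the 0.06 default mirrors the 6% cutoff quoted in the abstract.
    """
    return [name for name, score in importances.items() if score > threshold]


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical importances and counts, invented for illustration only;
# they are NOT the values reported in the study.
demo_importances = {"age": 0.21, "gender": 0.08, "feature_x": 0.03}
demo_counts = {"tp": 80, "fp": 20, "fn": 10, "tn": 40}

if __name__ == "__main__":
    print(select_features(demo_importances))
    print(classification_metrics(**demo_counts))
```

A high recall paired with a lower accuracy, as in the figures quoted above, is the pattern such a function yields when false positives outnumber false negatives.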

