What Are COVID-19 Arabic Tweeters Talking About?

Hamoui, Btool; Alashaikh, Abdulaziz; Alanazi, Eisa

doi:10.20944/preprints202007.0172.v1

Cited by 3 publications

(1 citation statement)

References 14 publications


“…However, they only included statistical analysis and clustering to generate summaries with some suggestion of future work. Yet, there are some studies with specific goals, such as analysis of the reaction of citizens during a pandemic [ 21 ] and identification of the most frequent unigrams, bigrams, and trigrams of tweets related to COVID-19 [ 22 ]. In addition, considering the study by Alanazi et al [ 23 ] that identified the symptoms of COVID-19 from Arabic tweets, the authors noted the limitation that they used modern standard Arabic keywords only, and it would be important to consider dialectical keywords in order to better catch tweets on COVID-19 symptoms written in Arabic, because some Arab users post on social media in their own local dialect.…”

Section: Introduction (mentioning)

confidence: 99%

Social Media Monitoring of the COVID-19 Pandemic and Influenza Epidemic With Adaptation for Informal Language in Arabic Twitter Data: Qualitative Study

Alsudias, Rayson. 2021. JMIR Med Inform


Background: Twitter is a real-time messaging platform widely used by people and organizations to share information on many topics. Systematic monitoring of social media posts (infodemiology or infoveillance) could be useful to detect misinformation outbreaks as well as to reduce reporting lag time and to provide an independent complementary source of data compared with traditional surveillance approaches. However, such an analysis is currently not possible in the Arabic-speaking world owing to a lack of basic building blocks for research and dialectal variation.

Objective: We collected around 4000 Arabic tweets related to COVID-19 and influenza. We cleaned and labeled the tweets relative to the Arabic Infectious Diseases Ontology, which includes nonstandard terminology, as well as 11 core concepts and 21 relations. The aim of this study was to analyze Arabic tweets to estimate their usefulness for health surveillance, understand the impact of the informal terms in the analysis, show the effect of deep learning methods in the classification process, and identify the locations where the infection is spreading.

Methods: We applied the following multilabel classification techniques: binary relevance, classifier chains, label power set, adapted algorithm (multilabel adapted k-nearest neighbors [MLKNN]), support vector machine with naive Bayes features (NBSVM), bidirectional encoder representations from transformers (BERT), and AraBERT (transformer-based model for Arabic language understanding) to identify tweets appearing to be from infected individuals. We also used named entity recognition to predict the place names mentioned in the tweets.

Results: We achieved an F1 score of up to 88% in the influenza case study and 94% in the COVID-19 one. Adapting for nonstandard terminology and informal language helped to improve accuracy by as much as 15%, with an average improvement of 8%. Deep learning methods achieved an F1 score of up to 94% during the classifying process. Our geolocation detection algorithm had an average accuracy of 54% for predicting the location of users according to tweet content.

Conclusions: This study identified two Arabic social media data sets for monitoring tweets related to influenza and COVID-19. It demonstrated the importance of including informal terms, which are regularly used by social media users, in the analysis. It also proved that BERT achieves good results when used with new terms in COVID-19 tweets. Finally, the tweet content may contain useful information to determine the location of disease spread.


Social Media Monitoring of the COVID-19 Pandemic and Influenza Epidemic With Adaptation for Informal Language in Arabic Twitter Data: Qualitative Study (Preprint)

Alsudias, Rayson. 2021. Preprint


BACKGROUND: Twitter is a real time messaging platform widely used by people and organisations to share information on many topics. It could potentially be useful to analyse tweets for infectious disease monitoring purposes in order to reduce reporting lag time, and to provide an independent complementary source of data, compared to traditional approaches. However, such analysis is currently not possible in the Arabic speaking world due to lack of basic building blocks for research.

OBJECTIVE: We collect around 4,000 Arabic tweets related to COVID-19 and Influenza. We clean and label the tweets relative to the Arabic Infectious Diseases Ontology which includes non-standard terminology and 11 core concepts and 21 relations. The aim of this study is to analyse Arabic tweets to estimate their usefulness for health surveillance, understand the impact of the informal terms in the analysis, show the effect of the deep learning methods in the classification process, and identify the locations where the infection is spreading.

METHODS: We apply multi-label classification techniques: Binary Relevance, Classifier Chains, Label Powerset, Adapted Algorithm (MLKNN), NBSVM, BERT, and AraBERT to identify infected people. We also use Named Entity Recognition to predict the locations affected.

RESULTS: We achieve an F1-score up to 88% in the Influenza case study and 94% in the COVID-19 one. Adapting for non-standard terminology and informal language helps to improve accuracy by as much as 15% with an average improvement of 8%. Deep learning methods achieve around 5% on hamming loss during the classifying process. Our geo-location detection algorithm can predict on average 54% accuracy for the location of the users using tweet content.

CONCLUSIONS: This study identifies two Arabic social media datasets for monitoring tweets related to Influenza and COVID-19. It demonstrates the importance of including informal terms, which is regularly used by social media users, in the analysis. It also proves that BERT achieves good results when used with new terms in COVID-19 tweets. Finally, the tweet content may contain useful information to determine the location of the disease spread.
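Note: the abstract above reports two multilabel metrics, an F1 score and a Hamming loss. As a minimal sketch of how such metrics are commonly computed, the Python below uses plain lists of 0/1 label vectors. The toy labels, the function names, and the choice of micro-averaged F1 are assumptions made for illustration; the abstract does not state which F1 averaging the authors used, and none of the values come from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 4 tweets, 3 made-up labels per tweet.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

print(hamming_loss(y_true, y_pred))
print(micro_f1(y_true, y_pred))
```

The two metrics point in opposite directions: a lower Hamming loss is better, while a higher F1 is better, so a Hamming loss of around 5% and an F1 of up to 94% are not directly comparable figures.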


Spatio-Temporal Sentiment Mining of COVID-19 Arabic Social Media

Elsaka, Afyouni, Hashem et al. 2022. IJGI


Since the recent outbreak of COVID-19, many scientists have started working on distinct challenges related to mining the available large datasets from social media as an effective asset to understand people’s responses to the pandemic. This study presents a comprehensive social data mining approach to provide in-depth insights related to the COVID-19 pandemic and applied to the Arabic language. We first developed a technique to infer geospatial information from non-geotagged Arabic tweets. Secondly, a sentiment analysis mechanism at various levels of spatial granularities and separate topic scales is introduced. We applied sentiment-based classifications at various location resolutions (regions/countries) and separate topic abstraction levels (subtopics and main topics). In addition, a correlation-based analysis of Arabic tweets and the official health providers’ data will be presented. Moreover, we implemented several mechanisms of topic-based analysis using occurrence-based and statistical correlation approaches. Finally, we conducted a set of experiments and visualized our results based on a combined geo-social dataset, official health records, and lockdown data worldwide. Our results show that the total percentage of location-enabled tweets has increased from 2% to 46% (about 2.5M tweets). A positive correlation between top topics (lockdown and vaccine) and the COVID-19 new cases has also been recorded, while negative feelings of Arab Twitter users were generally raised during this pandemic, on topics related to lockdown, closure, and law enforcement.

