2022
DOI: 10.3390/app122211388

Enhancing Detection of Arabic Social Spam Using Data Augmentation and Machine Learning

Abstract: In recent years, people have tended to use online social platforms, such as Twitter and Facebook, to communicate with families and friends, read the latest news, and discuss social issues. As a result, spam content can easily spread across them. Spam detection is considered one of the important tasks in text analysis. Previous spam detection research focused on English content, with less attention to other languages, such as Arabic, where labeled data are often hard to obtain. In this paper, an integrated fram…

citations
Cited by 12 publications
(6 citation statements)
references
References 50 publications
“…So, the percentage of ham tweets is double the percentage of spam tweets. The dataset in [9] used only one single hashtag to collect all the data. So, the classifier will only fit this specific data.…”
Section: Experimental Design Materials and Methods (mentioning)
confidence: 99%
“…While the available dataset is sufficient for training deep learning models, it is valuable to augment it to increase the diversity and quality of data. Data augmentation [9], particularly, helps to balance the number of spam and ham tweets. However, augmenting Arabic text data in natural language processing presents challenges due to the complex nature of language.…”
Section: Data Description (mentioning)
confidence: 99%
“…Several approaches have been suggested to address the aforementioned issues with the use of ML, with data augmentation being one of the most popular. [8][9][10][11][12] Data augmentation involves modifying or adding features to the original data to generate new data. It is often employed to minimize significant errors that may arise when building diagnostic models using small datasets.…”
Section: Introduction (mentioning)
confidence: 99%
“…Several approaches have been suggested to address the aforementioned issues with the use of ML, with data augmentation being one of the most popular [8–12]. Data augmentation involves modifying or adding features to the original data to generate new data.…”
Section: Introduction (mentioning)
confidence: 99%
“…Rumors, highly contagious within tight-knit Arabic communities, necessitate vigilant monitoring to counteract panic and misinformation, exploiting cultural contexts for added complexity (Harrag and Djahli, 2022). Spam, spanning fraudulent ads and misleading claims, pervades digital spaces in all languages, underlining the need to distinguish it from credible content for online source credibility (Kaddoura et al, 2023; Alkadri et al, 2022). Propaganda, a pivotal element of disinformation campaigns, influences public opinion and necessitates understanding and countering within the Arabic-speaking context to protect individuals and communities from manipulation by misleading narratives (Sharara et al, 2022; Feldman et al, 2021).…”
Section: Introduction (mentioning)
confidence: 99%