This research investigates the impact of pre-processing techniques on the effectiveness of topic modeling algorithms for Arabic texts, comparing BERTopic, Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF). Using the Single-label Arabic News Articles Dataset (SANAD), which comprises 195,174 Arabic news articles, the study examines pre-processing methods such as cleaning, stemming, normalization, and stop word removal, steps that are crucial given the complex morphology of Arabic. The influence of six different embedding models on topic modeling performance was also assessed. The originality of this work lies in optimizing BERTopic for Arabic by tuning its n-gram range parameter and pairing it with different embedding models, a combination not addressed in previous studies. Pre-processing was tuned to improve data quality before applying BERTopic, LDA, and NMF, and performance was evaluated in terms of topic coherence, measured with Normalized Pointwise Mutual Information (NPMI), and topic diversity. The results show that the Tashaphyne stemmer significantly enhanced the performance of LDA and NMF, while BERTopic, optimized with pre-processing and bi-grams, outperformed both in coherence and diversity. Among the embedding models, CAMeL-Lab/bert-base-arabic-camelbert-da yielded the best results, underscoring the importance of careful pre-processing and embedding choice in Arabic topic modeling.
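For illustration, the sketch below shows how such a pipeline might be wired together with the bertopic, sentence-transformers, scikit-learn, and gensim libraries: the CAMeL-Lab/bert-base-arabic-camelbert-da checkpoint as the embedding model, a uni/bi-gram vectorizer for topic representations, and NPMI coherence plus a simple unique-word ratio for diversity. This is a minimal sketch under stated assumptions, not the paper's exact setup; `load_preprocessed_sanad` is a hypothetical placeholder for the cleaned corpus, and loading the CAMeL BERT checkpoint through sentence-transformers applies default mean pooling.

```python
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# Hypothetical loader standing in for the pre-processed SANAD articles
# (cleaned, normalized, stemmed, stop words removed).
docs = load_preprocessed_sanad()

# Arabic BERT embeddings; sentence-transformers wraps the plain BERT
# checkpoint with mean pooling by default (an assumption, not
# necessarily the paper's exact embedding configuration).
embedding_model = SentenceTransformer("CAMeL-Lab/bert-base-arabic-camelbert-da")

# Restrict topic representations to uni-grams and bi-grams, matching
# the n-gram range tuning described in the abstract.
vectorizer_model = CountVectorizer(ngram_range=(1, 2))

topic_model = BERTopic(embedding_model=embedding_model,
                       vectorizer_model=vectorizer_model)
topics, probs = topic_model.fit_transform(docs)

# NPMI coherence via gensim's "c_npmi" measure over whitespace tokens.
tokenized = [doc.split() for doc in docs]
dictionary = Dictionary(tokenized)

topic_words = []
for topic_id in topic_model.get_topics():
    if topic_id == -1:  # skip BERTopic's outlier topic
        continue
    # gensim requires in-vocabulary tokens, so bi-gram terms are
    # filtered out here (a simplification for evaluation only).
    words = [w for w, _ in topic_model.get_topic(topic_id)
             if w in dictionary.token2id]
    if words:
        topic_words.append(words)

coherence = CoherenceModel(topics=topic_words, texts=tokenized,
                           dictionary=dictionary,
                           coherence="c_npmi").get_coherence()

# Topic diversity: fraction of unique words among all topics' top words.
all_words = [w for topic in topic_words for w in topic]
diversity = len(set(all_words)) / len(all_words)
```

The same tokenized corpus and dictionary could be reused to score LDA and NMF topic lists with the identical NPMI measure, which keeps the comparison across the three models consistent.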