2023
DOI: 10.21105/joss.04886

PyArabic: A Python package for Arabic text

Abstract: Because text is the most common form of information representation, processing and manipulating it requires recurring routines and functions. Massive amounts of text are processed every day, and with the advent of artificial intelligence and recent advances in machine learning and deep learning, natural language processing has become a critical domain.

Cited by 5 publications (2 citation statements)
References 33 publications
“…Normalizing text is an important preprocessing step in natural language processing that involves transforming text data into a standardized format. Normalization of Arabic text involves several sub-tasks, including removing diacritics (Zerrouki, 2023), normalizing characters, and removing ligatures. These sub-tasks are essential for improving the accuracy of downstream tasks such as text classification, named entity recognition, and sentiment analysis.…”
Section: Statement of Need (mentioning)
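The three normalization sub-tasks named in the citation statement above (removing diacritics, normalizing characters, and removing ligatures) can be illustrated with a minimal, standard-library-only Python sketch. It does not use PyArabic's API: the function names `strip_diacritics`, `normalize_alef`, `remove_ligatures`, and `normalize` are hypothetical, and folding the hamza- and madda-carrying alef forms into bare alef is an assumed convention, one common choice among several.

```python
import unicodedata

# Arabic harakat (tashkeel): FATHATAN (U+064B) through SUKUN (U+0652).
# Mapping a code point to None makes str.translate delete it.
DIACRITICS = {cp: None for cp in range(0x064B, 0x0653)}

# Assumed convention: fold ALEF WITH MADDA/HAMZA ABOVE/HAMZA BELOW
# (U+0622, U+0623, U+0625) into bare ALEF (U+0627).
ALEF_VARIANTS = str.maketrans({"\u0622": "\u0627", "\u0623": "\u0627", "\u0625": "\u0627"})

# Lam-alef ligature presentation forms U+FEF5..U+FEFC, expanded to
# LAM + ALEF (variant) using each character's NFKC compatibility decomposition.
LIGATURES = str.maketrans(
    {chr(cp): unicodedata.normalize("NFKC", chr(cp)) for cp in range(0xFEF5, 0xFEFD)}
)


def remove_ligatures(text: str) -> str:
    """Replace lam-alef ligature glyphs with their two-letter sequences."""
    return text.translate(LIGATURES)


def strip_diacritics(text: str) -> str:
    """Delete short-vowel marks, tanween, shadda, and sukun."""
    return text.translate(DIACRITICS)


def normalize_alef(text: str) -> str:
    """Fold hamza- and madda-carrying alef forms into bare alef."""
    return text.translate(ALEF_VARIANTS)


def normalize(text: str) -> str:
    """Expand ligatures first so the alefs they contain are also normalized."""
    return normalize_alef(strip_diacritics(remove_ligatures(text)))


if __name__ == "__main__":
    # A fully vocalized word beginning with ALEF WITH HAMZA ABOVE.
    print(normalize("\u0623\u064e\u062d\u0652\u0645\u064e\u062f"))
```

Ligature expansion runs first so that an alef produced by expanding a lam-alef glyph also passes through alef folding. Real pipelines often add further steps (for example, removing tatweel or folding alef maqsura), which this sketch deliberately omits.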
“…• While some existing text-processing packages, e.g. Zerrouki (2022), focus on a specific group of languages, Arabica offers text-mining methods for all Latin Alphabet languages, including the stopwords removal of 18 lists of stopwords included in the NLTK corpus of stopwords.…”
Section: Statement of Need (mentioning)