2018
DOI: 10.14569/ijacsa.2018.090928

NADA: New Arabic Dataset for Text Classification

Abstract: In recent years, Arabic Natural Language Processing, including text summarization, text simplification, text categorization, and other natural-language-related disciplines, has been attracting more researchers. Appropriate resources for Arabic text categorization have become a real necessity for the development of this research. The few existing corpora are not ready for use: they require preprocessing and filtering operations. In addition, most of them are not organized based on standard classification methods …

Cited by 21 publications (14 citation statements); references 17 publications.

“…For Nada, the accuracy results of our SVM and CNN models are shown in table 10. It is clear that the accuracy of our models with accuracy near 100% is superior to those obtained in [44].…”
Section: E. Nada (mentioning; confidence: 56%)
“…This subsection compares the results of this study on NADA dataset with the results obtained by [44]. The authors argue that the low accuracy they obtained for NADA dataset (93.88%) is due to Abuaiadah dataset, where its classification accuracy was around 80%.…”
Section: E. Nada (mentioning; confidence: 73%)
“…Arabic text classification research and the goal to enrich the Arabic corpus are slowly becoming a priority in the research community. In [ 31 ], the authors believe that many of the available datasets are not appropriate for classification, either because the classes are not defined well, or there are not any defined classes like in the 1.5 billion words Arabic Corpus [ 11 ]. The authors also introduce 'NADA,' a new filtered and preprocessed corpus, that combine already existing corpora DAA and OSAC.…”
Section: Literature Review (mentioning; confidence: 99%)
“…There are number of Arabic datasets like, DAA [11] is a dataset in which nine categories have been processed and standardized with 400 documents for each category, Akhbar-Alkhaleej [12] is a popular Arabic Dataset with 5690 Arabic news documents gathered regularly from the online newspaper "Akhbar-Alkhaleej". It consists of five categories: Alwatan [13] is an Arabic Dataset with 20,291 Arabic news documents collected regularly from its online newspaper, Al-Jazeera-News [14] Arabic Dataset (Alj-News) is an Arabic dataset with 1500 documents.…”
Section: Related Work II (mentioning; confidence: 99%)