2020
DOI: 10.1007/s00521-020-05321-8

Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification

Abstract: In order to provide benchmark performance for Urdu text document classification, the contribution of this paper is manifold. First, it provides a publicly available benchmark dataset manually tagged against 6 classes. Second, it investigates the performance impact of traditional machine learning-based Urdu text document classification methodologies by embedding 10 filter-based feature selection algorithms which have been widely used for other languages. Third, for the very first time, it assesses the performan…

citations
Cited by 27 publications
(13 citation statements)
references
References 114 publications
“…(2) Text Classification: A recent study for Urdu, i.e. a low-resource language, made use of BERT in comparison to SVM for text categorization problem is given in [62]. Traditionally for SVM classification pipeline, they used preprocessing steps such as cleaning, stop-word removal, tokenization and lemmatization together with ten feature filtering algorithms.…”
Section: NLP Problems From Turkish Language Literature (mentioning)
confidence: 99%
“…In [27], the authors used a single layer CNN with multiple filters for document-level text classification and found it superior to the baseline methods. Asim et al [29] assessed performance of state-of-the-art ML, DL, and hybrid model for document classification. Their experiments showed that the normalized difference measure-based feature selection approach improved the performance of all models.…”
Section: Deep Learning Based Approach (mentioning)
confidence: 99%
“…Moreover, they observed that the classification rate exceeds 90% when using more than 4000 features. Some of the recent techniques [29] [30] employed different deep learning models to conduct the document classification and yielded good results. But these techniques are time-consuming.…”
Section: Related Work (mentioning)
confidence: 99%