2021
DOI: 10.1145/3476467

UrduAI: Writeprints for Urdu Authorship Identification

Abstract: The authorship identification task aims to identify the original author of an anonymous text sample from a set of candidate authors. It has several application domains, such as digital text forensics and information retrieval, and these domains are not limited to a specific language. However, most authorship identification studies focus on English, and limited attention has been paid to Urdu. Moreover, existing Urdu authorship identification solutions lose accuracy as the number of trainin…

Citations: cited by 5 publications (3 citation statements)
References: 47 publications

“…In forensics, language-based profiles can be examined, and the writers of letters, emails, and other documents used in an inquiry can be identified. Author gender identification is the initial step in author profiling investigations [39]-[43]. This task has been extensively investigated, however, automatically determining stylistic differences between men and women, on the other hand, is far from ideal [38], [44]-[46].…”
Section: Literature Review (State-of-the-art)
Citation type: mentioning
confidence: 99%
“…The aim was to advance the existing approaches and evaluate them on new standard datasets. Sarwar and Hassan [16] worked with stylometric features to overcome the limitations of having only n-gram features for Urdu texts; their experimental results showed an accuracy of 94.03%, indicating a discriminative power of the stylometric features used.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…
Reference | Approach | Accuracy (%)
Qian et al [4] | Deep learning | 89.20
Mohsen et al [5] | Deep learning | 95.12
Zhang et al [6] | Semantic relationship | 95.30
Benzebouchi et al [7] | Word embeddings and MLP | 95.83
Anwar et al [9] | LDA model with n-grams and cosine similarity | 93.17
Rexha et al [10] | Content and stylometric features | 72.00 Confidence
Nirkhi et al [12] | Unigram and SVM | 88.00
López-Monroy et al [13] | Bag-of-words model with SVM | 80.80
Sarwar and Hassan [16] | Stylometric features | 94.03
Chakraborty and Choudhury [17] | Graph-based algorithm | 94.98
Digamberrao and Prasad [18] | SMO with J48 algorithm | 80.00
Rakshit et al [19] | Stylistic feature and SVM | 92.30
Anisuzzaman and Salam [20] | n-gram and NB | 95.00
…”
Section: Reference Approach Accuracy (%)
Citation type: mentioning
confidence: 99%