Machine Learning for Authorship Attribution in Arabic Poetry

Ahmed, Al-Falahi; Ramdani, Mohamed; Bellafkih, Mostafa

doi:10.18178/ijfcc.2017.6.2.486

Cited by 19 publications

(5 citation statements)

References 10 publications


“…set2: Character features + word length feature set3: Character features + word length + sentence length set4: Character features + word length + sentence length + first word in sentence set5: Character features + word length + sentence length + first word in sentence + rhyme The best accuracy obtained was 96.7%. They also repeated the experiment with applying NB, SVM and SMO [23]. The features set consists of those features that were used in [72] and the metre of the Arabic poetry and followed the same methodology.…”

Section: Machine Learning Methods In Arabic Authorship Attribution (mentioning)

confidence: 99%
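
The cumulative feature sets named in the citation statement above (character features, word length, sentence length, first word in sentence, rhyme) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `verse_features` is hypothetical, a "sentence" is taken to be one verse line, and rhyme is approximated by the final letter of the line, which is far simpler than a real analysis of Arabic rhyme. It is not the cited authors' feature extractor.

```python
from collections import Counter


def verse_features(verse: str) -> dict:
    """Extract toy stylometric features from one verse line.

    Assumptions (not from the cited paper): whitespace tokenisation,
    one verse line = one sentence, rhyme = last letter of the line.
    """
    words = verse.split()
    if not words:
        raise ValueError("verse must contain at least one word")
    return {
        # Character features: frequency of each non-space character.
        "char_counts": Counter(c for c in verse if not c.isspace()),
        # Word length: mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Sentence length: number of words in the line.
        "sentence_length": len(words),
        # First word in the sentence.
        "first_word": words[0],
        # Rhyme, crudely approximated by the final letter.
        "rhyme": words[-1][-1],
    }


# Example line (undiacritised) used only to exercise the function.
features = verse_features("قفا نبك من ذكرى حبيب ومنزل")
```

Each successive feature set in the statement adds one more of these keys to the previous set, so a classifier can be trained on progressively larger subsets of the returned dictionary.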

“…Basically, the machine-learning approach tackles the AA problem by assigning class labels to text samples. Surveying the literature, we found a large number of methods and approaches that were developed to tackle the AA problem such as Support Vector Machine (SVM) [18]-[23], naive Bayes (NB) [4], [20], [24], [25], Bayesian classifiers [26], [27], k-nearest neighbor (k-NN) [28], [29], decision trees [30], and Recurrent Neural Network (RNN) [31]. Although the ensemble methods showed a good performance to improve machine learning results, few studies such as [32]-[34] employed them in AA area.…”

Section: Introduction (mentioning)

confidence: 99%

“…The Arabic language is the mother tongue for more than 250 million people who reside mainly on two different continents. However, the works on AA for Arabic are still less numerous than those on English [5], [23], [35]-[46]. Thus, this paper aims to bridge the gap and investigates whether applying the ensemble methods lead to improve the accuracy of the AA task in the Arabic language, in addition to selecting the base classifier for ensemble methods and optimal combination of features.…”

Section: Introduction (mentioning)

confidence: 99%


Ensemble Methods for Instance-Based Arabic Language Authorship Attribution

et al. 2020


Authorship Attribution (AA) is considered a subfield of authorship analysis, and it is an important problem because the volume of anonymous information has grown with the fast-growing use of the internet worldwide. In other languages such as English, Spanish and Chinese, the issue is quite well studied. However, in the Arabic language, the AA problem has received less attention from the research community due to the complexity and nature of Arabic sentences. The paper presents an intensive review of previous studies for the Arabic language. Based on that, the study employs the Technique for Order Preferences by Similarity to Ideal Solution (TOPSIS) method to choose the base classifier of the ensemble methods. In terms of attribution features, hundreds of stylometric features and distinct words have been extracted using several tools. Then, the AdaBoost and Bagging ensemble methods have been applied to an Arabic enquiries (Fatwa) dataset. The findings showed an improvement in the effectiveness of the authorship attribution task in the Arabic language. INDEX TERMS Authorship attribution, ensemble methods, stylometric features, TOPSIS method.


“…Different statistical and machine learning-based techniques were recently applied on AA [4]. These techniques included Naive Bayes [5,6], Support Vector Machine (SVM) [7][8][9][10][11][12], Bayesian classifiers [13], k-nearest neighbor [14,15], and decision trees [16]. The authorship attribution for texts written in English, Spanish and Chinese has been studied well in the literature; however, less attention was given to the texts written in Arabic because of the complexity of Arabic scripts [17].…”

Section: Introduction (mentioning)

confidence: 99%

Deep Learning-based Method for Enhancing the Detection of Arabic Authorship Attribution using Acoustic and Textual-based Features

Al-Sarem, Saeed, Qasem et al. 2023

IJACSA


Authorship attribution (AA) is defined as the identification of the original author of an unseen text. It is found that the style of the author's writing can change from one topic to another, but the author's habits are still the same in different texts. The authorship attribution has been extensively studied for texts written in different languages such as English. However, few studies investigated the Arabic authorship attribution (AAA) due to the special challenges faced with the Arabic scripts. Additionally, there is a need to identify the authors of texts extracted from livestream broadcasting and the recorded speeches to protect the intellectual property of these authors. This paper aims to enhance the detection of Arabic authorship attribution by extracting different features and fusing the outputs of two deep learning models. The dataset used in this study was collected from the weekly livestream and recorded Arabic sermons that are available publicly on the official website of Al-Haramain in Saudi Arabia. The acoustic, textual and stylometric features were extracted for five authors. Then, the data were pre-processed and fed into the deep learning-based models (CNN architecture and its pre-trained ResNet34). After that the hard and soft voting ensemble methods were applied for combining the outputs of the applied models and improve the overall performance. The experimental results showed that the use of CNN with textual data obtained an acceptable performance using all evaluation metrics. Then, the performance of ResNet34 model with acoustic features outperformed the other models and obtained the accuracy of 90.34%. Finally, the results showed that the soft voting ensemble method enhanced the performance of AAA and outperformed the other method in terms of accuracy and precision, which obtained 93.19% and 0.9311 respectively.
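
The difference between the hard and soft voting ensembles mentioned in the abstract above can be shown with a minimal sketch. The function names and the probability values are hypothetical toy inputs, not outputs of the paper's CNN or ResNet34 models, and three voters are used here only so that the majority vote has no tie.

```python
from collections import Counter


def hard_vote(prob_rows):
    """Majority vote over each model's most probable class index."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    # On a tie, Counter.most_common keeps the label seen first.
    return Counter(labels).most_common(1)[0][0]


def soft_vote(prob_rows):
    """Class index with the highest mean probability across models."""
    n_classes = len(prob_rows[0])
    mean = [sum(p[c] for p in prob_rows) / len(prob_rows)
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Toy class-probability vectors from three hypothetical models
# for a single sample with three candidate authors.
probs = [
    [0.55, 0.45, 0.00],
    [0.55, 0.45, 0.00],
    [0.05, 0.95, 0.00],
]
hard_choice = hard_vote(probs)
soft_choice = soft_vote(probs)
```

With these toy numbers the two schemes disagree: two weakly confident models outvote the third under hard voting, while soft voting lets the one highly confident model dominate the averaged probabilities.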


“…Posadas-Durán et al. (2017) presented an approach that uses word n-grams and the Doc2vec to distribute document representations; they achieved over 98% accuracy in binary authorship attribution. Al-Falahi et al. (2017) used an ensemble of several features and classifiers to assign authorship to poetry; the highest accuracy rate was 99.1%. Nevertheless, limited research has been conducted on open-set attribution.…”

Section: Introduction (mentioning)

confidence: 99%

A review on authorship attribution in text mining

Zheng, Jin 2022

WIREs Computational Stats


The issue of authorship attribution has long been considered and continues to be a popular topic. Because of advances in digital computers, this field has experienced rapid developments in the last decade. In this article, a survey of recent advances in authorship attribution in text mining is presented. This survey focuses on authorship attribution methods that are statistically or computationally supported as opposed to traditional literary approaches. The main aspects covered include the changes in research topics over time, basic feature metrics, machine learning techniques, and the advantages and disadvantages of each approach. Moreover, the corpus size, number of candidates, data imbalance, and result description, all of which pose challenges in authorship attribution, are discussed to inform future work. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Text Mining
