2020
DOI: 10.1109/access.2020.2964952

Ensemble Methods for Instance-Based Arabic Language Authorship Attribution

Abstract: Authorship Attribution (AA) is a subfield of authorship analysis, and it has become an important problem as the volume of anonymous information has increased with the rapid growth of internet usage worldwide. In other languages, such as English, Spanish and Chinese, the problem is quite well studied. In Arabic, however, AA has received less attention from the research community due to the complexity and nature of Arabic sentences. The paper presented an intensive review of previous studi…

Citations: cited by 37 publications (27 citation statements)
References: 67 publications
“…Due to the possibility of a single classifier making a mistake, the ensemble will misclassify only, if more than half of the classifiers are incorrect. So, an ensemble’s performance is more effective than a single classifier [39]. The suggested machine learning level classifier in our work makes use of the ensemble methodology.…”
Section: Proposed Methods (mentioning)
confidence: 99%
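
The voting argument in this statement can be illustrated with a minimal sketch. It assumes simple plurality voting over hard labels; the function name `majority_vote` and the author labels are hypothetical, and the cited papers may combine classifiers differently.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the largest number of classifiers.

    `predictions` holds one predicted label per classifier.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) returns a list with the single (label, count) pair
    # that has the highest count.
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical classifiers; one of them picks the wrong author.
votes = ["author_A", "author_B", "author_A"]
winner = majority_vote(votes)
print(winner)
```

With three voters, a single wrong classifier is outvoted. In the two-author case the ensemble errs only when at least two of the three are wrong, which is the situation the statement describes.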
“…The accuracy is used to evaluate the results. This evaluation is commonly used in the literature [37, 39, 40], and is described as: …”
Section: 1 Dataset (mentioning)
confidence: 99%
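
The formula is cut off in the statement above. As an illustration only, the conventional definition of classification accuracy (correct predictions divided by all predictions) can be sketched as follows. Whether the citing paper uses exactly this form is an assumption, and the function name `accuracy` and the sample labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the predicted author equals the true author.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true and predicted authors for four documents.
true_authors = ["A", "B", "C", "A"]
pred_authors = ["A", "B", "A", "A"]
score = accuracy(true_authors, pred_authors)
print(score)
```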
“…Interestingly enough, the best classifier was Random Forest, a traditional machine learning classifier working with a set of manually engineered text features as opposed to deep learning classifiers that used a sequence of text tokens as an input. Character n-grams were used with a Convolutional Neural Network (CNN) by [31] for authorship attribution of short texts. They hypothesized that a CNN will be able to capture the stylistic features of a text through the use of successive convolution layers.…”
Section: B Attribution Methods (mentioning)
confidence: 99%
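
Character n-grams, the input features mentioned in this statement, are overlapping substrings of fixed length. The sketch below only shows how such features could be extracted and counted; the names `char_ngrams` and `ngram_profile` are hypothetical, and it does not reproduce the CNN or the exact feature pipeline of the cited work.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return every overlapping character n-gram of `text`, in order."""
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text, n=3):
    """Count how often each character n-gram occurs in `text`."""
    return Counter(char_ngrams(text, n))


sample = "short text"
grams = char_ngrams(sample, n=3)
profile = ngram_profile(sample, n=3)
print(grams[:3])
```

Because the n-grams are taken over raw characters, they also capture punctuation and spacing habits, which is one reason they are popular for short texts.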
“…For many text classification systems, pre-processing is considered as an essential step to improve the quality of data as well as the efficiency and accuracy of ML models [22,23]. The common pre-processing steps include text cleansing, tokenization, removing stop words, stemming, and normalization.…”
Section: Pre-processing (mentioning)
confidence: 99%
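
Some of the steps listed in this statement can be sketched for Arabic text using only the standard library. This is an illustration under stated assumptions, not the pipeline of any cited paper: the stop-word list is a tiny hypothetical sample, normalization is limited to unifying alef variants, and stemming is omitted because it normally relies on a dedicated library.

```python
import re

# Harakat (U+064B..U+0652) and tatweel (U+0640) are deleted outright.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is not an Arabic letter or whitespace becomes a space.
NOT_ARABIC_LETTER = re.compile(r"[^\u0621-\u064A\s]")
# Alef with madda, hamza above, or hamza below -> bare alef (U+0627).
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize(text):
    """Unify alef variants (an assumed, minimal normalization)."""
    return ALEF_VARIANTS.sub("\u0627", text)


# Tiny hypothetical stop-word sample, normalized the same way as the tokens.
STOP_WORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن"]}


def preprocess(text):
    """Cleanse, normalize, tokenize, and drop stop words."""
    text = DIACRITICS.sub("", text)          # cleansing: strip diacritics
    text = NOT_ARABIC_LETTER.sub(" ", text)  # cleansing: digits, punctuation
    tokens = normalize(text).split()         # normalization + tokenization
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("ذهب أحمد إلى المدرسة 2020!")
print(tokens)
```

Diacritics are removed before the non-letter filter runs. Otherwise each diacritic would be replaced by a space and split words apart.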