2013
DOI: 10.1371/journal.pone.0054998

Automated Authorship Attribution Using Advanced Signal Classification Techniques

Abstract: In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We adopt an approach of preprocessing each text by stripping it of all characters except a-z and space. This is in order to increase the portability of the software to different types of texts. We test the methodology on a corpus of undisputed English text…
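The preprocessing and feature steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `preprocess` and `word_frequencies` are hypothetical, and lowercasing the text and turning line breaks into spaces before stripping are assumptions the abstract does not state.

```python
import re
from collections import Counter


def preprocess(text: str) -> str:
    """Keep only the characters a-z and space, as the abstract describes."""
    # Assumption: lowercase first so capital letters are kept, not dropped.
    text = text.lower()
    # Assumption: treat tabs and line breaks as spaces so words do not merge.
    text = re.sub(r"\s+", " ", text)
    # Strip every character that is not a-z or a space.
    text = re.sub(r"[^a-z ]", "", text)
    # Collapse runs of spaces left behind by removed characters.
    return re.sub(r" +", " ", text).strip()


def word_frequencies(clean_text: str, vocabulary: list[str]) -> list[float]:
    """Relative frequency of each vocabulary word in an already-cleaned text."""
    tokens = clean_text.split()
    if not tokens:
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]


if __name__ == "__main__":
    sample = preprocess("The cat, and the dog!")
    print(sample)
    print(word_frequencies(sample, ["the", "and", "of"]))
```

Vectors like these, one per text, are the kind of input a classifier such as MDA or an SVM could be trained on. The choice of vocabulary is left open here because the visible part of the abstract does not specify it.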

Cited by 28 publications (20 citation statements)
References 28 publications (25 reference statements)
“…More specifically, we show that the symmetry of specific words is able to identify the writing style of distinct authors. In the context of information sciences, the authorship recognition task is relevant because it can be useful to classify literary manuscripts [28] and intercept terrorist messages [29]. Traditional features employed for stylometric analysis include simple statistics such as the average length and frequency of words [30], richness of vocabulary size [30] and burstiness indexes [7].…”
Section: Pattern Recognition Methods
Citation type: mentioning
confidence: 99%
“…Previous research on individual differences in word choice has focused on written text and function words (e.g., Ebrahimpour et al., 2013; Koppel et al., 2009; Stamatatos, 2009). Content words, such as table and sleeping, and word combinations, such as old tree, are very context dependent.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The approaches to authorship identification can combine accumulated knowledge from the theory of image recognition, mathematical statistics and probability theory, neural networks, cluster analysis, Markov chains, and others [6][7][8][9][10][11]. Paper [6] studies the current state of the problem; it notes that when the training and testing samples contain texts by 3-4 authors, trained classifiers reach up to 85% accuracy in identifying the authorship of a text in the test sample.…”
Section: Literature Review and Problem Statement
Citation type: mentioning
confidence: 99%