2018
DOI: 10.2333/jbhmk.45.39

Accuracy and Standardized Judgment Procedures for Author Identification by Text Mining

Abstract: This study examined the accuracy of author identification by text mining. We conducted 16 analyses (four writing-style features × four multivariate analyses) on texts by 100 bloggers, each approximately 1,000 characters long. Specifically, we applied (1) principal component analysis, (2) correspondence analysis, (3) multidimensional scaling, and (4) hierarchical cluster analysis to each writing-style feature: (1) rate of usage of non-independent words, (2) bigrams of parts-of-speech, (3) bigrams of postpositional part…

Cited by 2 publications (3 citation statements)
References 5 publications
“…To control text lengths, texts containing approximately 1,000 characters were generated by randomly extracting the sentences from each paper, excluding citations. In case of authorship identification using stylometric analysis, texts containing more characters facilitate more accurate identification of authors; however, the minimum number of characters for valid level was determined as approximately 1,000 characters [ 14 , 15 ].…”
Section: Methods (mentioning)
confidence: 99%
“…This study included adverbs as function words. Zaitsu & Jin [ 14 ] reported the validity of these four stylometric features in identifying Japanese authors by analyzing texts containing approximately 1,000 characters: the most effective features were the rate of function words; the next was bigrams of parts-of-speech. That study reported high-classification performance levels: 100% on sensitivity and 95.1% on specificity.…”
Section: Methods (mentioning)
confidence: 99%
“…Function words. Preceding study of authorship attribution [10] and AI detection task [5] reported the function words as quite distinguishable features: "だ (auxiliary verb)," "また (conjunction)," and "は (postpositional particle)."…”
Section: Plos One (mentioning)
confidence: 99%