2019
DOI: 10.1109/tcyb.2017.2766189

Learning Stylometric Representations for Authorship Analysis

Abstract: Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of exponentially exploding textual data. It extracts an author's identity and sociolinguistic characteristics based on the reflected writing styles in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguistics, political socialization, etc. However, most of the previous techniques critically depend on the manual feature engineering process. Consequently, the choice of…

Cited by 80 publications (62 citation statements)
References 48 publications
“…We focus on the two most well‐known topic modeling approaches: Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) and examine their suitability for each author verification paradigm. It is demonstrated that topic modeling can considerably increase the effectiveness of author verification methods when an appropriate topic modeling technique is selected and is adequately fine‐tuned. We examine the effect of using either a limited set of documents or an enriched document collection to extract latent topics and it is demonstrated that the latter assists author verification methods to further increase their effectiveness. We report experimental results on benchmark data sets developed during the relevant PAN‐2014 and PAN‐2015 shared tasks in author verification that are directly compared with state‐of‐the‐art methods under the same settings. The performance of the methods presented in this study is quite competitive to the best results reported so far for these data sets, demonstrating that topic modeling can be an efficient and effective alternative to more sophisticated methods (for example, based on representation learning, distributed document representation, or neural network language models (Bagnall, ; Ding et al, )) for the author verification task. We examine the effect of genre of external documents when extrinsic author verification methods are combined with topic modeling techniques. It is demonstrated that verification models based on genre‐agnostic external documents are very competitive, but they are outperformed by models using external documents of the same genre with that of the questioned documents.…”
Section: Introduction
Citation type: mentioning
confidence: 77%
“…This approach has a relatively large number of hyperparameters to be set. The experimental results on PAN‐2014 benchmark data sets demonstrated that this sophisticated method can outperform existing methods as well as baselines based on topic modeling techniques (following an intrinsic and profile‐based paradigm with a prefixed number of latent topics) as well as other distributed word and document representations (word2vec, doc2vec) (Ding et al, ). In this article, we show that similar or even higher performance can be achieved by appropriately combining fine‐tuned topic modeling techniques and author verification paradigms.…”
Section: Previous Work
Citation type: mentioning
confidence: 99%