Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
DOI: 10.18653/v1/k19-1089

Reduce & Attribute: Two-Step Authorship Attribution for Large-Scale Problems

Abstract: Authorship attribution is a research area that has been actively studied for many decades. Nevertheless, the majority of approaches consider problem sizes of only a few candidate authors, making them difficult to apply to recent scenarios with thousands of authors, which emerge due to the manifold means of digitally sharing text. In this study, we focus on such large-scale problems and propose to effectively reduce the number of candidate authors before applying common attribution techniques. By uti…
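The abstract describes a two-step pipeline: first prune the candidate set, then attribute among the remaining authors. The paper's exact reduction and attribution methods are cut off above, so the following is only a minimal sketch under the assumption of a TF-IDF similarity pre-filter followed by an off-the-shelf linear classifier; all function names, parameters, and the choice of TF-IDF/SVM are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of a two-step "reduce & attribute" pipeline:
# step 1 prunes the candidate set with a cheap similarity ranking,
# step 2 runs a conventional classifier on the survivors only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import LinearSVC


def reduce_candidates(unknown_doc, author_profiles, keep=50):
    """Rank authors by similarity of their profile text and keep the top few."""
    names = list(author_profiles)
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    matrix = vec.fit_transform([author_profiles[a] for a in names] + [unknown_doc])
    sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(names, sims), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


def attribute(unknown_doc, training_docs, candidates):
    """Train a standard classifier on the reduced candidate set only."""
    texts, labels = zip(*[(text, author)
                          for author, docs in training_docs.items()
                          if author in candidates
                          for text in docs])
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    clf = LinearSVC().fit(vec.fit_transform(texts), labels)
    return clf.predict(vec.transform([unknown_doc]))[0]
```

The point of the pre-filter is tractability: the attribution step only ever sees, say, the 50 most plausible authors instead of the full enrolled population.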

Cited by 6 publications (9 citation statements); references 28 publications.

Citation statements (ordered by relevance):
“…In the case of sequential sampling, the sampling window is referred to as a sliding window due to the fact that it slides across the text. It has been used in numerous authorship attribution studies [25]-[27]. The number of characters or words that the window is shifted after each iteration is known as the window step or step length [27].…”
Section: B. Sample Generation (mentioning; confidence: 99%)
“…It has been used in numerous authorship attribution studies [25]-[27]. The number of characters or words that the window is shifted after each iteration is known as the window step or step length [27]. Therefore, a window size of 1500 words and a step length of 1500 words means that the window will pass through the text with no overlap between samples.…”
Section: B. Sample Generation (mentioning; confidence: 99%)
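The quoted passages describe sliding-window sample generation controlled by a window size and a step length. A minimal word-level sketch follows (the cited studies [25]-[27] may operate on characters instead, and the function name is hypothetical); with step equal to the window size, as in the 1500/1500 example above, consecutive samples do not overlap.

```python
def sliding_window_samples(text, window_size=1500, step=1500):
    """Split a text into word-level samples using a sliding window.

    The window is shifted by `step` words after each iteration. Texts shorter
    than one window yield a single truncated sample; otherwise any tail that
    does not fill a complete window is dropped.
    """
    words = text.split()
    samples = []
    for start in range(0, max(len(words) - window_size, 0) + 1, step):
        samples.append(" ".join(words[start:start + window_size]))
    return samples
```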
“…Pre-trained language models like BERT, Embeddings from Language Models (ELMo), Universal Language Model Fine-tuning, and Generative Pre-trained Transformer-2 have been used for cross-domain authorship prediction, as demonstrated in [17]: a multi-headed classifier with a DEMUX layer is created to handle the different classifiers, and BERT and ELMo outperform the other language models with more than 90% accuracy. Stylometry features play a vital role in the AA task; the authors in [18] have explored a new technique of generating human-like sentences using a neural network, from which various linguistic features are extracted to predict authorship. The proposed model can successfully predict the true author with an accuracy of 97.2%.…”
Section: Literature Survey (mentioning; confidence: 99%)
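The survey passage above stresses the role of stylometric (linguistic) features in attribution. The exact feature set of [18] is not given here; the sketch below is purely illustrative of common stylometric features (average word length, type-token ratio, function-word frequencies), with hypothetical names and a deliberately tiny function-word list.

```python
import re
from collections import Counter

# Tiny illustrative subset of English function words; real studies use longer lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "was", "it")


def stylometric_features(text):
    """Extract a few simple stylometric features from a text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero for empty input
    features = {
        "avg_word_length": sum(len(w) for w in words) / total,
        "type_token_ratio": len(counts) / total,
    }
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / total  # relative frequency
    return features
```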
“…Supervised authorship attribution traditionally refers to the task of analyzing the linguistic patterns of a text in order to determine who, from a finite set of enrolled authors, has written a document of unknown authorship. Nowadays, the focus of this closed-set scenario has shifted from literary to social media authorship attribution, where methods have been developed to deal with large-scale datasets of small-sized online texts (Rocha et al., 2017), (Boenninghoff et al., 2019a), (Theophilo et al., 2019), (Boenninghoff et al., 2019b), (Tschuggnall et al., 2019).…”
mentioning; confidence: 99%