2013
DOI: 10.1002/asi.22954

Determining if two documents are written by the same author

Abstract: Almost any conceivable authorship attribution problem can be reduced to one fundamental problem: whether a pair of (possibly short) documents were written by the same author. In this article, we offer an (almost) unsupervised method for solving this problem with surprisingly high accuracy. The main idea is to use repeated feature subsampling methods to determine if one document of the pair allows us to select the other from among a background set of "impostors" in a sufficiently robust manner.
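The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character 4-gram features, the cosine similarity, the 50% feature fraction, the iteration count, and all function names are assumptions chosen for illustration. The score is the fraction of random feature subsets on which the second document is closer to the first than every impostor is.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (feature choice is an assumption)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b, features):
    """Cosine similarity of two count vectors restricted to `features`."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def impostor_score(doc_x, doc_y, impostors, iterations=100, fraction=0.5, seed=0):
    """Hypothetical helper: share of random feature subsets on which doc_y
    is strictly more similar to doc_x than every impostor document is."""
    rng = random.Random(seed)
    x, y = char_ngrams(doc_x), char_ngrams(doc_y)
    imps = [char_ngrams(d) for d in impostors]
    # Sorted so that results are reproducible for a given seed.
    vocab = sorted(set(x) | set(y) | set().union(*imps))
    if not vocab:
        return 0.0
    k = max(1, int(len(vocab) * fraction))
    wins = 0
    for _ in range(iterations):
        subset = rng.sample(vocab, k)  # random feature subsample
        sim_y = cosine(x, y, subset)
        if all(sim_y > cosine(x, imp, subset) for imp in imps):
            wins += 1
    return wins / iterations
```

Under these assumptions, a pair would be judged "same author" when the score exceeds a threshold; how that threshold is chosen (for example, tuned on held-out pairs) is left open here.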

Cited by 169 publications (188 citation statements)
References 25 publications
“…Similar to PAN-2013, the overall winner was a modification of the Impostors method [26]. The performance of this approach was notably stable on all six corpus subsets.…”
Section: Evaluation Results (citation type: mentioning)
confidence: 99%
“…Compared to the closed-set attribution scenario, this setting is much more difficult, especially if the size of the candidate author set is small [24]. Finally, if the set of candidate authors is singleton, we get the author verification problem, which is a fundamental problem in authorship attribution since any problem setting can be decomposed into a series of verification problems [26].…”
Section: Author Identification (citation type: mentioning)
confidence: 99%
“…A number of diverse applications of text classification were reported in literature, ranging from subject categorization [3], analysis of sentiment of reviews or opinions, to authorship recognition of documents [4], [8], [9], etc.…”
Section: Introduction (citation type: mentioning)
confidence: 99%