2017
DOI: 10.1017/s1351324917000249

Syntactic methods for topic-independent authorship attribution

Abstract: The efficacy of syntactic features for topic-independent authorship attribution is evaluated, taking a feature set of frequencies of words and punctuation marks as baseline. The features are ‘deep’ in the sense that they are derived by parsing the subject texts, in contrast to ‘shallow’ syntactic features for which a part-of-speech analysis is enough. The experiments are made on two corpora of online texts and one corpus of novels written around the year 1900. The classification tasks include classical closed-…

Cited by 6 publications
(7 citation statements)
References 17 publications
“…The difference between the analyses based on word features and POS features seems negligible, so these experiments did not reproduce the findings of our previous study on English novels (Björklund and Zechner, 2017). Looking at the results without Strindberg, the gap in accuracy between on the one hand the classic test (the first case in Table 1) and on the other hand the topic-controlled test (the second case) is 70% for words and 68% for POS; technically a better result for the POS method, but hardly compelling evidence of a difference.…”
Section: Comparison Of Methods (contrasting)
confidence: 73%
“…In a previous study (Björklund and Zechner, 2017), we investigated this problem by examining a set of novels, using each separate novel as an approximation of topic. In this study, we begin to expand on that work and apply a similar approach to a larger corpus, this time in Swedish.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, cross-genre, cross-domain, cross-topic and multi-topic data sets [14,20] have been studied in AA tasks. Most studies on topic and domain issues have focused on cross-domain or cross-topic tasks [21–23]. Topic influence has been addressed through generative models where function words and stylometric markers have been used in joint inference of the author and the topic [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…On the other hand, studies on removing topic-related information through masking document content has been successfully applied in multi-topic data sets [21]. In another study, a syntactic feature set has been suggested for topic-independent AA [23]. In general, approaches in cross-domain and cross-genre tasks of AA have focused on comparing sets of features and classification approaches in data sets where authors write in several domains or genres [16,23,24].…”
Section: Related Work (mentioning)
confidence: 99%