2006
DOI: 10.1002/asi.20428

Feature instability as a criterion for selecting potential style markers

Abstract: We introduce a new measure on linguistic features, called stability, which captures the extent to which a language element, such as a word or a syntactic construct, is replaceable by semantically equivalent elements. This measure may be perceived as quantifying the degree of available "synonymy" for a language item. We show that frequent but unstable features are especially useful as discriminators of an author's writing style.

Introduction: Often we wish to find linguistic markers that distinguish the writing st…
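As a rough illustration of the idea in the abstract, the sketch below scores a feature's instability as the Shannon entropy of the usage distribution over its interchangeable alternatives. This is an assumption made for illustration only: the abstract does not give the paper's formula, and the function name `instability` and the counts are hypothetical.

```python
import math


def instability(alternative_counts):
    """Shannon entropy (in bits) over counts of interchangeable alternatives.

    Assumption: entropy stands in for "degree of available synonymy";
    the abstract does not state the paper's actual definition.
    """
    total = sum(alternative_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in alternative_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts: how often each alternative is chosen for one "slot".
no_alternatives = {"the": 100}
four_alternatives = {"however": 25, "but": 25, "yet": 25, "nevertheless": 25}

print(instability(no_alternatives))    # a single option: nothing to choose between
print(instability(four_alternatives))  # four equally likely options
```

Under this reading, a feature with no alternatives is fully stable (score 0), while a feature with many equally used alternatives leaves more room for an author's personal preference to show.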

citations
Cited by 55 publications
(32 citation statements)
references
References 10 publications
“…Many studies since that of Mosteller and Wallace have shown the efficacy of function words for authorship attribution in different scenarios (Morton 1978; Burrows 1987; Karlgren & Cutting 1994; Merriam & Matthews 1994; Kessler et al 1997; Argamon et al 1998; Holmes 1998; de Vel et al 2001; Holmes et al 2001a, 2001b; Baayen et al 2002; Binongo 2003; Juola & Baayen 2003; Zhao & Zobel 2005; Argamon & Levitan 2005; Koppel et al 2005, 2006a), confirming the hypothesis that different authors tend to have different characteristic patterns of function word use.…”
Section: Function Words
Type: mentioning
confidence: 82%
“…More recently, Graham et al (2005) and Zheng et al (2006) used neural networks on a wide variety of features. Other studies used k-nearest neighbor (Kjell et al 1995; Hoorn et al 1999; Zhao & Zobel 2005), Naive Bayes (Kjell 1994a; Hoorn et al 1999; Peng et al 2004), rule learners (Holmes & Forsyth 1995; Holmes 1998; Argamon et al 1998; Koppel & Schler 2003; Abbasi & Chen 2005; Zheng et al 2006), support vector machines (De Vel et al 2001; Diederich et al 2003; Koppel & Schler 2003; Abbasi & Chen 2005; Koppel et al 2005; Zheng et al 2006), Winnow (Koppel et al 2002; Argamon et al 2003; Koppel et al 2006a), and Bayesian regression (Madigan et al 2006; Argamon et al 2008). Further details regarding these studies can be found in the Appendix.…”
Section: Machine Learning Approach
Type: mentioning
confidence: 99%
“…The comparison of this method with information gain, a well-known feature selection algorithm examining the discriminatory power of features individually (Forman, 2003), showed that the frequency-based feature set was more accurate for feature sets comprising up to 4,000 features. Similarly, Koppel, Akiva, and Dagan (2006) presented experiments comparing frequency-based feature selection with odds-ratio, another typical feature selection algorithm using discrimination information (Forman, 2003). More important, the frequency information they used was not extracted from the training corpus.…”
Section: Feature Selection and Extraction
Type: mentioning
confidence: 99%
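The frequency-based selection mentioned in the statement above can be sketched as keeping the k most frequent tokens of some reference corpus. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the function name `top_k_frequent`, the whitespace tokenization, and the toy corpus are all hypothetical.

```python
from collections import Counter


def top_k_frequent(documents, k):
    """Return the k most frequent lowercase whitespace tokens in `documents`.

    Assumption: `documents` may be any reference corpus; it need not be
    the training corpus used later for classification.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return [token for token, _ in counts.most_common(k)]


# Toy reference corpus (hypothetical).
reference_corpus = ["the cat and the dog", "the bird and a fish"]
print(top_k_frequent(reference_corpus, 2))
```

Unlike information gain or odds-ratio, this selection never looks at class labels, which is why the frequency counts can come from a corpus other than the training set.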
“…As it has been demonstrated by several authorship identification studies, the frequency of features is a crucial factor for their significance [14,19]. Actually, the frequency information is more important than the discriminatory power of the features when examined individually.…”
Section: Feature Relevance
Type: mentioning
confidence: 97%