1995
DOI: 10.1007/bf01830395

On the utility of content analysis in author attribution: The Federalist

Abstract: In studies of author attribution, measurement of differential use of function words is the most common procedure, though lexical statistics are often used. Content analysis has seldom been employed. We compare the success of lexical statistics, content analysis, and function words in classifying the 12 disputed Federalist papers. Of course, Mosteller and Wallace (1964) have presented overwhelming evidence that all 12 were by James Madison rather than by Alexander Hamilton. Our purpose is not to challenge these…

Cited by 51 publications (22 citation statements)
References: 21 publications
“…For example, one author may prefer to use the words start and large, where another may prefer begin and big (Mosteller and Wallace 1964, Koppel et al 2006a). Such patterns of lexical choice can be represented by modeling the relative frequencies of content words (Martindale and McKenzie 1995; Craig 1999; Waugh et al 2000; Diederich et al 2003; Hoover 2004a, 2004b; Argamon et al 2008). Typically very rare words and those with near-uniform distribution over the corpus of interest can be omitted (Forman 2003), so that a reasonable set of perhaps several thousand words may be used.…”
Section: Content Words (mentioning)
confidence: 99%
“…Studies include [58,99,122,142]; Rudman [127] lists no less than nineteen studies of this particular corpus and is hardly complete. Perhaps needless to say, almost all of these studies confirm this particular assignment of authorship and the correctness of Mosteller and Wallace's results.…”
Section: The Federalist Analyses (mentioning)
confidence: 99%
“…Mosteller and Wallace (1964, 1984) carried extensive comparisons of the frequencies of a carefully chosen set of common words in writings known to be by Hamilton and by Madison, with the frequencies of these words in the twelve disputed papers. Recent studies re-visiting that problem are, for example, Holmes and Forsyth (1995), Martindale and McKenzie (1995), Tweedie et al (1996), Bosch and Smith (1998), Khmelev and Tweedie (2001), Collins et al (2004), and Jockers and Witten (2010).…”
Section: Authorship Attribution Case Study (mentioning)
confidence: 99%