2010
DOI: 10.1080/02664760903093617

The Sichel model and the mixing and truncation order

Abstract: The analysis of word frequency count data can be very useful in authorship attribution problems. Zero-truncated generalized inverse Gaussian-Poisson mixture models are very helpful in the analysis of these kinds of data because their model-mixing density estimates can be used as estimates of the density of the word frequencies of the vocabulary. It is found that this model provides excellent fits for the word frequency counts of very long texts, where the truncated inverse Gaussian-Poisson special case fails b…
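The zero-truncated inverse Gaussian-Poisson model named in the abstract can be evaluated numerically without special functions. What follows is a minimal sketch, assuming the IG(mu, lam) mean/shape parameterization and a model built by mixing first and dropping the zero class afterwards. The function names, integration bounds and grid size are illustrative choices, not taken from the paper. The IG mixing law is the GIG case with index -1/2, so the same integration would apply to the GIG-Poisson once a GIG density is supplied.

```python
import math


def ig_pdf(x: float, mu: float, lam: float) -> float:
    """Inverse Gaussian density with mean mu and shape lam (x > 0)."""
    return math.sqrt(lam / (2.0 * math.pi * x**3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu**2 * x)
    )


def ig_poisson_pmf(r: int, mu: float, lam: float,
                   lo: float = -20.0, hi: float = 8.0, n_grid: int = 2000) -> float:
    """P(R = r) when R | x ~ Poisson(x) and x ~ IG(mu, lam).

    Trapezoid rule on u = log(x); the factor x below is the Jacobian dx/du.
    The bounds lo/hi suit moderate mu and must be widened for large mu.
    """
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        u = lo + i * h
        x = math.exp(u)
        log_pois = r * u - x - math.lgamma(r + 1)  # log Poisson(r | x), overflow-safe
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * math.exp(log_pois) * ig_pdf(x, mu, lam) * x
    return total * h


def zt_ig_poisson_pmf(r: int, mu: float, lam: float) -> float:
    """Zero-truncated version (mix first, then drop r = 0), defined for r >= 1."""
    if r < 1:
        return 0.0
    return ig_poisson_pmf(r, mu, lam) / (1.0 - ig_poisson_pmf(0, mu, lam))
```

As a check, with mu = 2 and lam = 1 the untruncated zero probability should come out close to e^{-1}, because the IG Laplace transform evaluated at 1 is exp((lam/mu)(1 - sqrt(1 + 2 mu^2 / lam))) = exp(-1).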

Cited by 3 publications (6 citation statements). References 20 publications.
“…, n. (10) As discussed in Puig et al [4], the model mixing density ψ(π) in Equation (10) represents the v_n words that have appeared at least once in the given text of size n, and not all the v words in the vocabulary of the author. Hence here ψ(π) heavily depends on the text size n, and it cannot be interpreted as the density of the vocabulary of the author in the way the mixing density ψ(π) associated with Equation (1)…”
Section: Bayesian Analysis Based on the IG-Truncated Poisson (mentioning)
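The statement that ψ(π) in the truncate-then-mix model describes only the v_n words already observed can be checked on a discrete toy vocabulary. This is a minimal sketch, assuming each word with probability π occurs Poisson(nπ) times in a text of size n; the three probability classes and their weights are hypothetical. Mixing and then truncating yields the same frequency-count probabilities as truncating and then mixing over the reweighted law w·(1 - e^{-nπ}), and that reweighted law shifts with n.

```python
import math


def pois(r: int, lam: float) -> float:
    """Poisson(lam) probability of r, computed in log space."""
    return math.exp(r * math.log(lam) - lam - math.lgamma(r + 1))


def mix_then_truncate(r, probs, weights, n):
    """Mix over the whole vocabulary, then drop words with zero occurrences."""
    num = sum(w * pois(r, n * p) for p, w in zip(probs, weights))
    p0 = sum(w * math.exp(-n * p) for p, w in zip(probs, weights))
    return num / (1.0 - p0)


def truncate_then_mix(r, probs, weights, n):
    """Zero-truncate each Poisson component first, then mix."""
    return sum(w * pois(r, n * p) / (1.0 - math.exp(-n * p))
               for p, w in zip(probs, weights))


def observed_word_weights(probs, weights, n):
    """Mixing law of the words seen at least once in a text of size n."""
    raw = [w * (1.0 - math.exp(-n * p)) for p, w in zip(probs, weights)]
    total = sum(raw)
    return [x / total for x in raw]


# Hypothetical vocabulary: three word-probability classes, weights summing to 1.
probs = [1e-5, 1e-3, 1e-2]
weights = [0.6, 0.3, 0.1]
```

With these toy values the rarest class carries about 2% of the observed-word weight at n = 1,000 but close to its full 60% once n reaches 1,000,000, which is the sense in which the truncate-then-mix mixing density depends on the text size.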
“…Section 4 considers an alternative analysis based on the inverse Gaussian-truncated Poisson mixture model, first considered in Puig et al [4]. In Section 5, the two analyses are compared based on the posterior distribution of the sum of the squares of the Pearson errors and on the value taken by overall goodness-of-fit test statistics; even though the analysis in Section 4, based on the model that first truncates and then mixes, is not as meaningful as the one in Section 3, based on the model that first mixes and then truncates, because it does not allow one to link the data with the distribution of the word frequencies of the vocabulary of the author, this alternative model fits some of the word frequency count data sets a bit more accurately than the usual inverse Gaussian-Poisson model.…”
Section: Introduction (mentioning)
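The sum of squared Pearson errors used to compare the two fits is computed class by class. The following is a minimal sketch, assuming the Pearson error of class r is (v_r - v_n p_r) / sqrt(v_n p_r). Here v_r is the number of words seen exactly r times, v_n is the sum of the v_r, and p_r is the fitted zero-truncated probability. Only the observed classes are summed, with no pooling of sparse classes; the citing text does not say which convention it uses. Evaluating the statistic once per posterior draw gives its posterior distribution. The helper names are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence


def pearson_errors(counts: Dict[int, int], probs: Dict[int, float]) -> Dict[int, float]:
    """counts[r] = v_r (words seen exactly r times); probs[r] = fitted zero-truncated p_r."""
    v_n = sum(counts.values())
    errors = {}
    for r, v_r in counts.items():
        expected = v_n * probs[r]
        errors[r] = (v_r - expected) / math.sqrt(expected)
    return errors


def sum_sq_pearson(counts: Dict[int, int], probs: Dict[int, float]) -> float:
    """Sum of squared Pearson errors over the observed frequency classes."""
    return sum(e * e for e in pearson_errors(counts, probs).values())


def sum_sq_over_draws(counts: Dict[int, int],
                      draws: Iterable[Sequence[float]],
                      pmf: Callable[..., float]) -> List[float]:
    """One statistic per posterior draw; pmf(r, *theta) is a user-supplied fitted pmf."""
    return [sum_sq_pearson(counts, {r: pmf(r, *theta) for r in counts})
            for theta in draws]
```

For instance, counts {1: 6, 2: 4} against fitted probabilities {1: 0.5, 2: 0.5} give expected counts of 5 and 5, errors of ±1/sqrt(5), and a sum of squares of 0.4.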
“…Engen (1974), Ord and Whitmore (1986), Holmes (1992) and Baayen (2001), among many others, fit this type of data through the truncated versions of either the negative binomial or the IG-Poisson model. Sichel (1975, 97) and Puig et al (2010), following Good (1953), use a three-parameter truncated generalized inverse Gaussian-Poisson (GIG-Poisson) mixture model, which includes the negative binomial and IG-Poisson models as special cases.…”
Section: Use of the ETTP Model on Frequency Count Data (mentioning)
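The negative binomial special case arises when the Poisson rate has a gamma distribution, which is the limiting GIG case in which the 1/x term of the GIG exponent vanishes. The sketch below is minimal and assumes a gamma mixing law with shape k and rate c; these parameter names are hypothetical, not the cited papers' notation. It compares the closed-form negative binomial, and its mix-then-truncate version, with direct numerical integration of the Poisson-gamma mixture.

```python
import math


def negbin_pmf(r: int, k: float, c: float) -> float:
    """Closed form of P(R = r) for R | x ~ Poisson(x), x ~ Gamma(shape=k, rate=c)."""
    p = c / (1.0 + c)
    return math.exp(math.lgamma(r + k) - math.lgamma(k) - math.lgamma(r + 1)
                    + k * math.log(p) - r * math.log1p(c))


def zt_negbin_pmf(r: int, k: float, c: float) -> float:
    """Mix first, then truncate: renormalize over r >= 1."""
    if r < 1:
        return 0.0
    return negbin_pmf(r, k, c) / (1.0 - negbin_pmf(0, k, c))


def gamma_poisson_numeric(r: int, k: float, c: float,
                          lo: float = -30.0, hi: float = 6.0, n_grid: int = 3000) -> float:
    """Same probability by trapezoid integration over u = log(x)."""
    h = (hi - lo) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        u = lo + i * h
        x = math.exp(u)
        # log[Poisson(r | x) * Gamma(x | k, c) * x]; the trailing x is the Jacobian dx/du
        log_f = (r * u - x - math.lgamma(r + 1)
                 + k * math.log(c) + k * u - c * x - math.lgamma(k))
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * math.exp(log_f)
    return total * h
```

The closed form follows from integrating x^(r+k-1) e^(-(1+c)x) against the gamma normalizing constant, which gives Γ(r+k) / (r! Γ(k)) · (c/(1+c))^k · (1/(1+c))^r.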
“…As discussed in Puig, Ginebra and Font (2010), the model mixing density ψ(π) in (2.10) represents the v_n words that have appeared at least once in the given text of size n, and not all the v words in the vocabulary of the author. Hence here ψ(π) heavily depends … The right panel of Figure 2.6 presents samples from the posterior distribution of (b, c) for the word frequency count data in Table 2.1, assuming that the likelihood function is proportional to (2.11) and that the prior is such that b and c are independent Gamma(0.001, 0.001).…”
Section: Bayesian Analysis Based on the IG-Truncated Poisson (mentioning)
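Posterior samples like those described for (b, c) can be drawn with a random-walk Metropolis sampler on the log scale. This is a minimal sketch, assuming Gamma(0.001, 0.001) means shape 0.001 and rate 0.001 (the usual BUGS convention). Expression (2.11) is not reproduced in the quoted text, so the demonstration substitutes a zero-truncated negative binomial log-likelihood with hypothetical parameters (k, c) and hypothetical frequency counts. The sampler itself only needs some log-likelihood function, so the IG-based likelihood could be plugged in its place.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def log_prior_log_scale(u: float, shape: float = 0.001, rate: float = 0.001) -> float:
    """Log density (up to a constant) of u = log(theta) when theta ~ Gamma(shape, rate).

    The Jacobian e^u turns (shape - 1) * u into shape * u.
    """
    return shape * u - rate * math.exp(u)


def rw_metropolis(log_lik: Callable[[Sequence[float]], float],
                  init: Sequence[float], n_iter: int = 5000,
                  step: float = 0.2, seed: int = 1) -> Tuple[List[List[float]], float]:
    """Random-walk Metropolis on log-parameters; returns draws on the original scale."""
    rng = random.Random(seed)

    def log_post(u: Sequence[float]) -> float:
        return log_lik([math.exp(z) for z in u]) + sum(log_prior_log_scale(z) for z in u)

    u = [math.log(t) for t in init]
    current = log_post(u)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [z + rng.gauss(0.0, step) for z in u]
        candidate = log_post(proposal)
        # 1 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < candidate - current:
            u, current = proposal, candidate
            accepted += 1
        draws.append([math.exp(z) for z in u])
    return draws, accepted / n_iter


def zt_negbin_loglik(theta: Sequence[float], counts: Dict[int, int]) -> float:
    """Stand-in likelihood: zero-truncated negative binomial, gamma(shape k, rate c) mixing."""
    k, c = theta
    log_p = math.log(c) - math.log1p(c)          # log(c / (1 + c))
    log_norm = math.log(-math.expm1(k * log_p))  # log(1 - P(R = 0)), stable for small k
    return sum(v_r * (math.lgamma(r + k) - math.lgamma(k) - math.lgamma(r + 1)
                      + k * log_p - r * math.log1p(c) - log_norm)
               for r, v_r in counts.items())


# Hypothetical frequency-of-frequency counts (v_r words seen exactly r times); toy numbers only.
toy_counts = {1: 67, 2: 50, 3: 33, 4: 21, 5: 12, 6: 7, 7: 4, 8: 2, 9: 1}
draws, acceptance = rw_metropolis(lambda th: zt_negbin_loglik(th, toy_counts),
                                  init=[1.0, 1.0])
```

Working on the log scale keeps every proposal positive without rejecting moves, at the cost of the Jacobian term folded into the log prior. The step size would need tuning on real counts.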
“…Section 2.4 considers an alternative analysis based on the inverse Gaussian-truncated Poisson mixture model, first considered in Puig, Ginebra and Font (2010). In Section 2.5 the two analyses are compared based on the posterior distribution of the sum of the squares of the Pearson errors and on the value taken by overall goodness-of-fit test statistics; even though the analysis in Section 2.4, based on the model that first truncates and then mixes, is not as meaningful as the one in Section 2.3, based on the model that first mixes and then truncates, because it does not allow one to link the data with the distribution of the word frequencies of the vocabulary of the author, this alternative model fits some of the word frequency count data sets a bit more accurately than the usual inverse Gaussian-Poisson model.…”
Section: Introduction (mentioning)