2006 IEEE International Symposium on Information Theory
DOI: 10.1109/isit.2006.262066

Strong Consistency of the Good-Turing Estimator

Abstract: We consider the problem of estimating the total probability of all symbols that appear with a given frequency in a string of i.i.d. random variables with unknown distribution. We focus on the regime in which the block length is large yet no symbol appears frequently in the string. This is accomplished by allowing the distribution to change with the block length. Under a natural convergence assumption on the sequence of underlying distributions, we show that the total probabilities converge to a deterministic limit…
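The quantity in the abstract is the one the classical Good-Turing formula targets: with N_j denoting the number of distinct symbols that occur exactly j times in a length-n sample, the Good-Turing estimate of the total probability of all symbols seen exactly k times is (k + 1) N_{k+1} / n. A minimal Python sketch of this classical formula (the function name is ours; it illustrates the estimator being analyzed, not the paper's proofs):

```python
from collections import Counter

def good_turing_total_probability(sample, k):
    """Good-Turing estimate of the total probability of all symbols
    that appear exactly k times in the sample: (k + 1) * N_{k+1} / n,
    where N_j is the number of distinct symbols occurring exactly j
    times and n is the sample length. For k = 0 this reduces to the
    familiar missing-mass estimate N_1 / n."""
    n = len(sample)
    counts = Counter(sample)                 # symbol -> frequency
    freq_of_freq = Counter(counts.values())  # j -> N_j
    return (k + 1) * freq_of_freq.get(k + 1, 0) / n

# Example: in "abracadabra" (n = 11), the singletons 'c' and 'd'
# give N_1 = 2, so the estimated missing mass (k = 0) is 2/11.
print(good_turing_total_probability(list("abracadabra"), 0))
```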

Cited by 19 publications (15 citation statements). References 16 publications.

Citation Statements
“…Various properties of the Good-Turing estimator and several variations of it have been analyzed for distribution estimation and compression [9], [10], [11], [12], [13], [14], [15]. Several concentration results on missing mass estimation are also known [16], [17].…”
Section: A. Good-Turing Estimator and Previous Results
confidence: 99%
“…It should be mentioned that Proposition 1 above, and Proposition 7 and Theorem 1, which appear later, do not require the assumption that č ≤ n·p_n(a) ≤ ĉ for all a and n [30]. Our proofs of the other results in this paper do rely on this assumption, however.…”
Section: A. Important Limits
confidence: 93%
“…The complement of this quantity, C_+ = 1 − C_0, represents the proportion of observed classes and is often called the sample coverage in the literature. On theoretical aspects of the method, Esty (1983) and Zhang and Zhang (2009) obtained conditions for asymptotic normality of the sample coverage estimator, Orlitsky et al. (2003) addressed an optimal property based on information theory, and McAllester and Schapire (2000) and Wagner et al. (2006) established several consistency properties. As mentioned in Good (1953, 2000), the method was motivated by the founder of modern computer science and Good's mentor, Dr. Alan Turing, so that it is generally called the Good-Turing frequency estimation method.…”
Section: Introduction
confidence: 99%
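For the coverage quantity in this excerpt, the classical Good-Turing estimate is 1 − N_1/n, where N_1 is the number of classes observed exactly once. A minimal sketch under that classical formula (the function name is ours):

```python
from collections import Counter

def sample_coverage_estimate(sample):
    """Good-Turing estimate of the sample coverage C_+ = 1 - C_0
    (the total probability of the classes already observed):
    1 - N_1 / n, where N_1 counts the classes seen exactly once."""
    n = len(sample)
    n1 = sum(1 for c in Counter(sample).values() if c == 1)
    return 1.0 - n1 / n

# Example: "abracadabra" has two singletons ('c', 'd') in 11 draws,
# so the estimated coverage is 1 - 2/11, about 0.818.
print(sample_coverage_estimate("abracadabra"))
```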
“…The Good-Turing method has been applied successfully in several disciplines, such as information retrieval (Song and Croft, 1999), computational linguistics (Church and Hanks, 1990), speech recognition (Jelinek, 1998; Chen and Goodman, 1999), species richness estimation (Esty, 1985), population size estimation, Shannon entropy estimation, and missile coverage estimation (Lo, 1992). On theoretical aspects of the method, Esty (1983) and Zhang and Zhang (2009) obtained conditions for asymptotic normality of the sample coverage estimator, Orlitsky et al. (2003) addressed an optimal property based on information theory, and McAllester and Schapire (2000) and Wagner et al. (2006) established several consistency properties. The research literature surrounding the Good-Turing method is rich; however, existing studies focus on sampling with replacement, which is equivalent to sampling from an infinite population, or from a finite population when the sampling fraction is negligible.…”
Section: Introduction
confidence: 99%