2013
DOI: 10.1109/tit.2012.2222343

Classification of Homogeneous Data With Large Alphabets

Abstract: Given training sequences generated by two distinct, but unknown, distributions sharing a common alphabet, we study the problem of determining whether a third test sequence was generated according to the first or second distribution using only the training data. To better model sources such as natural language, for which the underlying distributions are difficult to learn, we allow the alphabet size to grow and therefore the probability distributions to change with the blocklength. Our primary focus is the situ…
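
The problem setup in the abstract can be illustrated with a minimal sketch: two training sequences, one test sequence, and a decision made from the training data alone. The decision rule used here (pick the training sequence whose empirical distribution is closer to the test sequence's in L1 distance) is an illustrative assumption, not the classifier analyzed in the paper, and the names `empirical`, `l1_distance`, and `classify` are hypothetical.

```python
from collections import Counter


def empirical(seq, alphabet_size):
    """Empirical distribution of seq over symbols 0..alphabet_size-1."""
    counts = Counter(seq)
    n = len(seq)
    return [counts.get(symbol, 0) / n for symbol in range(alphabet_size)]


def l1_distance(p, q):
    """L1 distance between two distributions given as equal-length lists."""
    return sum(abs(a - b) for a, b in zip(p, q))


def classify(train1, train2, test, alphabet_size):
    """Return 1 or 2, the index of the training sequence whose empirical
    distribution is closer to that of the test sequence.

    Assumption: this nearest-empirical-distribution rule is only a stand-in
    for illustration; ties are resolved in favor of the first source.
    """
    t = empirical(test, alphabet_size)
    d1 = l1_distance(t, empirical(train1, alphabet_size))
    d2 = l1_distance(t, empirical(train2, alphabet_size))
    return 1 if d1 <= d2 else 2


# Toy usage on a binary alphabet with hand-written sequences.
train1 = [0, 0, 0, 1]   # mostly symbol 0
train2 = [1, 1, 1, 0]   # mostly symbol 1
print(classify(train1, train2, [0, 0, 1, 0], alphabet_size=2))
print(classify(train1, train2, [1, 1, 1, 1], alphabet_size=2))
```

When the alphabet is large relative to the sequence length, most symbols appear rarely or not at all, so empirical distributions like these become unreliable; that is the regime the abstract describes as its focus.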

citations
Cited by 25 publications
(20 citation statements)
references
References 36 publications
“…These results do not provide general tools to approximate the false alarm probability in the finite sample setting except in the case of uniform null distribution. Numerous other similar examples of asymptotically optimal hypothesis tests are found in literature (see, e.g., [3], [7]-[13]). Nevertheless, in a practical experiment involving such hypothesis tests, one has access only to a finite number of observations, and the metric of practical interest is the actual error probability with a finite number of samples rather than the error exponent or asymptotic consistency.…”
Section: Introduction
confidence: 65%
“…Substituting this into (39) leads to (40) When , we obtain Substituting this into (39) leads to (41) It follows from the bounds (40), (41) and that . Thus, the denominator of (39) satisfies Substituting this into (39) leads to Consequently, To obtain a refined approximation, let , which implies (42) An approximation for will be obtained: since , we have that the numerator and denominator in the summand of (39) satisfy Thus, Substituting this and (42) into (39) leads to which gives (43) The integration in (38) is now carried out along the closed contour given by :…”
Section: A) Approximation To the Logarithmic Moment Generating Functi…
confidence: 99%
“…To obtain tight bounds, we use a technique similar to the expurgating method in [40]. The distributions used in proving the bounds are constructed using the mixing of indistinguishable distributions method (see e.g., [5], [41]). …”
Section: Overview Of the Approach
confidence: 99%