2013 · DOI: 10.4236/jcc.2013.15003
An Empirical Study of Good-Turing Smoothing for Language Models on Different Size Corpora of Chinese

Abstract: Data sparseness is an inherent issue of statistical language models, and smoothing methods are commonly used to resolve the zero-count problem. In this paper, we empirically study and analyze the well-known Good-Turing and Advanced Good-Turing smoothing methods for language models on large Chinese corpora. Ten models are generated sequentially on corpora of increasing size, from 30 M to 300 M Chinese words of the CGW corpus. In our experiments, the smoothing methods Good-Turing and Advanced Good-Turing…
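The abstract names the (Advanced) Good-Turing estimators, but the report truncates the details. For orientation, the classic Good-Turing estimate replaces each observed count r with r* = (r + 1) · N_{r+1} / N_r, where N_r is the number of n-gram types seen exactly r times, and reserves probability mass N_1 / N for unseen n-grams. Below is a minimal Python sketch of that textbook reassignment, not the paper's implementation; the toy bigram counts are invented for illustration:

```python
from collections import Counter

def good_turing(counts):
    """Basic (unsmoothed) Good-Turing over a mapping {ngram: count}.

    Returns (adjusted_counts, p0), where p0 = N1 / N is the probability
    mass reserved for unseen n-grams.
    """
    n_total = sum(counts.values())            # N: total observations
    freq_of_freq = Counter(counts.values())   # N_r: # of types occurring exactly r times
    p0 = freq_of_freq[1] / n_total            # unseen mass

    adjusted = {}
    for gram, r in counts.items():
        n_r, n_r1 = freq_of_freq[r], freq_of_freq[r + 1]
        # r* = (r + 1) * N_{r+1} / N_r; keep the raw count when N_{r+1} = 0
        adjusted[gram] = (r + 1) * n_r1 / n_r if n_r1 else float(r)
    return adjusted, p0

# Toy bigram counts (assumed data, not drawn from the paper's CGW corpus)
counts = Counter(["我們", "我們", "語言", "模型", "模型", "模型"])
adjusted, p0 = good_turing(counts)
print(adjusted, p0)
```

In practice N_{r+1} is often zero for large r, so deployed variants (e.g., Simple Good-Turing) first fit a smoothed curve to the N_r values before applying the formula.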

Cited by 5 publications (3 citation statements) · References 7 publications
“…We intend to develop 5-gram or higher-order models and to expand the corpus to simulate the assembly process more reasonably. Other smoothing techniques, such as Good-Turing smoothing, Katz backoff, and interpolation smoothing (20, 21), will be considered to improve the mathematical model. Some parts are overwhelmingly likely to be returned in any analysis.…”
Section: Discussion
confidence: 99%
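Among the alternatives this citing paper names, interpolation smoothing is commonly realized as Jelinek-Mercer linear interpolation of maximum-likelihood bigram and unigram estimates. The sketch below is a generic illustration of that scheme, not the citing authors' model; the weight lam = 0.7 and the toy token stream are assumptions:

```python
from collections import Counter

def jm_bigram_prob(w_prev, w, unigrams, bigrams, lam=0.7):
    """Jelinek-Mercer interpolation:
    P(w | w_prev) = lam * P_ML(w | w_prev) + (1 - lam) * P_ML(w).
    lam is normally tuned on held-out data; 0.7 is an assumed default."""
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    p_uni = unigrams[w] / sum(unigrams.values())
    return lam * p_bi + (1 - lam) * p_uni

tokens = ["語", "言", "模", "型", "語", "言"]   # assumed toy data
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
print(jm_bigram_prob("語", "言", unigrams, bigrams))  # 0.8: "言" always follows "語"
```

Because the unigram term never vanishes for in-vocabulary words, the interpolated estimate stays non-zero even for unseen bigrams, which is how this family of methods avoids the zero-count problem the abstract describes.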
“…A new scheme, LDA-KN, is formed by integrating these two techniques for efficient smoothing of the LM, which overcomes the problem of data sparseness [17] in the state of the art. This model results in better smoothing and hence leads to a more generalized LM.…”
Section: Proposed Methodology: LDA-KN Smoothing
confidence: 99%
“…A number of smoothing algorithms for LMs have been investigated. The literature covers a range of smoothing techniques, including additive smoothing [8], Good-Turing [17], Jelinek-Mercer and Katz smoothing [8], Witten-Bell smoothing [8], absolute discounting [13], Kneser-Ney [1], and Latent Dirichlet Allocation (LDA) [3].…”
Section: Introduction
confidence: 99%
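Of the techniques this citing paper lists, Kneser-Ney is the component its LDA-KN scheme builds on. A minimal sketch of interpolated Kneser-Ney for bigrams follows, using the common fixed-discount assumption d = 0.75; it illustrates the standard formulation, not the citing authors' LDA-KN integration:

```python
from collections import Counter

def kneser_ney_bigram(bigram_counts, d=0.75):
    """Interpolated Kneser-Ney for bigrams (sketch; d = 0.75 is an
    assumed, commonly used discount)."""
    unigram_counts = Counter()   # c(w_prev)
    followers = Counter()        # |{w : c(w_prev, w) > 0}| per history w_prev
    continuations = Counter()    # |{w_prev : c(w_prev, w) > 0}| per word w
    for (w_prev, w), c in bigram_counts.items():
        unigram_counts[w_prev] += c
        followers[w_prev] += 1
        continuations[w] += 1
    bigram_types = len(bigram_counts)

    def prob(w_prev, w):
        c_prev = unigram_counts[w_prev]
        if c_prev == 0:                       # unseen history: pure continuation prob
            return continuations[w] / bigram_types
        discounted = max(bigram_counts[(w_prev, w)] - d, 0.0) / c_prev
        lam = d * followers[w_prev] / c_prev  # mass freed by discounting
        return discounted + lam * continuations[w] / bigram_types
    return prob

# Usage with assumed toy counts (bigram_counts must be a Counter)
p = kneser_ney_bigram(Counter({("語", "言"): 2, ("言", "模"): 1, ("模", "型"): 1}))
print(p("語", "言"), p("語", "型"))  # the unseen bigram still gets non-zero mass
```

The distinguishing idea is the continuation probability: a word's backoff weight depends on how many distinct histories precede it, not on its raw frequency, which is what makes Kneser-Ney a strong baseline for the sparse, large-vocabulary settings the surveyed papers study.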