Data sparseness is an inherent problem of statistical language models, and smoothing methods are commonly used to resolve zero-count problems. In this paper, we empirically study and analyze the well-known Good-Turing and advanced Good-Turing smoothing methods for language models built on large Chinese corpora. Ten models are generated sequentially on corpora of increasing size, from 30 M to 300 M Chinese words of the CGW corpus. In our experiments, the Good-Turing and advanced Good-Turing smoothing methods are evaluated under both inside testing and outside testing. Based on the experimental results, we further analyze the perplexity trends of the smoothing methods, which are useful for choosing an effective smoothing method to alleviate data sparseness in language models of various sizes. Finally, several helpful observations are described in detail.
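For readers unfamiliar with the technique evaluated here, the following is a minimal sketch of basic Good-Turing count re-estimation, which redistributes probability mass from observed n-grams to unseen ones via the adjusted count c* = (c+1) N_{c+1} / N_c. The function name, the fallback for empty count-of-count bins, and the toy bigrams are illustrative assumptions, not the paper's implementation; practical systems first smooth the N_c values themselves (e.g., Simple Good-Turing).

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """Good-Turing adjusted counts: c* = (c+1) * N_{c+1} / N_c.

    ngram_counts: dict mapping n-gram -> raw count c.
    Returns (adjusted counts, probability mass reserved for
    unseen n-grams, i.e., N_1 / N).
    """
    # N_c: number of distinct n-gram types observed exactly c times
    freq_of_freq = Counter(ngram_counts.values())
    total = sum(ngram_counts.values())  # N: total observed n-gram tokens

    adjusted = {}
    for ngram, c in ngram_counts.items():
        n_c = freq_of_freq[c]
        n_c1 = freq_of_freq.get(c + 1, 0)
        # Where N_{c+1} is zero (common at high counts), fall back to
        # the raw count; real systems smooth N_c first instead.
        adjusted[ngram] = (c + 1) * n_c1 / n_c if n_c1 > 0 else c

    unseen_mass = freq_of_freq.get(1, 0) / total  # P(unseen) = N_1 / N
    return adjusted, unseen_mass

# Toy usage on a few hypothetical Chinese bigrams:
counts = Counter(["的 了", "的 了", "在 中", "是 一", "是 一", "是 一"])
adj, p0 = good_turing_counts(counts)
print(adj, p0)  # singletons' mass (N_1 / N = 1/6) is held out for unseen bigrams
```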