2013
DOI: 10.1007/978-3-642-41491-6_29

A New Word Language Model Evaluation Metric for Character Based Languages

Cited by 9 publications (4 citation statements)
References 5 publications
“…For character-based languages (e.g. Chinese), CER is commonly used instead of WER as the measure for OCR, and, thus, we report only the CER [66]. These error values clearly state that Character Degradation is the effect that affects the transcription of the documents the most.…”
Section: Daniel-sys (mentioning)
confidence: 99%
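
As an illustration of the character error rate (CER) mentioned in the statement above, the following is a minimal sketch under the usual definition: the Levenshtein distance between reference and hypothesis, counted in characters, divided by the reference length. The function names `edit_distance` and `character_error_rate` are hypothetical, and the citing paper may normalize or tokenize differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting all i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = character-level edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(character_error_rate("今天天气很好", "今天天汽很好"))
```

Because the strings are compared character by character, no word segmentation is needed, which is one reason CER is convenient for languages such as Chinese.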
“…The developed language model is measured using a perplexity score since it will be used in an ASR system. As described in [20], [21], perplexity is a commonly used metric to measure the performance of a word-based language model applied in an ASR model. This metric has two advantages.…”
Section: Collapsed Gibbs Sampling (mentioning)
confidence: 99%
“…Firstly, it is calculated independently with no real ASR. It is categorized as an intrinsic evaluation that is much simpler than an extrinsic one by evaluating the language model on the real ASR model [20]. Secondly, it has a high correlation with word error rate (WER) in an ASR, especially when the models are trained using the same training set of data.…”
Section: Collapsed Gibbs Sampling (mentioning)
confidence: 99%
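
The perplexity score described in the two statements above can be sketched as follows. This is only the standard textbook definition, the exponential of the average negative log-probability per token, and it assumes the language model has already assigned a probability to each token of the test text. The function name `perplexity` and the input format are hypothetical, not taken from the citing paper.

```python
import math


def perplexity(token_probs):
    """Perplexity of a test sequence given the model probability of each token.

    PP = exp(-(1/N) * sum(log p_i)), where N is the number of tokens.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    # Sum in log space to avoid underflow on long sequences.
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


if __name__ == "__main__":
    # A model that gives every token probability 1/4 is as uncertain as a
    # uniform choice among four words, so its perplexity is about 4.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Since the calculation needs only the model's probabilities on held-out text, it can be run without a speech recognizer, which matches the "intrinsic evaluation" point made in the statement.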
“…The Chinese part of the corpus is segmented into words before LM training. Maximum matching word segmentation is used with a large word vocabulary V extracted from web data provided by (Wang et al, 2013b). The pinyin part is segmented according to the Chinese part.…”
Section: Corpora Tools and Experiments Settings (mentioning)
confidence: 99%
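
The maximum matching segmentation mentioned in the statement above can be sketched as below. The statement does not say which variant was used, so this sketch assumes the forward (left-to-right) greedy variant. The function name `forward_maximum_matching` and the toy vocabulary are hypothetical and stand in for the large web-derived vocabulary V.

```python
def forward_maximum_matching(text, vocabulary, max_word_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary word that matches, falling back to a single character."""
    if max_word_len is None:
        # Longest entry in the vocabulary, at least 1 so the scan always advances.
        max_word_len = max(1, max((len(w) for w in vocabulary), default=1))
    words = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as an out-of-vocabulary fallback.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    toy_vocabulary = {"研究", "研究生", "生命", "的", "起源"}  # illustrative only
    print(forward_maximum_matching("研究生命的起源", toy_vocabulary))
    # → ['研究生', '命', '的', '起源']
```

The toy run also shows a known weakness of the greedy approach: it picks 研究生 ("graduate student") over 研究 + 生命 ("study" + "life"), which is why the quality of the segmentation depends heavily on the vocabulary.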