2007
DOI: 10.1145/1322391.1322394

Morph-based speech recognition and modeling of out-of-vocabulary words across languages

Abstract: We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity…
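The coverage argument in the abstract can be illustrated with a minimal sketch. It compares the out-of-vocabulary rate of a word vocabulary with that of a morph vocabulary on a toy example. The segmentations below are hand-written and hypothetical; they are not Morfessor output, and the helper `oov_rate` is a name introduced here for illustration only.

```python
# Toy data: hand-written, hypothetical segmentations (NOT produced by Morfessor).
TRAIN_SEGMENTATIONS = {
    "talossa": ["talo", "ssa"],   # Finnish: "in the house"
    "talosta": ["talo", "sta"],   # "from the house"
    "autossa": ["auto", "ssa"],   # "in the car"
}
TEST_SEGMENTATIONS = {
    "autosta": ["auto", "sta"],   # unseen word, but both morphs occur in training
    "talossa": ["talo", "ssa"],   # seen word
}


def oov_rate(test_units, known_units):
    """Fraction of test units that are missing from the known vocabulary."""
    test_units = list(test_units)
    missing = sum(1 for unit in test_units if unit not in known_units)
    return missing / len(test_units)


# Word-level vocabulary: the training words themselves.
word_vocab = set(TRAIN_SEGMENTATIONS)
# Morph-level vocabulary: every morph that appears in a training segmentation.
morph_vocab = {m for morphs in TRAIN_SEGMENTATIONS.values() for m in morphs}

word_oov = oov_rate(TEST_SEGMENTATIONS.keys(), word_vocab)
morph_oov = oov_rate(
    (m for morphs in TEST_SEGMENTATIONS.values() for m in morphs),
    morph_vocab,
)

print(f"word-level OOV rate:  {word_oov:.2f}")
print(f"morph-level OOV rate: {morph_oov:.2f}")
```

In this toy setting the unseen word "autosta" is out of vocabulary at the word level, while all of its morphs are already known, which is the effect the abstract attributes to morph-based models.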

Citations: cited by 99 publications (94 citation statements)
References: 25 publications
“…To the best of our knowledge, this has not been tried before in the domain of text-entry. In the related domain of speech recognition, similar approaches have yielded good results [2] for agglutinative languages.…”
Section: Related Approaches To Text Entry
Citation type: mentioning
confidence: 99%
“…The task of the morph model is to assign a probability for each sequence of morphs in M(K) for the key sequence K. The probability of a sequence of s morphs (L_1, ..., L_s) ∈ M(K) is given by the chain rule of probabilities in equation (2).…”
Section: A Markov Chain Of Morphs
Citation type: mentioning
confidence: 99%
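The chain-rule scoring mentioned in the statement above can be sketched as follows. The citing paper's equation (2) is not reproduced on this page, so this is only an illustration under stated assumptions: a first-order Markov (bigram) approximation of the chain rule, add-one smoothing, a sentence-start marker, and a hand-segmented toy corpus. The names `train_bigram` and `sequence_log_prob` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus of morph sequences (hand-segmented, illustration only).
CORPUS = [
    ["talo", "ssa"],
    ["talo", "sta"],
    ["auto", "ssa"],
]
BOS = "<s>"  # sequence-start marker; an assumption of this sketch


def train_bigram(corpus):
    """Count contexts and (context, morph) pairs over the corpus."""
    context_counts, bigram_counts = Counter(), Counter()
    for seq in corpus:
        prev = BOS
        for morph in seq:
            context_counts[prev] += 1
            bigram_counts[(prev, morph)] += 1
            prev = morph
    vocab = {m for seq in corpus for m in seq}
    return context_counts, bigram_counts, vocab


def sequence_log_prob(seq, context_counts, bigram_counts, vocab):
    """Chain rule with a bigram approximation:
    log P(L_1..L_s) ~= sum_i log P(L_i | L_{i-1}), with add-one smoothing."""
    log_p = 0.0
    prev = BOS
    for morph in seq:
        numerator = bigram_counts[(prev, morph)] + 1
        denominator = context_counts[prev] + len(vocab)
        log_p += math.log(numerator / denominator)
        prev = morph
    return log_p


context_counts, bigram_counts, vocab = train_bigram(CORPUS)
seen = sequence_log_prob(["talo", "ssa"], context_counts, bigram_counts, vocab)
unseen = sequence_log_prob(["ssa", "talo"], context_counts, bigram_counts, vocab)
print(f"log P(talo ssa) = {seen:.3f}")
print(f"log P(ssa talo) = {unseen:.3f}")
```

Working in log space avoids underflow on longer sequences; a candidate set such as M(K) would be ranked by comparing these scores.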
“…Hammarström and Borin (2011) provide a detailed survey of the topic. Morfessor (Creutz and Lagus, 2002; Creutz and others, 2006; Creutz et al., 2007) based on Minimum Description Length principle is the reference model for highly inflecting languages, such as Finnish. Morfessor defines a model of lexicon and tries to find an optimum lexicon model using heuristic search procedure to achieve morphological segmentation.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Creutz [2] decomposes words into sub-word units called morphemes. The experiments were run for highly-inflecting languages, i.e.…”
Section: Introduction
Citation type: mentioning
confidence: 99%