Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006
DOI: 10.3115/1220835.1220897

Unlimited vocabulary speech recognition for agglutinative languages

Abstract: It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections this leads to millions of different, but still frequent word forms. Due to inflections, ambiguity and other phenomena, it is also not trivial to automatically split the words into meaningful parts. Rule-based mo…

Cited by 59 publications (47 citation statements)
References 17 publications
“…The baseline algorithm has been found to be very useful in automatic speech recognition of agglutinative languages (Kurimo et al, 2006). However, it often oversegments morphemes that are rare or not seen at all in the training data.…”
Section: Finnish-to-English Translation (mentioning)
confidence: 99%
“…Segmentation of words, clitics, and affixes is essential for a number of natural language processing (NLP) applications, including machine translation, parsing, and speech recognition (Chang et al, 2008; Tsarfaty, 2006; Kurimo et al, 2006). Segmentation is a common practice in Arabic NLP due to the language's morphological richness.…”
Section: Introduction (mentioning)
confidence: 99%
“…The Morfessor Baseline model has been a popular method for segmenting Finnish, Estonian and other agglutinative languages for speech recognition [11,12]. In this work, we use the Morfessor 2.0 implementation [13].…”
Section: Morfessor (mentioning)
confidence: 99%