2020
DOI: 10.1007/978-3-030-58323-1_7

Quantitative Analysis of the Morphological Complexity of Malayalam Language

Cited by 6 publications (5 citation statements)
References 7 publications
“…Table 1 gives examples of a few complex word formations in Malayalam. It has been demonstrated in the literature that the Malayalam language exhibits a high level of morphological complexity than many other Indian and European languages in terms of type-token ratio and type-token growth rate [5,6].…”
Section: Morphological Complexity Of Malayalam Language
Citation type: mentioning
confidence: 99%
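
The two measures named in this statement can be illustrated with a minimal sketch. It assumes the common definitions: type-token ratio is the number of distinct word forms divided by the total number of words, and type-token growth tracks how the count of distinct forms rises as more text is read. The cited paper's exact definitions and tokenization may differ. The function names, the whitespace tokenization, and the English sample sentence are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def type_token_ratio(tokens: List[str]) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def type_token_growth(tokens: List[str], step: int) -> List[Tuple[int, int]]:
    """Return (tokens read, types seen so far) after every `step` tokens.

    The final point is always included, even if the corpus length is not
    a multiple of `step`.
    """
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen)))
    return curve


# Hypothetical toy corpus, split on whitespace (an assumption; real corpora
# need language-aware tokenization).
sample = "the cat saw the dog and the dog saw the cat".split()

print(type_token_ratio(sample))      # 5 types over 11 tokens
print(type_token_growth(sample, 4))  # [(4, 3), (8, 5), (11, 5)]
```

A morphologically rich language would be expected to show a higher ratio and a curve that keeps climbing for longer, because inflected and compounded forms keep adding new types.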
“…BPE ensures that the most common words are represented in the pronunciation dictionary as a single token while the rare words are broken down into two or more subword tokens [20]. BPE tokenization algorithm available in subword-nmt Python library is used in the experiments described in this work. The time complexity in tokenizing a word of length M using BPE implementation in [20] is O(M²) [33].…”
Section: BPE Tokens
Citation type: mentioning
confidence: 99%
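
The behaviour described in this statement can be sketched as follows. This is a simplified illustration of applying an already-learned BPE merge list to one word; it is not the subword-nmt code and does not use that library's API. The function name `apply_bpe` and the toy merge list are hypothetical, and the end-of-word markers that real implementations typically use are omitted. Each pass scans the word once and there can be up to M - 1 passes, which is one way to see where a quadratic bound in the word length M comes from.

```python
from typing import List, Tuple


def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split `word` into subword tokens using an ordered list of merges.

    Earlier entries in `merges` have higher priority, mirroring the order
    in which merges are learned from a training corpus.
    """
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from individual characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair with the best (lowest) merge rank.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge list, as if learned from a corpus where "low" is frequent.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]

print(apply_bpe("low", toy_merges))    # ['low']
print(apply_bpe("lower", toy_merges))  # ['low', 'er']
print(apply_bpe("new", toy_merges))    # ['n', 'e', 'w']
```

The frequent word comes out as a single token, while the rarer words fall back to two or more subword pieces, which is the property the statement relies on.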
“…Morphologically complex languages with very large number of rare words are challenging for machine translation and ASR tasks due to huge out of vocabulary (OOV) rate. Malayalam language is known to demonstrate a high level of morphological complexity than many other Indian and European languages in terms of type-token ratio and type-token growth rate [14], [15]. For languages with very little transcribed audio datasets available for speech related tasks, a precise grapheme to phoneme conversion can ensure better acoustic modeling, even in end-to-end [16] ASR systems.…”
Section: Motivation
Citation type: mentioning
confidence: 99%
“…Agricultural speech and text corpora for Malayalam with 4k manually transcribed phonetic lexicon entries has been reported by Lekshmi et al [28]. Considering the agglutinative nature of Malayalam language and its practically infinite vocabulary, a manually curated, small sized pronunciation lexicon would be inadequate for general domain speech tasks [14]. Also there could be need for expanding the vocabulary of lexicon as new words get added to the language in the form of proper nouns and loan words.…”
Section: Motivation
Citation type: mentioning
confidence: 99%
“…Prior to the emergence of subword segmenters, translation systems were plagued with the issue of out-of-vocabulary (OOV) tokens. This was particularly an issue for translations involving agglutinative languages such as Turkish (Ataman and Federico, 2018b) or Malayalam (Manohar, Jayan, and Rajan, 2020). Various segmentation algorithms were brought forward to circumvent this issue and in turn, improve translation quality.…”
Section: Subword Segmentation Techniques
Citation type: mentioning
confidence: 99%