2015
DOI: 10.1007/978-3-319-25816-4_17

Learning Distributed Representations of Uyghur Words and Morphemes

Cited by 4 publications (4 citation statements)
References 3 publications

“…Due to the agglutinative nature of Uyghur and Kazakh, theoretically, an infinite vocabulary can be generated [7]. As a result, data sparsity in agglutinative languages poses a challenge for downstream NLP tasks, as even small datasets lead to a large vocabulary [5].…”
Section: نىڭكى (mentioning)
confidence: 99%
“…In statistical-based stemming or morphological segmentation tasks for Uyghur and Kazakh languages, features such as syllables [31], part-of-speech, context [19,32,33], phonetic classes, the presence of sound change phenomena, and phonetic features [34] are often selected and added to the model to improve its performance. In deep learning-based models, (Bi)RNN [35], BiLSTM-CRF [36], CNN-BiLSTM-CRF [7], pointer networks [37], and attention mechanism [7,37,38] have been used to learn the labels of the input sequence and distinguish morpheme boundaries. The literature mentioned above have introduced labeling schemes, but these labels are not independent, which can easily lead to model overfitting.…”
Section: Related Work (mentioning)
confidence: 99%
“…From the above discussion, we may state that Turkish NLP studies has to deal with language processing tasks before modelling a solution to the target problem. In general, most-words are composed of many morphemes and they may occur only once on the training data that generates the so called data-sparsity and curse of dimensionality problems [42,43] from computational modelling point of view. It is important to observe that this complexity constrains implementation of state-of-the-art models and algorithms developed for example for English.…”
Section: Turkish Language Modelling Challenges Based On Its Morphological Complexity (mentioning)
confidence: 99%