2015
DOI: 10.1007/978-3-319-18111-0_12

Data-Driven Morphological Analysis and Disambiguation for Kazakh

Cited by 12 publications
(9 citation statements)
References 12 publications
“…MC segments are represented as binary vectors that, for a given analysis, encode presence or absence of each morpheme found in the train set. This ensures language independence and contrasts previous work (at least on Turkish and Kazakh), where only certain morphemes are chosen as features depending on their position (Assylbekov et al, 2016;Hakkani-Tür et al, 2002) or presence (Makhambetov et al, 2015) in an analysis, or the authors' intuition (Yildiz et al, 2016;Tolegen et al, 2016;Sak et al, 2007).…”
Section: Introduction (mentioning)
confidence: 86%
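
The binary presence/absence encoding described in the statement above can be sketched roughly as follows. This is only an illustration under assumptions, not the citing authors' implementation: the function names and the morpheme tags (`N`, `PL`, `ACC`, and so on) are hypothetical, and each analysis is assumed to be available as a list of morpheme tags.

```python
from typing import Dict, Iterable, List


def build_morpheme_index(train_analyses: Iterable[List[str]]) -> Dict[str, int]:
    """Assign a fixed vector position to every morpheme seen in the training set."""
    vocab = sorted({m for analysis in train_analyses for m in analysis})
    return {m: i for i, m in enumerate(vocab)}


def encode_analysis(analysis: List[str], index: Dict[str, int]) -> List[int]:
    """Return a binary vector: 1 where a known morpheme is present, else 0."""
    vec = [0] * len(index)
    for m in analysis:
        pos = index.get(m)
        if pos is not None:  # morphemes unseen in training are simply ignored
            vec[pos] = 1
    return vec


# Hypothetical training analyses; the tags are made up for illustration.
train = [["N", "PL", "ACC"], ["V", "PAST", "P3"]]
index = build_morpheme_index(train)
vector = encode_analysis(["N", "ACC"], index)
```

Because the index is built from whatever morphemes occur in the training data, no language-specific feature selection is involved, which is the property the statement contrasts with earlier work.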
“…Although several statistical models have been proposed for Kazakh MD, such as HMM- (Makazhanov et al, 2014;Makhambetov et al, 2015;Assylbekov et al, 2016), voted perceptron- (Tolegen et al, 2016) and transformation-based (Kessikbayeva and Cicekli, 2016) taggers, to our knowledge ours is the first deep learning-based approach to the problem that is also purely language independent.…”
Section: Related Work (mentioning)
confidence: 99%
“…This is not the same task as the one we are exploring, where the objective is to return the complete set of possible analyses. Similar in spirit is the work on Kazakh morphological analysis by Makhambetov et al (2015). Their system, based on Hidden Markov Models, returns a subset of the analyses of a token which could plausibly occur in a given context.…”
Section: Related Work (mentioning)
confidence: 97%
“…Because dictionary entries are lemmatized, during cleaning we perform lemmatization on both source and target sides of the training set, and later restore the target side of the cleaned data. For target side lemmatization we use a data-driven morphological disambiguator for Kazakh [10]. We implement the models using the Moses toolkit [29], setting the distortion limit parameter to -1 (infinity) to account for long range dependencies and free word order of the languages.…”
Section: Experiments and Evaluation (mentioning)
confidence: 99%
“…From technical perspective, there is another challenge that concerns mostly Kazakh in its lack of resources for our particular purposes. By and large the language is being actively studied, and there exist monolingual corpora [6,7], and ongoing research on morphological processing [8][9][10][11][12][13] and syntactic parsing [14][15][16]. However, except for a rather small and noisy OPUS corpus [17] there are no Russian-Kazakh parallel corpora 4 and the only tool for automatic morphological disambiguation of Kazakh available to us 5 was reported to have accuracy of 86%, which we considered to be low enough to question the results of experiments with segmentation: would possible misalignments be shortcomings of a chosen segmentation scheme or results of incorrect morphological analysis and disambiguation.…”
Section: Introduction (mentioning)
confidence: 99%