2020
DOI: 10.1017/s1351324920000479
A note on constituent parsing for Korean

Abstract: This study deals with widespread issues in constituent parsing for Korean, including quantitative and qualitative error analyses of parsing results. Previous treebank grammars have been accepted as interpretable across the various annotation schemes, whereas recent parsers turn out to be much harder for humans to interpret. This paper therefore intends to find a concrete typology of parsing errors, to describe how these parsers deal with sentences, and to show their statistical distribution, us…

Cited by 4 publications (12 citation statements)
References 31 publications (44 reference statements)
“…Morpheme-based segmentation for Korean has proven beneficial. Many downstream applications for Korean language processing, such as POS tagging (Jung, Lee, and Hwang 2018; Park and Tyers 2019), phrase-structure parsing (Choi, Park, and Choi 2012; Park, Hong, and Cha 2016; Kim and Park 2022), and machine translation Cha, 2016, 2017b), are based on morpheme-based segmentation, in which all morphemes are separated from each other. In these studies, morpheme-based segmentation is used to avoid data sparsity, because the number of possible words at a longer segmentation granularity (such as eojeols) can be exponential given the characteristics of Korean, an agglutinative language.…”
c https://github.com/KimByoungjae/klpNER2017
d https://github.com/naver/nlp-challenge/tree/master/missions/ner
Section: Previous Work
confidence: 99%
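The eojeol-versus-morpheme segmentation contrast described in the excerpt above can be sketched as follows. This is a minimal illustration: the example phrase, the morpheme dictionary, and the `segment` helper are invented for this sketch and are not taken from the cited papers.

```python
# Sketch: eojeol- vs morpheme-level segmentation for Korean.
# An eojeol is a space-delimited word that may bundle several morphemes.
eojeols = ["여러", "분야에서"]  # "in several fields"

# A (hypothetical) morphological analysis mapping each eojeol to its morphemes.
# Splitting noun + postposition shrinks the effective vocabulary of an
# agglutinative language, which is how this segmentation reduces data sparsity.
morpheme_analysis = {
    "여러": ["여러"],              # determiner: "several"
    "분야에서": ["분야", "에서"],  # noun "field" + locative postposition "in"
}

def segment(eojeols, analysis):
    """Flatten a list of eojeols into a morpheme sequence."""
    return [m for e in eojeols for m in analysis[e]]

print(segment(eojeols, morpheme_analysis))
# ['여러', '분야', '에서']
```

With two eojeols the gain is invisible, but over a corpus the morpheme vocabulary stays small while the eojeol vocabulary grows combinatorially with every stem-plus-particle pairing.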
“…This also corresponds in some way to a prosodic pattern, because using a word as the analysis unit is analogous to chunks, where even simple parsing techniques can be effective (Abney, 1991). Secondly, as a by-product benefit of using a word, we may also reduce parsing errors such as the word-boundary and different-label errors defined by Kim and Park (2022). The left-side tree is for the case where the parser wrongly identifies the word boundary of a morpheme-separated word; for example, the parser can analyze the word as yeoleo bunyaeseo instead of yeoleo-bunya-eseo ('in several fields') 3 as shown in Figure 4.…”
Section: Discussion On Korean Categorial Grammars
confidence: 98%
“…The left-side tree is for the case where the parser wrongly identifies the word boundary of a morpheme-separated word; for example, the parser can analyze the word as yeoleo bunyaeseo instead of yeoleo-bunya-eseo ('in several fields') 3 as shown in Figure 4. These errors represent 20.61% of all parsing errors in the state-of-the-art constituent parsing results (Kim and Park, 2022). They mostly originate in morpheme-separated words, and using the entire word as the analysis unit for categorial grammars can reduce such parsing errors.…”
Section: Discussion On Korean Categorial Grammars
confidence: 99%
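The word-boundary error type in the excerpt above can be made concrete with a small check: two analyses cover the same surface string but split it at different points. The romanized tokens follow the excerpt's yeoleo-bunya-eseo example; the helper function is an invented illustration, not the actual error typology implementation from Kim and Park (2022).

```python
# Sketch: detecting a word-boundary error between a gold and a predicted
# segmentation of the same surface form.
gold = ["yeoleo", "bunya", "eseo"]   # several + field + in (correct split)
predicted = ["yeoleo", "bunyaeseo"]  # "bunya" and "eseo" wrongly fused

def is_word_boundary_error(gold, predicted):
    """True when both analyses spell the same string but split it differently."""
    return "".join(gold) == "".join(predicted) and gold != predicted

print(is_word_boundary_error(gold, predicted))  # True
```

A real error classifier would of course compare bracketed trees rather than flat token lists, but the boundary condition is the same: identical yield, different cut points.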
“…We note that a vp-rooted sentence may also be a grammatically valid sentence in Korean. We use the phrase-structure models described in Kim and Park (2022), which are trained on the Sejong treebank for Korean using the Berkeley neural parser (Kitaev, Cao, and Klein 2019) with pre-trained deep bidirectional transformers (Devlin et al. 2019). For syntactic complexity features, we add the distribution of grammatical morphemes, such as the number of verbal endings and postpositions.…”
Section: Complexity Features
confidence: 99%
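The grammatical-morpheme counts mentioned above as syntactic complexity features could be computed along these lines. The tagged sentence is invented, and the tag-prefix convention (Sejong-style `E*` for verbal endings, `J*` for postpositional particles) is an assumption for illustration; the cited work may define its features differently.

```python
# Sketch: counting grammatical morphemes as complexity features from a
# (hypothetical) Sejong-style POS-tagged morpheme sequence.
tagged = [
    ("여러", "MM"), ("분야", "NNG"), ("에서", "JKB"),   # "in several fields"
    ("연구", "NNG"), ("하", "XSV"), ("였", "EP"), ("다", "EF"),  # "researched"
]

def complexity_features(tagged):
    """Count verbal endings (E*) and postpositions (J*) in a tagged sequence."""
    endings = sum(1 for _, tag in tagged if tag.startswith("E"))
    postpositions = sum(1 for _, tag in tagged if tag.startswith("J"))
    return {"verbal_endings": endings, "postpositions": postpositions}

print(complexity_features(tagged))
# {'verbal_endings': 2, 'postpositions': 1}
```

Normalizing such counts by sentence length would give per-token densities, which is a common way to turn raw morpheme counts into comparable complexity features.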