A note on constituent parsing for Korean

Kim, Mija; Park, Jungyeul

doi:10.1017/s1351324920000479

Cited by 4 publications

(12 citation statements)

References 31 publications

(44 reference statements)


“…Morpheme-based segmentation for Korean has been proved beneficial. Many downstream applications for Korean language processing, such as POS tagging (Jung, Lee, and Hwang 2018; Park and Tyers 2019), phrase-structure parsing (Choi, Park, and Choi 2012; Park, Hong, and Cha 2016; Kim and Park 2022), and machine translation Cha, 2016, 2017b), are based on the morpheme-based segmentation, in which all morphemes are separated from each other. In these studies, the morpheme-based segmentation is implemented to avoid data sparsity because the number of possible words in longer segmentation granularity (such as eojeols) can be exponential given the characteristics of Korean, an agglutinative language.…”

Section: Previous Work (mentioning)

confidence: 99%

Korean named entity recognition based on language-specific features

2023

Self Cite


In this paper, we propose a novel way of improving named entity recognition (NER) in the Korean language using its language-specific features. While the field of NER has been studied extensively in recent years, the mechanism of efficiently recognizing named entities (NEs) in Korean has hardly been explored. This is because the Korean language has distinct linguistic properties that present challenges for modeling. Therefore, an annotation scheme for Korean corpora by adopting the CoNLL-U format, which decomposes Korean words into morphemes and reduces the ambiguity of NEs in the original segmentation that may contain functional morphemes such as postpositions and particles, is proposed herein. We investigate how the NE tags are best represented in this morpheme-based scheme and implement an algorithm to convert word-based and syllable-based Korean corpora with NEs into the proposed morpheme-based format. Analyses of the results of traditional and neural models reveal that the proposed morpheme-based format is feasible, and the varied performances of the models under the influence of various additional language-specific features are demonstrated. Extrinsic conditions were also considered to observe the variance of the performances of the proposed models, given different types of data, including the original segmentation and different types of tagging formats.


“…This corresponds also in some way to prosodic pattern because using a word as a analysis unit is the similar analogy to chunks where even simple parsing techniques can be effective (Abney, 1991). Secondly, as a by-product benefit by using a word, we may also reduce parsing errors such as word boundary and different label errors defined by Kim and Park (2022). The left side tree is for where the parser wrongly identifies the word boundary for the morpheme-separated word, for example, the parser can analyze the word as yeoleo bunyaeseo instead of yeoleo-bunya-eseo ('in several fields') as shown in Figure 4.…”

Section: Discussion On Korean Categorial Grammars (mentioning)

confidence: 98%

“…The left side tree is for where the parser wrongly identifies the word boundary for the morpheme-separated word, for example, the parser can analyze the word as yeoleo bunyaeseo instead of yeoleo-bunya-eseo ('in several fields') as shown in Figure 4. They represent 20.61% of the entire parsing errors in the state of the art constituent parsing results (Kim and Park, 2022). These errors are mostly originated in the morpheme-separated word, and using the entire word as the analysis unit for categorial grammars can reduce such parsing errors.…”

Section: Discussion On Korean Categorial Grammars (mentioning)

confidence: 99%

“…Cha's CCG parser extends their category to all morphemes based on their morphological analyzer for Korean (Cha, Lee, and Lee, 1998). This is quite natural because morpheme based parsing instead of word (or eojeol) based parsing, has been a de facto standard for constituent-based parsing of Korean (Kim and Park, 2022). For instance sae jib deul eul instead of saejibdeul eul ('new houses acc'), is used for constituent parsing.…”

Section: Multiset-CCG Of Korean (mentioning)

confidence: 99%
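
The contrast drawn in the statement above, between morpheme-based and word (eojeol) based tokenization, can be illustrated with a minimal Python sketch. It reuses only the romanized example from the quote; the `ANALYSIS` structure and both function names are hypothetical and are not taken from any of the cited papers. The sketch assumes the morphological analysis is already available, since a real pipeline would obtain it from a morphological analyzer.

```python
from typing import List

# Hypothetical, hand-written analysis of the quoted example
# ('new houses acc'). Each inner list holds the morphemes of one unit
# that the word-based view keeps together, following the quote's own
# grouping "saejibdeul eul".
ANALYSIS: List[List[str]] = [
    ["sae", "jib", "deul"],
    ["eul"],
]


def to_morpheme_tokens(analysis: List[List[str]]) -> List[str]:
    """Morpheme-based view: every morpheme becomes its own token."""
    return [morpheme for unit in analysis for morpheme in unit]


def to_word_tokens(analysis: List[List[str]]) -> List[str]:
    """Word-based view: the morphemes of each unit are joined into one token."""
    return ["".join(unit) for unit in analysis]


if __name__ == "__main__":
    print(" ".join(to_morpheme_tokens(ANALYSIS)))
    print(" ".join(to_word_tokens(ANALYSIS)))
```

The morpheme-based view yields more, shorter tokens drawn from a smaller vocabulary, which is the data-sparsity argument made in the citing papers; the word-based view keeps functional morphemes attached to their hosts.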


A role of functional morphemes in Korean categorial grammars

Park

Kim

2023

Self Cite


This study discusses a role of functional morphemes in Korean categorial grammars, providing the reviews of various types of Korean categorial grammars that have never been conducted so far, notwithstanding many previous studies on them. Previous work has presented different morphological segmentation because of Korean’s agglutinative characteristics, implying that Korean words may contain a different segmentation sequence of morphemes. We focus on functional morphemes in Korean categorial grammars, which have been explored in different ways by previous work. We present detailed analyses for postpositions and verbal endings in categorial grammars, insisting that the functional morphemes in Korean should be treated as part of a word, with the result that their categories do not require to be assigned individually in a syntactic level, and also that it would be more efficient to assign the syntactic categories on the fully inflected lexical word derived by the lexical rule of the morphological processes in the lexicon.


“…We note that a vp root sentence also may be a grammatically relevant sentence in Korean. We use the phrase-structure models described in Kim and Park (2022), which trained the Sejong treebank for Korean using the Berkeley neural parser (Kitaev, Cao, and Klein 2019) with the pre-training of deep bidirectional transformers (Devlin et al 2019). For syntactic complexity features, we add the distribution of grammatical morphemes such as the number of verbal endings and prepositions.…”

Section: Complexity Features (mentioning)

confidence: 99%

Neural automated writing evaluation for Korean L2 writing

2022

Self Cite


Although Korean language education is experiencing rapid growth in recent years and several studies have investigated automated writing evaluation (AWE) systems, AWE for Korean L2 writing still remains unexplored. Therefore, this study aims to develop and validate a state-of-the-art neural model AWE system which can be widely used for Korean language teaching and learning. Based on a Korean learner corpus, the proposed AWE is developed using natural language processing techniques such as part-of-speech tagging, syntactic parsing, and statistical language modeling to engineer linguistic features and a pre-trained neural language model. This study attempted to determine how neural network models use different linguistic features to improve AWE performance. Experimental results of the proposed AWE tool showed that the neural AWE system achieves high reliability for unseen test data from the corpus, which implies metrics used in the AWE system can help differentiate different proficiency levels and predict holistic scores. Furthermore, the results confirmed that the proposed linguistic features–syntactic complexity, quantitative complexity, and fluency–offer benefits that complement neural automated writing evaluation.

