A Robust Named-Entity Recognition System Using Syllable Bigram Embedding with Eojeol Prefix Information

Kwon, Sunjae; Seo, Jungyun

doi:10.1145/3132847.3133105

Cited by 4 publications

(4 citation statements)

References 10 publications


“…Although the corpus does not aim at medical NER, we expect that this extra experiment would help to justify the effectiveness of BERT for Korean NER. We choose bi-LSTM-CRF as a benchmark model because the model has achieved state-of-the-art performance in Korean NER [20,22]. The architecture of bi-LSTM-CRF is shown in Fig.…”

Section: Results (mentioning)

confidence: 99%

“…However, because of the linguistic property that the words are not always clearly separated, the tokenization and input encoding influence a lot to the final performance in general. The character-level n-gram encoding with additional linguistic information is one of the state-of-the-art approaches for Korean NER [ 20 ]. A recent work reports that jamo (Korean alphabet) level representation extracts well the word semantics in terms of word similarity [ 21 ].…”

Section: Introduction (mentioning)

confidence: 99%


Korean clinical entity recognition from diagnosis text using BERT

Kim

Lee

2020

BMC Med Inform Decis Mak


Background: While clinical entity recognition mostly aims at electronic health records (EHRs), there are also the demands of dealing with the other type of text data. Automatic medical diagnosis is an example of new applications using a different data source. In this work, we are interested in extracting Korean clinical entities from a new medical dataset, which is completely different from EHRs. The dataset is collected from an online QA site for medical diagnosis. Bidirectional Encoder Representations from Transformers (BERT), which is one of the best language representation models, is used to extract the entities.

Results: A slightly modified version of BERT labeling strategy replaces the original labeling to enhance the separation of postpositions in Korean. A new clinical entity recognition dataset that we construct, as well as a standard NER dataset, have been used for the experiments. A pre-trained multilingual BERT model is used for the initialization of the entity recognition model. BERT significantly outperforms a character-level bidirectional LSTM-CRF, a benchmark model, in terms of all metrics. The micro-averaged precision, recall, and f1 of BERT are 0.83, 0.85 and 0.84, whereas that of bi-LSTM-CRF are 0.82, 0.79 and 0.81 respectively. The recall values of BERT are especially better than that of the other model. It can be interpreted that the trained BERT model could detect out of vocabulary (OOV) words better than bi-LSTM-CRF.

Conclusions: The recently developed BERT and its WordPiece tokenization are effective for the Korean clinical entity recognition. The experiments using a new dataset constructed for the purpose and a standard NER dataset show the superiority of BERT compared to a state-of-the-art method. To the best of our knowledge, this work is one of the first studies dealing with clinical entity extraction from non-EHR data.
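As a quick sanity check on the figures quoted in the abstract above, F1 is the harmonic mean of precision and recall, and this also holds for micro-averaged values. The sketch below is not code from the cited paper; the function name `f1_score` is a hypothetical helper, and the inputs are the rounded values from the abstract, so results recomputed this way can differ in the last digit from values the authors computed on unrounded counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded micro-averaged precision and recall reported for BERT in the abstract.
bert_f1 = f1_score(0.83, 0.85)
print(round(bert_f1, 2))  # prints 0.84, matching the reported F1
```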


“…Lee et al (2020) explored both the syllable level and sub-character level representations of the text, achieving similar results to multilingual BERT with 1/10 of the training data. Kwon et al (2017) proposed a deep learning based NER system that operates over syllables rather than words, resulting in a speedup by removing the need for morphological analysis. Kim et al (2021) […] representations and also found that syllables were the most effective representation for Korean NER.…”

Section: Related Work (mentioning)

confidence: 99%

UA-KO at SemEval-2022 Task 11: Data Augmentation and Ensembles for Korean Named Entity Recognition

Song

Bethard

2022

Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)


“…Recently, ML-based systems that implement well-known supervised learning models have been developed to improve the accuracy of NER systems. These models include: Decision Trees (DT) [4], Maximum Entropy Models (MEM) [5], Conditional Random Fields (CRF) [6,7], structural Support Vector Machines (SVM) [8], and recent neural network models based on Long-Short Term Memory (LSTM) with a CRF layer [11][12][13].…”

Section: Previous Work (mentioning)

confidence: 99%

Low-Cost Implementation of a Named Entity Recognition System for Voice-Activated Human-Appliance Interfaces in a Smart Home

Park

Kim

2018

Sustainability


When we develop voice-activated human-appliance interface systems in smart homes, named entity recognition (NER) is an essential tool for extracting execution targets from natural language commands. Previous studies on NER systems generally include supervised machine-learning methods that require a substantial amount of human-annotated training corpus. In the smart home environment, categories of named entities should be defined according to voice-activated devices (e.g., food names for refrigerators and song titles for music players). The previous machine-learning methods make it difficult to change categories of named entities because a large amount of the training corpus should be newly constructed by hand. To address this problem, we present a semi-supervised NER system to minimize the time-consuming and labor-intensive task of constructing the training corpus. Our system uses distant supervision methods with two kinds of auto-labeling processes: auto-labeling based on heuristic rules for single-class named entity corpus generation and auto-labeling based on a pre-trained single-class NER model for multi-class named entity corpus generation. Then, our system improves NER accuracy by using a bagging-based active learning method. In our experiments that included a generic domain that featured 11 named entity classes and a context-specific domain about baseball that featured 21 named entity classes, our system demonstrated good performances in both domains, with F1-measures of 0.777 and 0.958, respectively. Since our system was built from a relatively small human-annotated training corpus, we believe it is a viable alternative to current NER systems in smart home environments.
