2021
DOI: 10.2196/29667

A Word Pair Dataset for Semantic Similarity and Relatedness in Korean Medical Vocabulary: Reference Development and Validation

Abstract: Background: The fact that medical terms require special expertise and are becoming increasingly complex makes it difficult to employ natural language processing techniques in medical informatics. Several human-validated reference standards for medical terms have been developed to evaluate word embedding models using the semantic similarity and relatedness of medical word pairs. However, there are very few reference standards in non-English languages. In addition, because the existing reference stand…
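
The abstract describes evaluating word embedding models against human-rated medical word pairs. One common protocol for this kind of reference standard, assumed here rather than taken from the paper, scores each pair by the cosine similarity of its two word vectors and reports Spearman's rank correlation with the human ratings. The sketch below implements that protocol using only the standard library. The helper names (`cosine`, `average_ranks`, `spearman`, `evaluate`), the toy vectors, and the ratings are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical type aliases for this sketch only.
Embeddings = Dict[str, Sequence[float]]
WordPair = Tuple[str, str, float]  # (word 1, word 2, human rating)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def evaluate(embeddings: Embeddings, pairs: Sequence[WordPair]) -> Tuple[float, int]:
    """Correlate model cosine scores with human ratings.

    Pairs containing an out-of-vocabulary word are skipped; the number of
    pairs actually scored is returned alongside rho.
    """
    model_scores: List[float] = []
    human_scores: List[float] = []
    for w1, w2, rating in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(model_scores)


# Toy, hypothetical data: NOT taken from the dataset described above.
toy_embeddings: Embeddings = {
    "hypertension": [1.0, 0.0],
    "blood_pressure": [0.9, 0.1],
    "diabetes": [0.0, 1.0],
    "insulin": [0.2, 0.9],
}
toy_pairs: List[WordPair] = [
    ("hypertension", "blood_pressure", 9.0),
    ("diabetes", "insulin", 8.5),
    ("hypertension", "diabetes", 4.0),
    ("hypertension", "tumor", 1.0),  # "tumor" is OOV here, so this pair is skipped
    ("blood_pressure", "insulin", 3.0),
]

rho, n_used = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.2f} over {n_used} pairs")  # Spearman rho = 0.80 over 4 pairs
```

Because a reference standard of this kind rates similarity and relatedness separately, the same evaluation would presumably be run once per rating type. Reporting the number of pairs actually scored alongside rho matters because vocabulary coverage can differ between embedding models.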

Cited by 2 publications (3 citation statements). References: 21 publications.

“…Health information news articles were selected from well-known newspapers targeting common readers with no expert knowledge. Medical textbooks were selected at an intermediate level because they introduce expert knowledge to medical students, and the documents were well-structured and of very high quality [17]. (1) Medical textbooks: Two Korean publishing companies provided textbooks for this study.…”
Section: Methods (mentioning; confidence: 99%)

“…In addition, 1629 words were added from the similarity and relatedness experiments and merged into the extended vocabulary. The details of the similarity and relatedness experiments are described in the previous work [17].…”
Section: Methods (mentioning; confidence: 99%)

“…Researchers have been inspired by the original BERT architecture to create many variations (eg, RoBERTa, DistilRoBERTa, DistilBERT, and BART, etc) that have surpassed the benchmarks of previous models. Moreover, these models can be fine-tuned for specific domain-based tasks (ClinicalBERT and BioBERT) in multiple languages [11,12,25]. Furthermore, several studies have used other fine-tuned BERT models to investigate COVID-19-related content expressed on social media related to misinformation detection, sentiment classification, and content analysis [13,[26][27][28][29].…”
Section: BERT Algorithm (mentioning; confidence: 99%)