2019
DOI: 10.1109/access.2019.2944151

Word Similarity Datasets for Thai: Construction and Evaluation

Abstract: Distributional semantics in the form of word embeddings are an essential ingredient to many modern natural language processing systems. The quantification of semantic similarity between words can be used to evaluate the ability of a system to perform semantic interpretation. To this end, a number of word similarity datasets have been created for the English language over the last decades. For Thai language few such resources are available. In this work, we create three Thai word similarity datasets by translat…
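
The evaluation idea in the abstract, scoring a system by how well its word similarities agree with human ratings, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: it assumes cosine similarity over word vectors and Spearman rank correlation as the agreement measure, and the names `evaluate`, `toy_embeddings`, and `toy_pairs`, as well as all vectors and ratings, are hypothetical toy values rather than data from the Thai datasets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, rated_pairs):
    """Return (Spearman rho, number of pairs found in the vocabulary)."""
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        # Pairs with an out-of-vocabulary word are skipped, not scored.
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical toy data, for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "bus": [0.3, 0.9],
}
toy_pairs = [
    ("cat", "dog", 8.5),
    ("car", "bus", 8.0),
    ("cat", "car", 1.5),
    ("dog", "unicorn", 3.0),  # "unicorn" is out of vocabulary
]

rho, covered = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {covered} of {len(toy_pairs)} pairs")
```

Reporting the number of covered pairs alongside the correlation matters for a language without spaces between words, since tokenization choices can leave many dataset words outside a model's vocabulary.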

Cited by 10 publications (7 citation statements)
References 27 publications
“…From the above experiments, it can be found that the Chinese text algorithm proposed in this paper, which combines the advantages of the two similarity algorithms, performs better than the traditional algorithm in many cases. As can be seen from the above experimental data, the algorithm presented in this paper is superior to that of [7,8], in the accuracy of clustering experiments. This shows that the algorithm in this paper has been improved to some extent.…”
Section: Results Analysis and Discussion (citation type: mentioning)
confidence: 74%
“…However, there are large errors in the calculation of similarity of many English words under the influence of many factors, such as the update of English semantic dictionary, the improvement of the relationship between words, the influencing factors of similarity calculation model, and so on [6]. Netisopakul et al [7] created three Thai word similarity datasets by translating and rerating the popular WordSim-353, SimLex-999, and SemEval-2017-Task-2 datasets. Therefore, how to improve the estimation accuracy of English word similarity has become a hotspot in English semantic research.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…A similar approach was followed by Ercan and Yıldız (2018) for Turkish, by Huang et al (2019) for Mandarin Chinese, and by Sakaizawa and Komachi (2018) for Japanese. Netisopakul, Wohlgenannt, and Pulich (2019) translated the concatenation of SimLex-999, WordSim-353, and the English SEMEVAL-500 into Thai and then reannotated it. Finally, Barzegar et al (2018) translated English SimLex-999 and WordSim-353 to 11 resource-rich target languages (German, French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic, Farsi), but they did not provide details concerning the translation process and the resolution of translation disagreements.…”
Section: Previous Work and Evaluation Data (citation type: mentioning)
confidence: 99%
“…However, the prior papers were investigated in English texts. As well as Arabic and English [33], Thai and English have totally different syntactic writing (e.g., no space between any 2 Thai words [34][35][36]), the investigation in Thai texts should be totally categorized as another problem.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Since Thai-written language was a type of unsegment words, many state-of-the art papers were proposed to solve Thai word segmentation [34,35] by bi-LSTM [60], adversarial example [35], pre-training model [61] or unsupervised method with optimization [62]. The word segmentation had been still the main shortfall in Thai-NLP society [63] that totally affected the correctness of other Thai-NLP tasks [46,63], e.g., part of speech tagging, parsing, text classification, information extraction, semantic role labeling, machine translation, sentiment analysis, event extraction and question answering.…”
Section: Introduction (citation type: mentioning)
confidence: 99%