A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools

Netisopakul, Ponrudee; Wohlgenannt, Gerhard

doi:10.1587/transinf.2017dar0001

Cited by 6 publications

(8 citation statements)

References 38 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Especially, analyzing Thai texts is made difficult by features. The Thai language is complex because it includes adjustable word order, serial verbs, and high incertitude in compound words [14]. It requires a high-performance word segmentation as it has no word boundary indicators to separate words in sentences.…”

Section: Text Preprocessingmentioning

confidence: 99%

“…The Thai sentences rely on individual judgment when there are unknown words, while English sentences have natural spaces. Besides, Thai has structural ambiguities in which Part-of-Speech (POS) tags relied on context words that affect one word may have the opposite meaning [8,14,15,16,17]. Thus, data preprocessing has involved the analysis of Thai sentences.…”

Section: Text Preprocessingmentioning

confidence: 99%

“…Their ex-perimental results showed that combining the features could improve sentiment analysis in Thai texts. [14] suggested the methods to increase the performance of the Thai sentiment classification with points to add weighing scheme for several POS, add disambiguate word senses, add negation into the process, and improve the Thai sentiment resource. This paper was inspired by authors in [16,20,21] as more compacts set of syntactic and lexical features combine studied grammatical pattern features for objective to affect the performance of question classification.…”

Section: Feature Selectionmentioning

confidence: 99%

See 2 more Smart Citations

Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning

Chotirat¹,

Meesad²

2021

Heliyon

View full text Add to dashboard Cite

Question classification is a crucial task for answer selection. Question classification could help define the structure of question sentences generated by features extraction from a sentence, such as who, when, where, and how. In this paper, we proposed a methodology to improve question classification from texts by using feature selection and word embedding techniques. We conducted several experiments to evaluate the performance of the proposed methodology using two different datasets (TREC-6 dataset and Thai sentence dataset) with term frequency and combined term frequency-inverse document frequency including Unigram, Unigram+Bigram, and Unigram + Trigram as features. Machine learning models based on traditional and deep learning classifiers were used. The traditional classification models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine. The deep learning techniques were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and Hybrid model, which combined CNN and BiLSTM model. The experiment results showed that our methodology based on Part-of-Speech (POS) tagging was the best to improve question classification accuracy. The classifying question categories achieved with average micro 𝐹 1 -score of 0.98 when applied SVM model on adding all POS tags in the TREC-6 dataset. The highest average micro 𝐹 1 -score achieved 0.8 when applied GloVe by using CNN model on adding focusing tags in the Thai sentences dataset.

show abstract

Section: Text Preprocessingmentioning

confidence: 99%

Section: Text Preprocessingmentioning

confidence: 99%

Section: Feature Selectionmentioning

confidence: 99%

See 1 more Smart Citation

Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning

Chotirat¹,

Meesad²

2021

Heliyon

View full text Add to dashboard Cite

show abstract

“…The authors of the previous evaluations reported a number of difficulties with the pretrained models, first of all, a high number of out-of-vocabulary (OOV) terms [ 18 ]. The problem is related to the peculiarities of Thai language, which were discussed at length in Netisopakul and Wohlgenannt [ 20 ]. In brief, firstly, written Thai language, like some other Asian languages (e.g.…”

Section: Introductionmentioning

confidence: 99%

Improving the state-of-the-art in Thai semantic similarity using distributional semantics and ontological information

et al. 2021

Self Cite

View full text Add to dashboard Cite

Research into semantic similarity has a long history in lexical semantics, and it has applications in many natural language processing (NLP) tasks like word sense disambiguation or machine translation. The task of calculating semantic similarity is usually presented in the form of datasets which contain word pairs and a human-assigned similarity score. Algorithms are then evaluated by their ability to approximate the gold standard similarity scores. Many such datasets, with different characteristics, have been created for English language. Recently, four of those were transformed to Thai language versions, namely WordSim-353, SimLex-999, SemEval-2017-500, and R&G-65. Given those four datasets, in this work we aim to improve the previous baseline evaluations for Thai semantic similarity and solve challenges of unsegmented Asian languages (particularly the high fraction of out-of-vocabulary (OOV) dataset terms). To this end we apply and integrate different strategies to compute similarity, including traditional word-level embeddings, subword-unit embeddings, and ontological or hybrid sources like WordNet and ConceptNet. With our best model, which combines self-trained fastText subword embeddings with ConceptNet Numberbatch, we managed to raise the state-of-the-art, measured with the harmonic mean of Pearson on Spearman ρ, by a large margin from 0.356 to 0.688 for TH-WordSim-353, from 0.286 to 0.769 for TH-SemEval-500, from 0.397 to 0.717 for TH-SimLex-999, and from 0.505 to 0.901 for TWS-65.

show abstract

“…Compound vowels can be constructed in various ways by combining vowel characters and consonants, and be placed above, below, before or after the consonants. Further complications in Thai NLP include zero anaphora, the absence of upper/lower-case characters, high ambiguity of compound words, and serial verbs [12]. In contrast to English, Thai is a tonal language with five different tones, making it a very difficult language for English native speakers to comprehend.…”

Section: Introductionmentioning

confidence: 99%

Word Similarity Datasets for Thai: Construction and Evaluation

2019

Self Cite

View full text Add to dashboard Cite

Distributional semantics in the form of word embeddings are an essential ingredient to many modern natural language processing systems. The quantification of semantic similarity between words can be used to evaluate the ability of a system to perform semantic interpretation. To this end, a number of word similarity datasets have been created for the English language over the last decades. For Thai language few such resources are available. In this work, we create three Thai word similarity datasets by translating and re-rating the popular WordSim-353, SimLex-999 and SemEval-2017-Task-2 datasets. The three datasets contain 1852 word pairs in total and have different characteristics in terms of difficulty, domain coverage, and notion of similarity (relatedness vs. similarity). These features help to gain a broader picture of the properties of an evaluated word embedding model. We include baseline evaluations with existing Thai embedding models, and identify the high ratio of out-of-vocabulary words as one of the biggest challenges in the evaluation process. All datasets, evaluation results, and a tool for easy evaluation of new Thai embedding models are available to the NLP community online.

show abstract

A Survey of Thai Knowledge Extraction for the Semantic Web Research and Tools

Cited by 6 publications

References 38 publications

Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning

Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning

Improving the state-of-the-art in Thai semantic similarity using distributional semantics and ontological information

Word Similarity Datasets for Thai: Construction and Evaluation

Contact Info

Product

Resources

About