2009
DOI: 10.1007/978-3-642-01307-2_10

Thai Word Segmentation with Hidden Markov Model and Decision Tree

Abstract: The Thai written language is one of the languages that do not mark word boundaries. To discover the meaning of a document, the text must be separated into syllables, words, sentences, and paragraphs. This paper develops a novel method for segmenting Thai text by combining a non-dictionary-based technique with a dictionary-based technique. The method first applies Thai grammar rules to the text to identify syllables. The hidden Markov model is then used for merging possi…

Cited by 11 publications (4 citation statements)
References 5 publications
“…The first three are essentially sequence tagging models that involve assigning a label or tag to each element in the sequence of input data. For sequence tagging tasks, NLP researchers use several supervised machine learning algorithms, such as HMM (Hidden Markov Model) [27], [28], RNN (Recurrent Neural Networks) [29], [30], [31], [32], [33], and CRF (Conditional Random Fields) [34], [35], [36], [37]. For various sequence tagging tasks, such as Word Segmentation and POS tagging, the CRF usually outperforms the other models.…”
Section: Models (mentioning)
confidence: 99%
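
The sequence-tagging view in the statement above can be illustrated with a minimal Viterbi decoder for an HMM. This is only a sketch under stated assumptions: the two tags (`B` for "begins a word", `I` for "inside a word"), the toy symbols `a` and `b`, and all probabilities are hypothetical and are not taken from the cited papers.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    if not observations:
        return []

    # Work in log space so long sequences do not underflow.
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: tag "B" mostly emits "a", tag "I" mostly emits "b".
STATES = ["B", "I"]
START = {"B": 1.0, "I": 0.0}  # a sequence must start a word
TRANS = {"B": {"B": 0.5, "I": 0.5}, "I": {"B": 0.5, "I": 0.5}}
EMIT = {"B": {"a": 0.9, "b": 0.1}, "I": {"a": 0.1, "b": 0.9}}

tags = viterbi(["a", "b", "a"], STATES, START, TRANS, EMIT)
print(tags)
```

In a real segmenter the observations would be characters or syllables, and the probabilities would be estimated from a segmented corpus rather than written by hand.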
“…Using TCCs, (Theeramunkong and Usanavasin, 2001) develop a decision tree classifier to determine whether a word should be formed from TCCs, based on a predefined metric. (Aroonmanakun, 2002) presents a two-stage word segmentation that incorporates handcrafted syllable features with dynamic programming to form the most reasonable segmentation, while (Bheganan et al., 2009) use a hidden Markov model to form words that are then verified with a dictionary. Modern approaches to this problem are used by practitioners, but it is unclear which approach is the most accurate or fastest.…”
Section: Related Work (mentioning)
confidence: 99%
“…They can be grouped into two categories which could be dictionary based or non-dictionary based. Dictionary based approaches include techniques such as longest matching [11] [12] [13] [14], maximum matching [12] [13] [14] and decision tree [15]. Non-dictionary based include rule-based [16], Hidden Markov Model (HMM) [15] [17] and Naive Bayesian [14].…”
Section: Background On Thai Word Segmentation (mentioning)
confidence: 99%
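
As an illustration of the dictionary-based category named in the statement above, here is a minimal sketch of greedy longest matching. The function name `longest_matching`, the four-word dictionary, and the single-character fallback for unknown text are assumptions made for this example; they are not the method of any cited paper.

```python
def longest_matching(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word is found.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary containing both readings of an ambiguous string.
WORDS = {"ตา", "ตาก", "กลม", "ลม"}
print(longest_matching("ตากลม", WORDS))
```

With this dictionary the greedy strategy returns `['ตาก', 'ลม']` and never considers the alternative reading `ตา` + `กลม`, which shows why purely dictionary-based matching can mis-segment ambiguous text.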