DOI: 10.1007/978-3-540-88282-4_23

A Hybrid Approach to Word Segmentation of Vietnamese Texts

Cited by 92 publications (31 citation statements). References 5 publications.

“…For word segmentation, we used vnTokenizer, a highly accurate segmenter which uses a hybrid approach to automatically tokenize Vietnamese text. This approach combines a finite-state automata technique, regular expression parsing, and a maximal-matching strategy augmented by statistical methods that resolve ambiguities of segmentation Phuong et al (2008). We also used JVnTagger, a POS tagger based on Conditional Random Fields Lafferty et al (2001) and Maximum Entropy Berger et al (1996).…”
Section: Tools (mentioning; confidence: 99%)
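The maximal-matching step named in this statement can be illustrated with a minimal Python sketch. It is not vnTokenizer's code: it assumes a tiny hypothetical lexicon (`LEXICON`), omits the finite-state and regular-expression stages, and, in place of statistical disambiguation, simply keeps the first fewest-words segmentation it finds. The function name `segment` and the 4-syllable word limit are illustrative assumptions.

```python
from typing import List, Set

# Toy lexicon (hypothetical); a real segmenter loads a large dictionary.
# Entries are space-separated syllables, as in written Vietnamese.
LEXICON: Set[str] = {
    "học sinh", "sinh học", "học", "sinh", "bài tập", "làm", "văn", "toán",
}
MAX_WORD_LEN = 4  # longest candidate word, in syllables (assumption)


def segment(sentence: str, lexicon: Set[str] = LEXICON) -> List[str]:
    """Return a segmentation that uses the fewest words (maximal matching).

    best[i] is the minimal word count for syllables[:i]; back[i] is where
    the last word of that segmentation starts. A single syllable is always
    accepted, so syllables missing from the lexicon are still segmented.
    """
    syllables = sentence.lower().split()
    n = len(syllables)
    best = [0] + [float("inf")] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_WORD_LEN), i):
            candidate = " ".join(syllables[j:i])
            if i - j == 1 or candidate in lexicon:
                # Strict "<" keeps the first candidate (longest final word)
                # on ties; the cited approach would use statistics here.
                if best[j] + 1 < best[i]:
                    best[i] = best[j] + 1
                    back[i] = j
    # Follow the back-pointers from the end to recover the words.
    words: List[str] = []
    i = n
    while i > 0:
        j = back[i]
        words.append(" ".join(syllables[j:i]))
        i = j
    return words[::-1]


print(segment("chị làm bài tập văn"))
```

On the ambiguous input "học sinh học", both "học sinh | học" ("the student studies") and "học | sinh học" ("study biology") use two words, so a pure fewest-words criterion cannot choose between them; this sketch returns `["học", "sinh học"]` only because of its tie-breaking order. Resolving such ties is the role the statement assigns to the statistical methods.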
“…To segment a Vietnamese sentence into clauses, the system first tokens [sic] the sentences by VnTagger [6]. Then it segments Vietnamese sentences into clauses using conjunctions and segmentation rules.…”
Section: [Chị Làm Bài Tập Văn] [Em Làm Bài Tập Toán] [Sister Does Lit… ("[Older Sister Does Literature Homework] [Younger Sibling Does Math Homework]"; mentioning; confidence: 99%)
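The conjunction-based clause step described in this statement can be sketched as follows, assuming the input is already word-segmented. The conjunction list (`CONJUNCTIONS`), the punctuation set, and the function `split_clauses` are hypothetical; the cited system's actual segmentation rules are not given in the excerpt.

```python
from typing import List

# Hypothetical boundary markers: a few common Vietnamese conjunctions
# ("và" = and, "nhưng" = but, "còn" = while/as for, "hoặc" = or) plus
# clause-separating punctuation.
CONJUNCTIONS = {"và", "nhưng", "còn", "hoặc"}
PUNCTUATION = {",", ";"}


def split_clauses(tokens: List[str]) -> List[List[str]]:
    """Split a word-segmented sentence into clauses at boundary tokens.

    Boundary tokens themselves are dropped and empty clauses are skipped.
    """
    clauses: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        if token.lower() in CONJUNCTIONS or token in PUNCTUATION:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses


tokens = ["Chị", "làm", "bài tập", "văn", ",", "còn", "em", "làm", "bài tập", "toán"]
print(split_clauses(tokens))
```

Splitting on every conjunction is deliberately naive: "và" also coordinates noun phrases (e.g. "tôi và bạn", "me and you"), which is why a real system needs the additional segmentation rules the statement mentions.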
“…In order to perform word segmentation and POS tagging for normalized tweets, we employ vnTokenizer³ of [10] for word segmentation and VnTagger⁴ of [11] for POS tagging.…”
Section: Word Segmentation and POS Tagging (mentioning; confidence: 99%)