Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing 2014
DOI: 10.3115/v1/w14-5503

Integrating Dictionaries into an Unsupervised Model for Myanmar Word Segmentation

Abstract: This paper addresses the problem of word segmentation for low-resource languages, focusing mainly on the Myanmar language. The proposed method exploits limited amounts of dictionary resources to improve the segmentation quality of an unsupervised word segmenter. Three models are proposed. In the first, a set of dictionaries (separate dictionaries for different classes of words) is directly introduced into the generative model. In the second, a language model was built fro…

Citations: cited by 1 publication (1 citation statement)
References: 15 publications
“…We normalized Romanian data and removed diacritics following previous work [51]. Low-resource Setting We used the ALT multi-way parallel dataset [58]. We used English and 6 Asian languages: Filipino (Fil), Indonesian (Id), Japanese (Ja), Malay (Ms), Vietnamese (Vi), and simplified Chinese (Zh).…”
Section: Datasets (mentioning)
confidence: 99%