2019
DOI: 10.3390/info10100317
MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language

Abstract: Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter, a fast word segmentation algorithm that reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. In order to validate our method in a low-resource scenario involving extremely sparse dat…
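To illustrate the idea stated in the abstract, the following is a minimal sketch of segmentation as a search for the shortest sequence of lexical n-grams covering the input. It is not the authors' implementation: the function name `segment`, the dictionary layout (unspaced n-gram string mapped to its spaced form), the dynamic-programming search, and the English toy lexicon are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def segment(text: str, ngrams: Dict[str, str]) -> Optional[List[str]]:
    """Cover `text` with the fewest n-grams from `ngrams`.

    `ngrams` maps an unspaced n-gram string to its segmented (spaced) form.
    Returns the segmented n-grams in order, or None if no full cover exists.
    This is an illustrative sketch, not the published algorithm.
    """
    max_len = max((len(key) for key in ngrams), default=0)
    # best[i] = fewest n-grams needed to cover text[:i]; None if unreachable.
    best: List[Optional[int]] = [0] + [None] * len(text)
    # back[i] = start index of the last n-gram in the best cover of text[:i].
    back: List[int] = [0] * (len(text) + 1)

    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None or text[start:end] not in ngrams:
                continue
            candidate = best[start] + 1
            if best[end] is None or candidate < best[end]:
                best[end] = candidate
                back[end] = start

    if best[len(text)] is None:
        return None

    # Walk the back-pointers from the end to recover the chosen n-grams.
    pieces: List[str] = []
    pos = len(text)
    while pos > 0:
        start = back[pos]
        pieces.append(ngrams[text[start:pos]])
        pos = start
    return pieces[::-1]


# Hypothetical toy lexicon (English stand-in, not Ainu data).
toy_ngrams = {
    "the": "the",
    "cat": "cat",
    "sat": "sat",
    "thecat": "the cat",
    "catsat": "cat sat",
}

print(" ".join(segment("thecatsat", toy_ngrams)))  # prints "the cat sat"
```

In this sketch a two-n-gram cover is preferred over the three-unigram cover, which mirrors the "shortest sequence" criterion; how ties between equally short covers are broken is an arbitrary choice here.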

Cited by 4 publications (2 citation statements)
References 27 publications
“…The exploration of advanced computational methods like NLP and MT in the context of language revival is increasingly gaining traction. The pioneering work of Nowakowski et al [2019] introduced an efficient n-gram model for the segmentation of Ainu words, demonstrating the capabilities of computational techniques to enhance the study and accessibility of the Ainu language. Following this, Nowakowski [2020] took further strides by developing a digital corpus alongside fundamental language technologies for Ainu.…”
Section: NLP for Ainu (mentioning)
confidence: 99%
“…Foundational work by researchers like Nowakowski et al [2019], who developed the Mingmatch-a n-gram model for segmenting Ainu words, and Matsuura et al [2020b], who compiled an Ainu folklore speech corpus, provides critical underpinnings for this study. Building upon these seminal contributions, this project seeks to construct a comprehensive NLP model that utilizes the Marian MT as its core for translating between Ainu and Japanese.…”
Section: Introduction (mentioning)
confidence: 99%