Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology 2019
DOI: 10.18653/v1/w19-4214

A Little Linguistics Goes a Long Way: Unsupervised Segmentation with Limited Language Specific Guidance

Abstract: We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring language-specific knowledge but no direct supervision. Our technique involves creating a small grammar of closed-class affixes, which can be written in a few hours. The grammar overgenerates analyses for word forms attested in a raw corpus, and these analyses are disambiguated based on features of the linguistic base proposed for each form. Extending the grammar to cover orthographic, morphosyntacti…
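
The overgenerate-then-disambiguate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the affix lists, the function names `overgenerate` and `disambiguate`, and the scoring rule (prefer the base proposed by the most distinct word forms, then the longer base) are all assumptions made for illustration, since the abstract does not say which features of the base are used.

```python
from collections import Counter
from itertools import product

# Hypothetical closed-class affix lists (assumption); a real grammar would be
# written by hand for the target language.
PREFIXES = ["", "un", "re"]
SUFFIXES = ["", "s", "ed", "ing"]


def overgenerate(word, min_base_len=2):
    """Return every (prefix, base, suffix) split that the toy grammar licenses."""
    analyses = []
    for prefix, suffix in product(PREFIXES, SUFFIXES):
        # Skip splits that would leave a base shorter than min_base_len.
        if len(word) - len(prefix) - len(suffix) < min_base_len:
            continue
        if word.startswith(prefix) and word.endswith(suffix):
            base = word[len(prefix):len(word) - len(suffix)]
            analyses.append((prefix, base, suffix))
    return analyses


def disambiguate(corpus_words):
    """Pick one analysis per word form using an assumed base-support score."""
    vocab = set(corpus_words)
    # Count how many distinct word forms propose each candidate base.
    base_support = Counter()
    for word in vocab:
        for _, base, _ in overgenerate(word):
            base_support[base] += 1
    # Assumed scoring rule: highest base support first, then the longer base.
    return {
        word: max(overgenerate(word),
                  key=lambda a: (base_support[a[1]], len(a[1])))
        for word in vocab
    }


corpus = ["play", "plays", "played", "playing", "replay", "bus"]
segmentation = disambiguate(corpus)
```

In this toy corpus the base `play` is proposed by five word forms, so `playing` is split as `play` + `ing`, while `bus` stays whole because the candidate base `bu` has no support from other forms.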

Citations: cited by 6 publications (4 citation statements)
References: 40 publications
“…Their approach paved the way for many computational approaches based on morphologically tagged corpora or lexicons. However, "The complexity of morphological inflection is only a small bit of the larger question of morphological typology" (Cotterell et al 2019: 339), and deriving morphologically tagged corpora from the kind of raw texts we address in this paper is still an open issue, despite recent progress (Erdmann et al 2019; Malouf 2017).…”
Section: [4] (citation type: mentioning)
confidence: 99%
“…However, many researchers worked on integrating semantics in the learning of morphology (Schone and Jurafsky 2000; Narasimhan et al 2015), especially with the advances in neural network-based distributional semantics (Narasimhan et al 2015). Most recently, Erdmann et al (2019) presented a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language-specific input with large unannotated corpora. In their evaluations, they consistently outperform competitive unsupervised baselines and approach the performance of state-of-the-art models such as MADAMIRA (Pasha et al 2014) and Farasa (Abdelali et al 2016).…”
Section: Unsupervised Learning Approaches To Morphological Segmentation (citation type: mentioning)
confidence: 99%
“…For example, if π contains words wxyxz and axx, b_π is xx and the exponents are (<w, y, z>) and (<a), respectively. Inspired by unsupervised maximum matching in greedy tokenization (Guo, 1997; Erdmann et al., 2019), we define the following paradigm score function:…”
Section: Clustering Into Paradigms (citation type: mentioning)
confidence: 99%