Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2014
DOI: 10.3115/v1/p14-1127

Unsupervised Morphology-Based Vocabulary Expansion

Abstract: We present a novel way of generating unseen words, which is useful for certain applications such as automatic speech recognition or optical character recognition in low-resource languages. We test our vocabulary generator on seven low-resource languages by measuring the decrease in out-of-vocabulary word rate on a held-out test set. The languages we study have very different morphological properties; we show how our results differ depending on the morphological complexity of the language. In our best result (o…
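The evaluation described in the abstract, a decrease in out-of-vocabulary (OOV) word rate on held-out text, can be illustrated with a minimal sketch. It assumes a token-level OOV rate and a relative reduction; the abstract as shown here does not say whether the authors count tokens or types, and the function names `oov_rate` and `relative_oov_reduction` are hypothetical.

```python
def oov_rate(vocab, test_tokens):
    """Fraction of test tokens that are missing from the vocabulary.

    Assumption: token-level rate; a type-level rate would use set(test_tokens).
    """
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocab)
    return missing / len(test_tokens)


def relative_oov_reduction(base_vocab, generated_words, test_tokens):
    """Relative drop in OOV rate after adding generated words to the vocabulary."""
    before = oov_rate(set(base_vocab), test_tokens)
    after = oov_rate(set(base_vocab) | set(generated_words), test_tokens)
    if before == 0:
        return 0.0  # nothing was out of vocabulary to begin with
    return (before - after) / before


if __name__ == "__main__":
    base = {"walk", "walked"}
    generated = {"walking", "talks"}
    held_out = ["walk", "walking", "talked", "walked"]
    print(oov_rate(base, held_out))
    print(relative_oov_reduction(base, generated, held_out))
```

In this toy example, adding the generated word "walking" covers one of the two unseen held-out tokens, so the OOV rate is halved.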

Citations: cited by 11 publications (10 citation statements)
References: 21 publications (17 reference statements)
“…We use the unsupervised morphological analyzer of Virpioja et al (2013), and obtain morpheme classes by running Morfessor FlatCat (Grönroos et al, 2014) on the output of the analyzer. We run the fixed-affix finite-state machine of (Rasooli et al, 2014) to obtain a single stem for all words including the out-of-vocabularies.…”
Section: Projection Experiments (mentioning)
confidence: 99%
“…We induced morphotactics and orthographic rules by using finite-state automata (FSAs) to build a morphological generator. Although morphological analyzer FSTs can be converted into morphological generators, to our knowledge, there is no specific work that focuses only on Turkish morphological generation by aiming to find all possible word forms as a lexicon expansion problem in a minimally supervised or unsupervised learning framework apart from the model proposed by Rasooli et al [15]. However, our algorithm deviates from their algorithm since we cluster both stems and suffixes to build the FSAs, whereas they learn the language model using the actual stems and suffixes without using their classes.…”
Section: Related Work (mentioning)
confidence: 99%
“…All the works mentioned above are supervised. Rasooli et al [15] introduced an unsupervised vocabulary generator for low-resource languages in order to reduce the out-of-vocabulary word rate in natural language processing applications, such as automatic speech recognition or optical character recognition. The model was tested on seven languages: Assamese, Bengali, Pashto, Persian, Tagalog, Turkish, and Zulu.…”
Section: Related Work (mentioning)
confidence: 99%
“…Compared to the above topics, the prediction of morphologically motivated unseen words is relatively little explored. Rasooli et al [18] have shown in a recent paper, how the segmentation produced by Morfessor can be used for generation of new words. Their approach is to generate all possible sequences of morphemes with a finite-state automaton and to apply additional reranking steps based on letter trigram probabilities.…”
Section: Lexicon Expansion (mentioning)
confidence: 99%
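
The approach summarized in the last citation statement (enumerate morpheme sequences, then rerank with letter trigram probabilities) can be sketched roughly as follows. This is an illustration under stated assumptions, not the cited authors' implementation: the automaton is reduced to a fixed prefix + stem + suffix template, the trigram model uses add-one smoothing, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def generate_candidates(prefixes, stems, suffixes, known_words):
    """Enumerate prefix+stem+suffix strings that are not already known.

    Assumption: a single fixed template stands in for the finite-state automaton.
    """
    candidates = set()
    for prefix, stem, suffix in product(prefixes, stems, suffixes):
        word = prefix + stem + suffix
        if word not in known_words:
            candidates.add(word)
    return candidates


def train_trigrams(words):
    """Count letter trigrams and their two-letter contexts over known words."""
    trigrams, contexts = Counter(), Counter()
    for word in words:
        padded = "^^" + word + "$"  # start and end padding symbols
        for i in range(len(padded) - 2):
            trigrams[padded[i:i + 3]] += 1
            contexts[padded[i:i + 2]] += 1
    return trigrams, contexts


def trigram_logprob(word, trigrams, contexts, alphabet_size):
    """Add-one smoothed letter trigram log-probability (an assumed smoothing)."""
    padded = "^^" + word + "$"
    logprob = 0.0
    for i in range(len(padded) - 2):
        numerator = trigrams[padded[i:i + 3]] + 1
        denominator = contexts[padded[i:i + 2]] + alphabet_size
        logprob += math.log(numerator / denominator)
    return logprob


def rerank(candidates, trigrams, contexts, alphabet_size):
    """Sort candidates from most to least probable; ties break alphabetically."""
    return sorted(
        candidates,
        key=lambda w: (-trigram_logprob(w, trigrams, contexts, alphabet_size), w),
    )


if __name__ == "__main__":
    known = {"walked", "talking"}
    cands = generate_candidates({""}, {"walk", "talk"}, {"ed", "ing"}, known)
    tri, ctx = train_trigrams(known)
    print(rerank(cands, tri, ctx, alphabet_size=30))
```

The raw log-probability favors shorter strings, so a length-normalized score may be preferable when candidates differ much in length; that choice is left open here.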