2001
DOI: 10.1162/089120101750300490

Unsupervised Learning of the Morphology of a Natural Language

Abstract: This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar matches well the analysis that would be devel…

Citations: cited by 495 publications (395 citation statements)
References: 6 publications
“…One facet of NL morphological structure commonly leveraged by morphology induction algorithms is that morphemes are recurrent building blocks of words. Brent et al (1995), Goldsmith (2001), and Creutz (2006) emphasize the building block nature of morphemes when they each use recurring word segments to efficiently encode a corpus. These approaches then hypothesize that those recurring segments which most efficiently encode a corpus are likely morphemes.…”
Section: Related Work (mentioning)
confidence: 99%
“…The paradigm structure of NL morphology has also been previously leveraged. Goldsmith (2001) uses morphemes to efficiently encode a corpus, but he first groups morphemes into paradigm like structures he calls signatures. To date, the work that draws the most on paradigm structure is Snover (2002).…”
Section: Related Work (mentioning)
confidence: 99%
“…For the moment, for being able to perform a comparison with Linguistica system (J. Goldsmith, [3]), we made the experiments for suffixes only, though the algorithms permits simultaneous treatment of suffixes and prefixes.…”
Section: Results (mentioning)
confidence: 99%
“…The prevalent approach is presented in [3] and it is implemented in Linguistica system. Variations of this method are described in [4], [7], and [1].…”
Section: Introduction (mentioning)
confidence: 99%
“…2. Unsupervised Machine Learning Approach: Goldsmith (2000) developed an unsupervised learning automatic morphology tool called AutoMorphology. This system is advantageous because it could automatically learn the most common prefixes and suffixes from just a word-list.…”
Section: Arabic Morphology (mentioning)
confidence: 99%