2006
DOI: 10.1017/s1351324905004055

An algorithm for the unsupervised learning of morphology

Abstract: This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith (2001), and has been implemented in software that is available for downloading and testing. There is no natural home in the analysis presented in this paper for the distinction between inflectional and derivational morphology.…

citations
Cited by 98 publications
(79 citation statements)
references
References 14 publications
“…In other words, is Grammar a set of structures or a set of mechanisms for learning such structures? This question has been approached with a variety of evidence; the point of this paper is to provide computational corpus-based evidence by simulating the language-learning process with computational models (e.g., Goldsmith, 2001, 2006; Solan, Horn, Ruppin, & Edelman, 2005; as opposed to the approach taken in Briscoe, 2000). If a grammar-induction algorithm is capable of learning the grammar of a language without innate structure and using purely statistical properties of observed language data, then it follows that such grammar learning is possible in principle given only linguistic input.…”
Section: Dunn
Type: mentioning
confidence: 99%
“…Mutual information (MI: i.e., association strength) is used to filter out redundant or nested candidates, and the MI threshold is determined using minimum description length to evaluate possible grammars (cf. Goldsmith, 2006). Klein and Manning (2002) take yet another approach to finding constituents, starting with all possible subsequences of part-of-speech tags within the same sentence as the candidate set, considering only those candidates which produce binary trees.…”
Section: Modeling Constructions
Type: mentioning
confidence: 99%
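
To make the mutual-information filter in the statement above concrete, here is a minimal sketch of pointwise mutual information over observed pairs. It is an illustration only, not the citing paper's implementation: the function name `pmi_scores`, the toy pairs, and the fixed `threshold` are all assumptions, and the threshold is set by hand here, whereas the statement says it is chosen by minimum description length.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return log2 pointwise mutual information for each distinct (x, y) pair."""
    n = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / n
        p_x = left_counts[x] / n
        p_y = right_counts[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy data: each tuple is one observed adjacent pair.
pairs = [("a", "b"), ("a", "b"), ("c", "d"), ("c", "b")]
scores = pmi_scores(pairs)

# Hand-picked threshold (an assumption); weakly associated pairs are dropped.
threshold = 0.5
kept = {pair for pair, score in scores.items() if score >= threshold}
```

With these toy counts only the pair `("c", "d")` clears the threshold, because `"d"` never appears with any other left element.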
“…Chan (2008: ch. 3) provides a clear demonstration of the effects of sparsity on models that require full paradigms to infer a reasonable representation of the morphology of the language. When the Wall Street Journal portion of the Penn Treebank (Marcus et al 1999) is provided as input to the Linguistica system (Goldsmith 2006), a minimum description length (MDL) morphology learning system that attempts to derive the most compact characterization of the data, it learns a set of signatures, structures which contain a set of stems and the suffixes they take. For example, it identifies 604 stems (e.g., alarm) that take the set of suffixes {-∅, -ed, -ing, -s}.…”
Section: The Distributional Learning Of Morphology
Type: mentioning
confidence: 99%
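
The notion of a signature in the statement above (a set of stems together with the set of suffixes they take) can be sketched as follows. This is a simplified illustration, not the Linguistica algorithm: it assumes the candidate suffix list is given in advance, and it skips the minimum description length search that the actual system uses to choose among analyses. The names `find_signatures`, `words`, and `suffixes` are hypothetical.

```python
from collections import defaultdict


def find_signatures(words, suffixes):
    """Group stems by the exact set of suffixes they are attested with.

    The empty string stands for the null suffix.
    """
    vocab = set(words)
    stem_to_suffixes = defaultdict(set)
    for word in vocab:
        for suffix in suffixes:
            if suffix == "":
                stem = word  # null suffix: the bare word is the stem
            elif word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: -len(suffix)]
            else:
                continue
            stem_to_suffixes[stem].add(suffix)

    signatures = defaultdict(set)
    for stem, attested in stem_to_suffixes.items():
        if len(attested) > 1:  # ignore stems seen with only one suffix
            signatures[frozenset(attested)].add(stem)
    return dict(signatures)


# Toy vocabulary (an assumption, not corpus data from the cited work).
words = ["alarm", "alarms", "alarmed", "alarming",
         "walk", "walks", "walked", "walking",
         "cat", "cats"]
suffixes = ["", "ed", "ing", "s"]
signatures = find_signatures(words, suffixes)
```

On this toy vocabulary, `alarm` and `walk` share the signature with the null suffix, `-ed`, `-ing`, and `-s`, while `cat` falls into a separate signature with only the null suffix and `-s`.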
“…Linguistica algorithm (Goldsmith, 2006) and Morfessor (Creutz and Lagus, 2004) represent word forms using sets of substrings and they utilise criteria such as the minimum description length (MDL). The end results of such processes are sets of strings which are similar to linguistic morphs (but not always the same).…”
Section: Past Work
Type: mentioning
confidence: 99%