An algorithm for the unsupervised learning of morphology

Goldsmith, John

doi:10.1017/s1351324905004055

Cited by 98 publications

(79 citation statements)

References 14 publications


“…In other words, is Grammar a set of structures or a set of mechanisms for learning such structures? This question has been approached with a variety of evidence; the point of this paper is to provide computational corpus-based evidence by simulating the language-learning process with computational models (e.g., Goldsmith, 2001, 2006; Solan, Horn, Ruppin, & Edelman, 2005; as opposed to the approach taken in Briscoe, 2000). If a grammar-induction algorithm is capable of learning the grammar of a language without innate structure and using purely statistical properties of observed language data, then it follows that such grammar learning is possible in principle given only linguistic input.…”

Section: Dunn (mentioning)

confidence: 99%

“…Mutual information (MI: i.e., association strength) is used to filter out redundant or nested candidates, and the MI threshold is determined using minimum description length to evaluate possible grammars (cf. Goldsmith, 2006). Klein and Manning (2002) take yet another approach to finding constituents, starting with all possible subsequences of part-of-speech tags within the same sentence as the candidate set, considering only those candidates which produce binary trees.…”

Section: Modeling Constructions (mentioning)

confidence: 99%


Computational learning of construction grammars

Dunn

2016

Lang. cogn.


Abstract: This paper presents an algorithm for learning the construction grammar of a language from a large corpus. This grammar induction algorithm has two goals: first, to show that construction grammars are learnable without highly specified innate structure; second, to develop a model of which units do or do not constitute constructions in a given dataset. The basic task of construction grammar induction is to identify the minimum set of constructions that represents the language in question with maximum descriptive adequacy. These constructions must (1) generalize across an unspecified number of units while (2) containing mixed levels of representation internally (e.g., both item-specific and schematized representations), and (3) allowing for unfilled and partially filled slots. Additionally, these constructions may (4) contain recursive structure within a given slot that needs to be reduced in order to produce a sufficiently schematic representation. In other words, these constructions are multi-length, multi-level, possibly discontinuous co-occurrences which generalize across internal recursive structures. These co-occurrences are modeled using frequency and the ΔP measure of association, expanded in novel ways to cover multi-unit sequences. This work provides important new evidence for the learnability of construction grammars as well as a tool for the automated corpus analysis of constructions.

Keywords: construction grammar, grammar induction, multi-unit association measures, poverty of the stimulus.
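The abstract names the ΔP measure of association without defining it. As a rough illustration only, the sketch below computes the standard pairwise ΔP from a 2x2 contingency table, i.e. P(outcome | cue) minus P(outcome | no cue). It does not reproduce the multi-unit extension the paper describes; the function name `delta_p` and the cell labels `a`, `b`, `c`, `d` are assumptions chosen for this example.

```python
def delta_p(a, b, c, d):
    """Pairwise Delta-P of an outcome given a cue (illustrative sketch).

    Hypothetical cell labels for a 2x2 contingency table:
      a: cue present, outcome present
      b: cue present, outcome absent
      c: cue absent,  outcome present
      d: cue absent,  outcome absent
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each row of the table needs at least one observation")
    # P(outcome | cue) - P(outcome | no cue)
    return a / (a + b) - c / (c + d)


# Toy counts: the outcome follows the cue 8 times out of 10,
# but occurs only 10 times in the 90 contexts without the cue.
print(round(delta_p(8, 2, 10, 80), 2))  # 0.69
```

Unlike mutual information, ΔP is directional: swapping the roles of cue and outcome generally gives a different value.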


“…Chan (2008: ch. 3) provides a clear demonstration of the effects of sparsity on models that require full paradigms to infer a reasonable representation of the morphology of the language. When the Wall Street Journal portion of the Penn Treebank (Marcus et al 1999) is provided as input to the Linguistica system (Goldsmith 2006), a minimum description length (MDL) morphology learning system that attempts to derive the most compact characterization of the data, it learns a set of signatures, structures which contain a set of stems and the suffixes they take. For example, it identifies 604 stems (e.g., alarm) that take the set of suffixes {-∅, -ed, -ing, -s}.…”

Section: The Distributional Learning Of Morphology (mentioning)

confidence: 99%

Morphology and Language Acquisition

Lignos, Yang

2016

The Cambridge Handbook of Morphology
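The citation statement above describes signatures as structures pairing a set of stems with the set of suffixes they take. The following is a minimal sketch of that data structure only, not of Linguistica's MDL-based search: it assumes a fixed, hand-picked suffix list (the hypothetical `suffixes` parameter) and simply groups stems by the exact suffix set observed for them in a word list.

```python
from collections import defaultdict


def find_signatures(words, suffixes=("", "ed", "ing", "s")):
    """Group stems by the exact set of suffixes they occur with.

    Simplified illustration: the suffix inventory is given up front,
    whereas an unsupervised learner would have to discover it.
    """
    vocab = set(words)
    stem_suffixes = defaultdict(set)
    for word in vocab:
        for suffix in suffixes:
            if suffix == "":
                stem = word  # the null suffix: the word is its own stem
            elif word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: -len(suffix)]
            else:
                continue
            stem_suffixes[stem].add(suffix)

    signatures = defaultdict(set)
    for stem, observed in stem_suffixes.items():
        if len(observed) > 1:  # ignore stems seen with only one suffix
            signatures[frozenset(observed)].add(stem)
    return signatures


words = ["alarm", "alarms", "alarmed", "alarming",
         "walk", "walks", "walked", "walking",
         "jump", "jumps"]
sigs = find_signatures(words)
print(sorted(sigs[frozenset({"", "ed", "ing", "s"})]))  # ['alarm', 'walk']
```

On sparse corpus data many stems appear with only part of their paradigm, so a grouping like this splits them across several smaller signatures, which is the sparsity effect the quoted passage attributes to Chan (2008).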


“…Linguistica algorithm (Goldsmith, 2006) and Morfessor (Creutz and Lagus, 2004) represent word forms using sets of substrings and they utilise criteria such as the minimum description length (MDL). The end results of such processes are sets of strings which are similar to linguistic morphs (but not always the same).…”

Section: Past Work (mentioning)

confidence: 99%

An informal discovery procedure for two-level rules

Koskenniemi

2013

JLM


The paper shows how a certain kind of underlying representations (or deep forms) of words can be constructed in a straightforward manner through aligning the surface forms of the morphs of the word forms. The inventory of morphophonemes follows directly from this alignment. Furthermore, the two-level rules which govern the different realisations of such morphophonemes follow fairly directly from the previous steps. The alignment and rules are based upon an approximate general metric among phonemes, e.g., articulatory features, that determines which alternations are likely or possible. This enables us to summarise contexts for the different realisations.
