This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly build a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted. The resulting grammar closely matches the analysis that would be developed by a human morphologist. In the final section, we discuss the relationship of this style of MDL grammatical analysis to the notion of evaluation metric in early generative grammar.
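The accept-or-reject role that MDL plays for a proposed segmentation can be illustrated with a deliberately simplified sketch. It is not the encoding used in the study: the uniform 26-letter code, the end-of-morph marker, and the function names `lexicon_cost` and `description_length` are all assumptions made for illustration. The total description length is taken to be the cost of spelling out the morph lexicon plus the cost of encoding the corpus as a sequence of morphs under their empirical frequencies.

```python
import math
from collections import Counter

# Assumption: every letter costs the same, as in a uniform code over 26 letters.
BITS_PER_LETTER = math.log2(26)


def lexicon_cost(morphs):
    """Bits needed to spell out each distinct morph, plus one end marker each."""
    return sum((len(m) + 1) * BITS_PER_LETTER for m in set(morphs))


def description_length(analyses):
    """Two-part cost of a corpus given as one tuple of morphs per word.

    Model cost: spelling out the morph lexicon.
    Data cost: negative log2 probability of every morph token,
    using each morph's relative frequency in the corpus.
    """
    morphs = [m for analysis in analyses for m in analysis]
    counts = Counter(morphs)
    total = len(morphs)
    data_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost(morphs) + data_cost


# Hypothetical toy corpus: each word is either left whole or split stem + suffix.
stems = ["jump", "walk"]
suffixes = ["", "s", "ed", "ing"]
unsplit = [(stem + suffix,) for stem in stems for suffix in suffixes]
split = [(stem, suffix) for stem in stems for suffix in suffixes]

# A heuristic's proposal (here, the split analysis) is kept only if it is cheaper.
accept_split = description_length(split) < description_length(unsplit)
```

On this toy corpus the split analysis is shorter because the shared suffixes are spelled out once rather than once per word, which is the intuition behind using description length to arbitrate among heuristic proposals.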
This is a book whose time has come. And gone. Its publication roughly coincides with the twentieth anniversary of the appearance of Jeffrey Gruber's Lexical structures in syntax and semantics (1976 [1965]), which got localism off the ground. Though it has proved difficult to turn localism into a fortified theory, John Anderson (1971 and elsewhere), Ray Jackendoff (1978 and elsewhere), and a handful of others have made a go at developing this approach into a more general statement about the way abstract ideas are encoded in language through the metaphorical use of lexical and grammatical resources whose basic function is to express spatial relationships. It takes a very stiff-spined attitude to find nothing attractive about localism. On the one hand, there is indubitably an easily expressed theory at its core, a theory that can be to some significant extent formalized, as Jackendoff and Anderson have tried to show, to the apathy of many. On the other hand, localists and localism encourage an attention to certain aspects of language that most other theories of grammar in the English-speaking world positively ignore, like the nature of the system that allows speakers to express a vast range of conceptual relationships within the allowed patterns spanned by a limited set of grammatical structures. If the study of language is ever to offer insights into the nature of mind, then that issue is the big one, and localism has traditionally not shied away from taking it on directly, quite unlike Chomskyan transformational grammar. Still, it has been the feeling of many, I suspect, that localism has not quite accomplished as much as its proponents have promised it would, and the appearance of this book offering analyses of a good part of the English prepositional system, of the verb get, as well as some observations on such diverse languages as Tonga (Bantu), French, German, Maori, Greek and Mandarin, might be expected to set this matter straight.
But, alas, the reader will be disappointed. This book offers no such thing.
This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith (2001), and has been implemented in software that is available for downloading and testing. There is no natural home in the analysis presented in this paper for the distinction between inflectional and derivational morphology. This question is addressed, however, in Goldsmith and Hu (2005), in which an analysis of the distinction is offered in terms of the geometry of a finite state automaton for the morphology.
The discussion of vowel harmony in this paper continues the theoretical discussion that was sparked by Clements' first proposals concerning an autosegmental treatment of vowel harmony in general (1980 [1976]). I will attempt to show that problems that arose in early autosegmental treatments of certain types of vowel harmony can be elegantly overcome and that autosegmental theory more generally provides an attractive framework for the treatment of vowel systems and vowel harmony. I will discuss three distinct types of systems here: the slightly asymmetrical system of Khalkha Mongolian, the canonical five-vowel system as it can be seen in Bantu (Yaka, in this case), and the well-known Finnish/Hungarian type of system. The kinds of advances made here answer, I believe, the critical comments made in Anderson (1980), in which significant sceptical questions are raised concerning whether the successes of autosegmental accounts of West African systems can be extended to other types of vowel harmony systems.
This article describes in detail several explicit computational methods for approaching such questions in phonology as the vowel/consonant distinction, the nature of vowel harmony systems, and syllable structure, appealing solely to distributional information. Beginning with the vowel/consonant distinction, we consider a method for its discovery by the Russian linguist Boris Sukhotin, and compare it to two newer methods that are of more general computational and theoretical interest today. The first is based on spectral decomposition of matrices, allowing for dimensionality reduction in a finely controlled way, and the second is based on finding parameters for maximum likelihood in a hidden Markov model. While all three methods work for discovering the fairly robust vowel/consonant distinction, we extend the newer ones to the discovery of vowel harmony, and in the case of the probabilistic model, to the discovery of some aspects of syllable structure.
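As a rough illustration of the first of these methods, the following is a minimal sketch of Sukhotin's algorithm as it is commonly presented, not necessarily the formulation used in the article. The function name `sukhotin_vowels`, the alphabetical tie-breaking rule, and the toy word list are assumptions introduced here. The procedure starts by treating every symbol as a consonant, counts how often each pair of distinct symbols occurs side by side, and repeatedly promotes the symbol with the largest positive adjacency sum to vowel status, discounting its neighbours' sums each time.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of symbols classified as vowels from adjacency counts alone."""
    # Symmetric adjacency counts between distinct neighbouring symbols.
    adj = defaultdict(lambda: defaultdict(int))
    for word in words:
        for x, y in zip(word, word[1:]):
            if x != y:  # the diagonal of the matrix is kept at zero
                adj[x][y] += 1
                adj[y][x] += 1

    letters = sorted({ch for word in words for ch in word})
    sums = {ch: sum(adj[ch].values()) for ch in letters}
    consonants = set(letters)  # every symbol starts out as a consonant
    vowels = set()

    while consonants:
        # Assumption: ties are broken alphabetically to keep the result deterministic.
        best = max(sorted(consonants), key=lambda ch: sums[ch])
        if sums[best] <= 0:
            break  # no remaining symbol has a positive sum
        vowels.add(best)
        consonants.remove(best)
        # Symbols adjacent to the new vowel become less vowel-like.
        for ch in consonants:
            sums[ch] -= 2 * adj[ch][best]
    return vowels


# Hypothetical toy data: strictly alternating consonant-vowel words.
toy_words = ["baba", "dada", "bada", "daba"]
toy_vowels = sukhotin_vowels(toy_words)
```

The sketch relies only on the tendency of vowels and consonants to alternate, which is why it needs no phonetic information, and also why it can misclassify symbols in small corpora or in languages with long consonant clusters.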