Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work on computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton Wordnet was rather shallow. We set out to construct a verb knowledge base for Hindi, which arranges Hindi verbs in an is-a (hypernymy) hierarchy. We realized that there are phenomena unique to Indian languages that bear upon the choice between lexicalization and syntactic derivation. One example is the occurrence of conjunct and compound verbs (called Complex Predicates), which are found in all Indian languages. This paper presents our experience in the construction of lexical knowledge bases for Indian languages, with special attention to Hindi. The question of storing versus deriving complex predicates is addressed both linguistically and computationally. We have constructed empirical tests to decide whether a combination of two words, the second of which is a verb, is a complex predicate. Such tests provide a principled way of deciding the status of complex predicates in Indian language wordnets.
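As a rough illustration of what arranging verbs in an is-a hierarchy involves, the sketch below stores hypernym links in a dictionary and measures how deep each verb sits. The romanized verbs, their glosses, the links between them, and the names `HYPERNYM`, `hypernym_chain` and `depth` are all assumptions made for this example; they are not taken from the paper or from any existing wordnet.

```python
from typing import Dict, List, Optional

# Hypothetical child -> hypernym links (illustrative only, not from the paper).
HYPERNYM: Dict[str, Optional[str]] = {
    "hilnaa": None,          # 'to move' (root of this toy hierarchy)
    "chalnaa": "hilnaa",     # 'to walk' is-a 'to move'
    "tahalnaa": "chalnaa",   # 'to stroll' is-a 'to walk'
    "daudnaa": "hilnaa",     # 'to run' is-a 'to move'
}


def hypernym_chain(verb: str) -> List[str]:
    """Return the verb followed by its hypernyms, up to the root."""
    chain = [verb]
    parent = HYPERNYM[verb]
    while parent is not None:
        chain.append(parent)
        parent = HYPERNYM[parent]
    return chain


def depth(verb: str) -> int:
    """Number of is-a links between the verb and the root."""
    return len(hypernym_chain(verb)) - 1


if __name__ == "__main__":
    for v in HYPERNYM:
        print(v, depth(v), " -> ".join(hypernym_chain(v)))
```

A shallow hierarchy, in these terms, is one in which most verbs have a small `depth`, so few intermediate hypernyms separate a specific verb from the root.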
In this article, we present an analysis of the complexity of grammatical constraints and their impact on the early acquisition of inflectional morphemes in Malayalam. We use natural speech production data from two monolingual children acquiring Malayalam (ages 1;9–2;10 and 2;3–3;0) and three bilingual children acquiring Malayalam-English (ages 1;9–2;8, 2;0–3;0 and 1;10–2;11) to recover the underlying grammatical constraints that govern correct productions as well as errors across monolingual and bilingual contexts. We find that rules referencing lexico-semantic properties are particularly challenging for young children.
This paper primarily presents an analysis of nominal inflection in Hindi within the framework of Distributed Morphology (Halle & Marantz 1993, 1994 and Harley and Noyer 1999). Müller (2002, 2003, 2004), for German, Icelandic and Russian nouns respectively, and Weisser (2006), for Croatian nouns, have also used Distributed Morphology (henceforth DM) to analyze nominal inflectional morphology. This paper discusses in detail the inflectional categories and inflectional classes, the morphological processes operating at syntax, the distribution of vocabulary items and the readjustment rules required to describe Hindi nominal inflection. Earlier studies on Hindi inflectional morphology (Guru 1920, Vajpeyi 1958, Upreti 1964, etc.) were greatly influenced by the Paninian tradition (classical Sanskrit model) and work with Paninian constructs such as root and stem. They provide only descriptive accounts of Hindi nouns and verbs and their inflections, without discussing the role or status of the affixes that take part in inflection. A discussion of the mechanisms (morphological operations and rules) used to analyze or generate word forms is missing from these studies. In addition, these studies do not account for the syntax-morphology or morphology-phonology mismatches that show up in word formation. One aim of this paper is to present a more economical way of forming noun classes in Hindi than traditional methods, especially methods based on gender and stem ending or on paradigms, which give rise to a large number of inflectional paradigms. By using inflectional class information to analyze the various forms of Hindi nouns, we can reduce the number of affixes, word-generation rules and readjustment rules required to describe nominal inflection. The analysis also helps us in developing a morphological analyzer for Hindi. The small set of rules and the reduced number of inflectional classes are of great help to lexicographers and system developers.
To the best of our knowledge, no previous work has analyzed Hindi inflectional morphology within DM or implemented such an analysis in a Hindi morphological analyzer. The methods discussed here can be applied to other Indian languages for analysis as well as word generation.
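To make the idea of generating noun forms from inflectional class information more concrete, here is a minimal sketch in which each noun stores only a stem and a class label, and the suffixes live in one table per class. This is a plain table lookup, not an implementation of DM vocabulary insertion or readjustment rules, and the abstract does not give the paper's actual classes. The two classes, the simplified romanization (`on` stands for a nasalized vowel), and the names `SUFFIXES`, `LEXICON` and `inflect` are assumptions for illustration only.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (case, number)

# Hypothetical class -> suffix table; the class names are illustrative labels.
SUFFIXES: Dict[str, Dict[Cell, str]] = {
    "masc_aa": {      # e.g. larkaa 'boy'
        ("direct", "sg"): "aa",
        ("oblique", "sg"): "e",
        ("direct", "pl"): "e",
        ("oblique", "pl"): "on",
    },
    "masc_other": {   # e.g. ghar 'house'
        ("direct", "sg"): "",
        ("oblique", "sg"): "",
        ("direct", "pl"): "",
        ("oblique", "pl"): "on",
    },
}

# Each lemma maps to (stem, inflectional class); no per-noun paradigm is stored.
LEXICON: Dict[str, Tuple[str, str]] = {
    "larkaa": ("lark", "masc_aa"),
    "ghar": ("ghar", "masc_other"),
}


def inflect(lemma: str, case: str, number: str) -> str:
    """Build a word form from the noun's stem and its class suffix."""
    stem, cls = LEXICON[lemma]
    return stem + SUFFIXES[cls][(case, number)]


if __name__ == "__main__":
    for lemma in LEXICON:
        for cell in SUFFIXES[LEXICON[lemma][1]]:
            print(lemma, cell, inflect(lemma, *cell))
```

Adding a new noun in this scheme means adding one lexicon entry with its class label; the suffix table is shared by every noun in the class.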