Universal Dependencies (UD) has recently gained much attention as a framework for the systematic evaluation of cross-lingual dependency parsing techniques. In this paper we present our work in line with UD. Our contribution is manifold. We extend UD to Indian languages by converting the Pāṇinian Dependencies of the Hindi Dependency Treebank (HDTB) to UD. We discuss the annotation differences between the two schemes, present parsing experiments for both formalisms, and empirically evaluate their strengths and weaknesses for Hindi. We produce an automatically converted Hindi treebank conforming to the international standard UD scheme, making it useful as a resource for multilingual language technology.
Large-scale efforts are underway to create dependency treebanks and parsers for Hindi and other Indian languages. Hindi, being a morphologically rich language with flexible word order, brings challenges such as handling non-projectivity in parsing. In this work, we look at non-projectivity in the Hyderabad Dependency Treebank (HyDT) for Hindi. Non-projectivity is analysed from two perspectives: the graph properties that restrict non-projectivity and the linguistic phenomena behind non-projectivity in HyDT. Since Hindi has ample instances of non-projectivity (14% of all structures in HyDT are non-projective), it presents a case for an in-depth study of this phenomenon from both perspectives. We examine graph constraints such as planarity, gap degree, edge degree and well-nestedness on the structures in HyDT. We also analyse non-projectivity in Hindi in terms of linguistic parameters such as the causes of non-projectivity, its rigidity (the possibility of reordering) and whether the reordered construction is the natural one.
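To make two of the graph properties named above concrete, the following is a minimal Python sketch of the standard textbook definitions, not the procedure used on HyDT. It assumes a tree is given as a list `heads`, where `heads[i]` is the 1-based index of the head of token `i + 1` and `0` marks the artificial root. The function names and the toy trees are hypothetical and chosen only for illustration. An arc is treated as non-projective when some token lying between head and dependent is not dominated by the head, and the gap degree of a tree is the largest number of discontinuities in the yield of any node.

```python
def is_ancestor(heads, anc, node):
    """Return True if `anc` dominates `node` (reflexively) in the tree.

    Assumes `heads` encodes a well-formed tree (no cycles), with 0 as root.
    """
    while node != 0:
        if node == anc:
            return True
        node = heads[node - 1]  # climb to the head of the current token
    return anc == 0  # every token is dominated by the artificial root


def non_projective_arcs(heads):
    """List the (head, dependent) arcs that violate projectivity."""
    arcs = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(head, dep), max(head, dep)
        # Projective arc: every token strictly between is dominated by head.
        if any(not is_ancestor(heads, head, k) for k in range(lo + 1, hi)):
            arcs.append((head, dep))
    return arcs


def gap_degree(heads):
    """Maximum number of gaps in the yield of any node of the tree."""
    n = len(heads)
    worst = 0
    for node in range(1, n + 1):
        # Yield of `node`: the sorted positions of all tokens it dominates.
        yield_ = [k for k in range(1, n + 1) if is_ancestor(heads, node, k)]
        gaps = sum(1 for a, b in zip(yield_, yield_[1:]) if b - a > 1)
        worst = max(worst, gaps)
    return worst


# Hypothetical toy trees, not drawn from HyDT.
projective = [2, 0, 2]         # tokens 1 and 3 both attach to token 2
non_projective = [3, 4, 0, 3]  # token 2 attaches to token 4 across root token 3

print(non_projective_arcs(projective))
print(non_projective_arcs(non_projective))
print(gap_degree(non_projective))
```

In the second toy tree the arc from token 4 to token 2 spans token 3, which token 4 does not dominate, so that arc is reported and the yield of token 4 contains one gap. Planarity, edge degree and well-nestedness need further checks on pairs of arcs or subtrees and are not covered by this sketch.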