Universal Dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are central to explaining how predicate–argument structures are encoded morphosyntactically in different languages, while morphological features and part-of-speech classes describe the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
The problem of Vietnamese syntactic parsing, especially constituency parsing, has recently been tackled by several research groups. A common effort of the Vietnamese language processing community has allowed the creation of VietTreebank, a reference parsed corpus containing about 10,000 sentences for the constituency parsing task. In this paper, we present our work to build a reference treebank, based on VietTreebank, for the dependency parsing task, which has not yet been well studied for Vietnamese. First, we define a dependency label set by adapting the dependency schema developed by the NLP group at Stanford University and taking into account the particularities of Vietnamese grammar. Then we propose an algorithm to convert a constituency treebank into a dependency treebank. The algorithm is tested on a set of 100 sentences from the VietTreebank corpus and gives very good results. Finally, we carry out an experiment on Vietnamese dependency parsing using the MaltParser tool and the dependency treebank converted from VietTreebank.
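The abstract does not spell out the conversion algorithm, so the following is only a minimal sketch of the general head-rule approach commonly used for constituency-to-dependency conversion, not the authors' method. The `HEAD_RULES` table, the category labels, the `convert` function, and the toy sentence are all hypothetical and chosen for illustration; dependency labels are omitted entirely.

```python
# Hypothetical head rules (an assumption, not taken from the paper):
# for each phrase label, the child labels to try as head, in priority order.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["V", "VP"],
    "NP": ["N", "P", "NP"],
}


def is_leaf(node):
    # A leaf is a (POS, word) pair; an internal node is (label, [children]).
    return isinstance(node[1], str)


def convert(tree):
    """Return (words, heads); heads[i] is the 1-based index of the head of
    word i + 1, with 0 marking the root of the sentence."""
    words = []
    heads = {}

    def visit(node):
        label, rest = node
        if is_leaf(node):
            words.append(rest)
            return len(words)  # 1-based position of this word
        # Recursively find the lexical head of every child constituent.
        child_heads = [(child[0], visit(child)) for child in rest]
        head_idx = None
        for wanted in HEAD_RULES.get(label, []):
            for child_label, idx in child_heads:
                if child_label == wanted:
                    head_idx = idx
                    break
            if head_idx is not None:
                break
        if head_idx is None:
            head_idx = child_heads[0][1]  # fallback: leftmost child
        # Every non-head child attaches to the head child's lexical head.
        for _, idx in child_heads:
            if idx != head_idx:
                heads[idx] = head_idx
        return head_idx

    root = visit(tree)
    heads[root] = 0
    return words, [heads[i] for i in range(1, len(words) + 1)]


# Toy tree for "Tôi đọc sách" ("I read books"), with illustrative labels.
tree = ("S", [
    ("NP", [("P", "Tôi")]),
    ("VP", [("V", "đọc"), ("NP", [("N", "sách")])]),
])
words, heads = convert(tree)
```

In this toy example the verb becomes the root and both the subject and the object attach to it. A real converter would also need a label-assignment step and language-specific head rules.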