Universal Dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words play the central role in explaining how predicate–argument structures are encoded morphosyntactically in different languages, while morphological features and part-of-speech classes describe the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
Abstract. We present an approach to building a learner corpus of Czech, manually corrected and annotated with error tags using a complex grammar-based taxonomy of errors in spelling, morphology, morphosyntax, lexicon and style. This grammar-based annotation is supplemented by a formal classification of errors based on surface alternations. To supply additional information about non-standard or ill-formed expressions, we aim at a synergy of manual and automatic annotation, deriving information from the original input and from the manual annotation.
Abstract. Information on the web can change rapidly. For example, a news item may hint that a new story is starting to develop, after which many more news items related to the original event begin to pour onto the web. Imagine a person interested in how the story develops: it may be very difficult to trace it by trying to find the most relevant pages with the most recent news. Our goal is to support a user who wants to keep track of a developing story. We propose an approach and a system based on a bee hive model. The problem we focus on in this paper is that it is not possible to download all the pages using, e.g., a breadth-first algorithm, nor to constantly revisit all the pages to see whether new information has been added. We therefore propose to use a focused crawler to download the pages. In a case study with a prototype of our system, we show that the system is able to collect relevant pages, can monitor the story as it develops during the search, and can even reconstruct the story backwards in time.
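To illustrate the contrast between breadth-first downloading and focused crawling mentioned above, the following is a minimal sketch of a generic best-first focused crawler. It is not the bee hive model proposed in the paper, whose details the abstract does not give. Everything here is an assumption made for illustration: the names `relevance`, `focused_crawl` and `TOY_WEB`, the keyword-overlap scoring, the use of anchor text to rank links, and the in-memory dictionary that stands in for real HTTP downloads.

```python
import heapq

# Hypothetical in-memory "web": url -> (page text, [(linked url, anchor text), ...]).
TOY_WEB = {
    "seed": ("flood warning issued for the river",
             [("sports", "football results"), ("flood1", "river flood evacuation")]),
    "sports": ("football results from the weekend",
               [("sports2", "football transfer rumours")]),
    "flood1": ("flood evacuation begins near the river",
               [("flood2", "flood damage estimate")]),
    "flood2": ("flood damage estimate published", []),
    "sports2": ("football transfer rumours", []),
}


def relevance(text, topic_terms):
    """Toy score: fraction of topic terms that occur as words in the text."""
    words = set(text.lower().split())
    return sum(term in words for term in topic_terms) / len(topic_terms)


def focused_crawl(web, seed, topic_terms, budget):
    """Download at most `budget` pages, always taking the most promising link first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    collected = []
    downloads = 0
    while frontier and downloads < budget:
        _, url = heapq.heappop(frontier)
        text, links = web[url]  # stands in for an HTTP download
        downloads += 1
        if relevance(text, topic_terms) > 0:
            collected.append(url)
        for link, anchor in links:
            if link in web and link not in seen:
                seen.add(link)
                # Rank an unseen link by how relevant its anchor text looks.
                heapq.heappush(frontier, (-relevance(anchor, topic_terms), link))
    return collected


print(focused_crawl(TOY_WEB, "seed", ["flood", "river"], budget=3))
```

With a budget of three downloads, this sketch spends all of them on the pages about the flood, whereas a breadth-first traversal of the same toy graph would use one of the three on the unrelated sports page.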