This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations. Candidates for correction are identified using a variety of heuristics, including an entirely novel one which identifies violations of MWE constituency in the dependency tree, and resolved by arbitration with minimal human intervention. Using this technique, we identified and corrected several hundred errors across both parse and MWE annotations, representing changes to a significant percentage (well over 10%) of the MWE instances in the joint corpus.
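The abstract above does not spell out how violations of MWE constituency are detected, so the following is only a minimal sketch of one plausible reading: an MWE is treated as consistent with the parse when its tokens form a single connected piece of the dependency tree. The function name `mwe_is_connected`, the `heads` mapping (token id to head id, with 0 as the root), and the example sentence are all assumptions introduced for illustration; they are not taken from the paper.

```python
def mwe_is_connected(heads, mwe_tokens):
    """Return True if the MWE tokens form one connected piece of the tree.

    heads: dict mapping token id -> head token id (0 means the root).
    mwe_tokens: iterable of token ids that belong to one MWE.

    In a tree, a set of nodes is connected exactly when one and only one
    of its members has its head outside the set.
    """
    members = set(mwe_tokens)
    external = [t for t in members if heads[t] not in members]
    return len(external) == 1


# Hypothetical sentence: 1 She  2 gave  3 up  4 the  5 plan
# Parse A attaches "up" to "gave"; parse B (wrongly) attaches it to "plan".
parse_a = {1: 2, 2: 0, 3: 2, 4: 5, 5: 2}
parse_b = {1: 2, 2: 0, 3: 5, 4: 5, 5: 2}
gave_up = [2, 3]

consistent_a = mwe_is_connected(parse_a, gave_up)  # MWE is a subtree
consistent_b = mwe_is_connected(parse_b, gave_up)  # MWE is split
```

Under this assumption, an MWE that fails the check would be flagged as a candidate for correction; whether the error lies in the parse or in the MWE annotation would still have to be settled by arbitration, as the abstract describes.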
This paper summarizes the results of a theoretical analysis of the syntactic behavior of Czech light verb constructions (LVCs) and their verification through the linguistic annotation of a large number of these constructions. The concept of LVCs is based on the observation that nouns denoting actions, states, or properties have a strong tendency to select semantically underspecified verbs, which leads to a specific rearrangement of the valency complementations of both nouns and verbs in the syntactic structure. On the basis of the description of deep and surface syntactic properties of LVCs, a formal model of their lexicographic representation is proposed here. In addition, the resulting data annotation, capturing almost 1,500 LVCs, is described in detail. This annotation has been integrated into a new version of the VALLEX lexicon, release 3.5.
This paper describes the results of a study related to the PARSEME Shared Task on automatic detection of verbal Multi-Word Expressions (MWEs), which focuses on their identification in running texts in many languages. The Shared Task's organizers have provided basic annotation guidelines in which four basic types of verbal MWEs are defined, including some specific subtypes. Czech is among the twenty languages selected for the task. We will contribute to the Shared Task dataset, a multilingual open resource, by converting data from the Prague Dependency Treebank (PDT) to the Shared Task format. The question to answer is to what extent this can be done automatically. In this paper, we concentrate on one of the relevant MWE categories, namely the quasi-universal category called "Inherently Pronominal Verbs" (IPronV), and describe its annotation in the Prague Dependency Treebank. After comparing it to the Shared Task guidelines, we can conclude that the PDT and the associated valency lexicon, PDT-Vallex, contain sufficient information for the conversion, even if some specific instances will have to be checked. As a side effect, we have identified certain errors in the PDT annotation which can now be automatically corrected.
In this article, we introduce a project aimed at enhancing a valency lexicon of Czech verbs with semantic information. For this purpose, we make use of FrameNet, a semantically oriented lexical resource. At the present stage, semantic frames from FrameNet have been mapped to eight groups of verbs with various semantic and syntactic properties. The feasibility of this task has been verified by the inter-annotator agreement achieved on two semantically and syntactically different groups of verbs: verbs of communication and verbs of exchange (85.9% and 78.5%, respectively). Based on the upper-level semantic frames from the 'Inheritance' relation built into FrameNet, the verbs of these eight groups have been classified into more coherent semantic classes. Moreover, frame elements from these upper-level semantic frames have been assigned to the valency complementations of the verbs of the listed groups as semantic roles. As in the case of semantic frames, the inter-annotator agreement achieved in assigning frame elements, measured on verbs of communication and exchange, has been promising (95.6% and 91.2%, respectively). As a result, 1,270 lexical units pertaining to verbs of communication, mental action, psych verbs, social interaction, verbs of exchange, motion, transport and location (2,129 Czech verbs in total if perfective and imperfective verbs are counted separately) have been classified into syntactically and semantically coherent classes, and their valency complementations have been characterized by semantic roles adopted from the FrameNet lexical database.