Universal Dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are central to explaining how predicate–argument structures are encoded morphosyntactically in different languages, while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for cross-linguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
This article gives an overview of the state of the art of tools and resources for syntactic analysis of Estonian. A morphosyntactic disambiguator, a surface-syntactic analyzer and a dependency parser are all based on the Constraint Grammar formalism. As for language resources, a 400,000-word manually annotated dependency treebank has been created; its annotation scheme is compatible with the output of the Constraint Grammar dependency parser. Part of the treebank has been converted to the Universal Dependencies annotation scheme. Our tools have also been tested by large-scale corpus annotation.
Abstract. The article deals with the detection of particle verbs during automatic surface-syntactic analysis of Estonian. Recognizing particle verbs is necessary for a more accurate syntactic analysis of the sentence, because the syntactic functions and semantic roles of the sentence constituents, and the way they are expressed, depend on which predicate verb forms the core of the sentence, including whether that predicate verb is a simple verb or a particle verb. Two strategies are applied to detect particle verbs: a lexicon-based one and a rule-based one. The latter means that some regular, productively combining particle verbs are assembled by rules. The article describes two experiments: an initial and an improved particle verb detection procedure. The improved system performs quite well, achieving a recall of 97.4% and a precision of 96.6%.* Keywords: computational linguistics, detection of multi-word expressions, Estonian * The preparation of this article was supported by the European Regional Development Fund through the Estonian Centre of Excellence in Computer Science and by the Ministry of Education and Research through the institutional research grant IUT 20-56 "Eesti keele arvutimudelid".