In this paper we describe the MUMULS system that participated to the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs). The MUMULS system was implemented using a supervised approach based on recurrent neural networks using the open source library TensorFlow. The model was trained on a data set containing annotated VMWEs as well as morphological and syntactic information. The MU-MULS system performed the identification of VMWEs in 15 languages, it was one of few systems that could categorize VMWEs type in nearly all languages.
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019. This year, the campaign included five shared tasks, including one task rerun -German Dialect Identification (GDI)-and four new tasks-Cross-lingual Morphological Analysis (CMA), Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT), Moldavian vs. Romanian Cross-dialect Topic identification (MRC), and Cuneiform Language Identification (CLI). A total of 22 teams submitted runs across the five shared tasks. After the end of the competition, we received 14 system description papers, which are published in the VarDial workshop proceedings and referred to in this report.
The focus of this article is on the creation of a collection of sentences manually annotated with respect to their sentence structure. We show that the concept of linear segments-linguistically motivated units, which may be easily detected automatically-serves as a good basis for the identification of clauses in Czech. The segment annotation captures such relationships as subordination, coordination, apposition and parenthesis; based on segmentation charts, individual clauses forming a complex sentence are identified. The annotation of a sentence structure enriches a dependency-based framework with explicit syntactic information on relations among complex units like clauses. We have gathered a collection of 3,444 sentences from the Prague Dependency Treebank, which were annotated with respect to their sentence structure (these sentences comprise 10,746 segments forming 6,341 clauses). The main purpose of the project is to gain a development data-promising results for Czech NLP tools (as a dependency parser or a machine translation system for related languages) that adopt an idea of clause segmentation have been already reported. The collection of sentences with annotated sentence structure provides the possibility of further improvement of such tools.
The goal of the presented project is to assign a structure of clauses to Czech sentences from the Prague Dependency Treebank (PDT) as a new layer of syntactic annotation, a layer of clause structure. The annotation is based on the concept of segments, linguistically motivated and easily automatically detectable units. The task of the annotators is to identify relations among segments, especially relations of super/subordination, coordination, apposition and parenthesis. Then they identify individual clauses forming complex sentences.In the pilot phase of the annotation, 2,699 sentences from PDT were annotated with respect to their sentence structure.
Distributional Semantic Models (DSMs) established themselves as a standard for the representation of word and sentence meaning. However, DSMs provide quantitative measurement of how strongly two linguistic expressions are related, without being able to automatically classify different semantic relations. Hence the notion of semantic similarity is underspecified in DSMs. We introduce Evalution-MAN in this paper as an effort to address this underspecification problem. Following the EVALution 1.0 dataset for English, we present a dataset for evaluating DSMs on the task of the identification of semantic relations in Mandarin Chinese. Moreover, we test different types of word vectors on the automatic learning of these semantic relations, and we evaluate them both in a unsupervised and in a supervised setting, finding that distributional models tend, in general, to assign higher similarity scores to synonyms and that deep learning classifiers are the best performing ones in the identification of semantic relations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.