Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
DOI: 10.18653/v1/K18-2004

Semi-Supervised Neural System for Tagging, Parsing and Lemmatization

Abstract: This paper describes the ICS PAS system which took part in the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. The system consists of a jointly trained tagger, lemmatizer, and dependency parser which are based on features extracted by a biLSTM network. The system uses both fully connected and dilated convolutional neural architectures. The novelty of our approach is the use of an additional loss function, which reduces the number of cycles in the predicted dependency graphs, …
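The abstract does not give the form of this cycle-reducing loss, so the following is only a minimal sketch of one plausible realization, using the matrix-exponential acyclicity measure known from the NOTEARS literature; the function name, the `head_probs` input convention, and the penalty itself are assumptions, not the authors' published formulation.

```python
import torch

def cycle_penalty(head_probs: torch.Tensor) -> torch.Tensor:
    """Soft acyclicity penalty for a predicted dependency graph (illustrative).

    head_probs: (n, n) matrix where entry [i, j] is the predicted probability
    that token j is the head of token i. trace(A^k) accumulates probability
    mass on length-k cycles, and trace(exp(A)) sums this over all cycle
    lengths, so the penalty is zero only when the soft graph is acyclic.
    """
    n = head_probs.size(0)
    return torch.trace(torch.matrix_exp(head_probs)) - n
```

In a joint system of this kind, such a term would be added to the primary arc-prediction loss with a small weight, discouraging head distributions that place probability mass on cyclic structures.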

Cited by 19 publications (21 citation statements)
References 8 publications
“…The sentences are tokenised with UDPipe (Straka and Straková, 2017) and POS-tagged and dependency parsed with COMBO (Rybak and Wróblewska, 2018). The UDPipe and COMBO models are trained on the UD English-EWT treebank (Silveira et al., 2014) with 16k trees (254k tokens) and on the Polish PDB-UD treebank (Wróblewska, 2018) with 22k trees (351k tokens).…”
Section: Probing Datasets
confidence: 99%
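As a point of reference, the UDPipe tokenization step of such a pipeline can be driven through the official `ufal.udpipe` Python bindings; the model file name below is a placeholder (substitute the model trained on UD English-EWT or Polish PDB-UD), and the COMBO tagging/parsing stage is omitted because its API is not described in the source.

```python
from ufal.udpipe import Model, Pipeline, ProcessingError

# Placeholder model file: use the UDPipe model trained on UD English-EWT
# or Polish PDB-UD, as in the setup quoted above.
model = Model.load("english-ewt.udpipe")
assert model is not None, "failed to load UDPipe model"

# Tokenize only and emit CoNLL-U; POS tagging and dependency parsing are
# left to a downstream system (COMBO in the quoted setup).
pipeline = Pipeline(model, "tokenize", Pipeline.NONE, Pipeline.NONE, "conllu")
error = ProcessingError()
conllu = pipeline.process("The sentences are tokenised with UDPipe.", error)
assert not error.occurred(), error.message
print(conllu)
```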
“…For discriminative models, DT is the direct transfer baseline and S-T is the self-training baseline, both of which use the biaffine parser (Dozat and Manning, 2017). S-T follows Rybak and Wróblewska (2018) who use the source model to predict parse trees on the target data and then perform supervised training of the target model. The last eight methods are our methods.…”
Section: Results
confidence: 99%
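The self-training recipe attributed to Rybak and Wróblewska (2018) in the statement above amounts to a two-step loop. The sketch below is hypothetical: `parse`, `train`, and `make_parser` are placeholder names standing in for whatever parser implementation is used, not a real API.

```python
from typing import Callable, Iterable

def self_train(source_parser, target_sentences: Iterable[str],
               make_parser: Callable):
    """Hypothetical self-training loop for cross-lingual parser transfer."""
    # Step 1: the source-language model predicts "silver" trees on the
    # unlabeled target-language data.
    silver_treebank = [(s, source_parser.parse(s)) for s in target_sentences]
    # Step 2: a fresh target model is trained on the silver trees as if
    # they were gold annotations.
    target_parser = make_parser()
    target_parser.train(silver_treebank)
    return target_parser
```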
“…While several authors failed to demonstrate the efficacy of self-training for dependency parsing (e.g., Rush et al., 2012), recently it was found useful for neural dependency parsing in fully supervised multilingual settings (Rybak and Wróblewska, 2018).…”
Section: Previous Work
confidence: 99%
“…For constituency parsing, self-training has been shown to improve linear parsers both when considerable training data are available (McClosky et al., 2006a,b), and in the lightly supervised and the cross-domain setups (Reichart and Rappoport, 2007). Although several authors failed to demonstrate the efficacy of self-training for dependency parsing (e.g., Rush et al., 2012), recently it was found useful for neural dependency parsing in fully supervised multilingual settings (Rybak and Wróblewska, 2018).…”
Section: Previous Work
confidence: 99%