Proceedings of the Eighteenth Conference on Computational Natural Language Learning 2014
DOI: 10.3115/v1/w14-1613
Distributed Word Representation Learning for Cross-Lingual Dependency Parsing

Abstract: This paper proposes to learn language-independent word representations to address cross-lingual dependency parsing, which aims to predict dependency parse trees for sentences in the target language by training a dependency parser on labeled sentences from a source language. We first combine all sentences from both languages to induce real-valued distributed representations of words under a deep neural network architecture, which is expected to capture semantic similarities of words not only within the sa…
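The abstract describes inducing a single distributed representation space over the combined source- and target-language corpora. As a rough, hypothetical illustration of that idea (the toy sentences, window size, and the co-occurrence-plus-SVD embedding used here as a lightweight stand-in for the paper's deep neural network are all assumptions, not the authors' method):

```python
import numpy as np

# Hypothetical toy corpora: source (English) and target (German) sentences.
en = [["the", "dog", "runs"], ["the", "cat", "sleeps"]]
de = [["der", "hund", "rennt"], ["die", "katze", "schlaeft"]]
corpus = en + de  # combine all sentences from both languages

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric window-1 co-occurrence counts over the combined corpus.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if i != j:
                C[idx[w], idx[sent[j]]] += 1

# Truncated SVD yields low-dimensional word vectors in one shared space.
U, S, _ = np.linalg.svd(C)
dim = 2
emb = U[:, :dim] * S[:dim]
print(emb.shape)  # one vector per word, covering both languages
```

Note that concatenating monolingual corpora alone gives both vocabularies a common space but no cross-lingual alignment signal; the approaches quoted below differ precisely in which bilingual signal (word alignments, parallel sentences, or a lexicon) they add.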

Cited by 68 publications (67 citation statements)
References 15 publications
“…Previous work has shown its effectiveness across a wide range of multilingual transfer tasks including tagging (Kim et al., 2015), syntactic parsing (Xiao and Guo, 2014; Guo et al., 2015; Durrett et al., 2012), and machine translation (Zou et al., 2013; Mikolov et al., 2013b). However, these approaches commonly require parallel sentences or a bilingual lexicon to learn multilingual embeddings.…”
Section: Multilingual Word Embeddings
confidence: 99%
“…In addition to having a direct application in inherently crosslingual tasks like machine translation (Zou et al., 2013) and crosslingual entity linking (Tsai and Roth, 2016), they provide an excellent mechanism for transfer learning, where a model trained in a resource-rich language is transferred to a less-resourced one, as shown with part-of-speech tagging, parsing (Xiao and Guo, 2014) and document classification (Klementiev et al., 2012).…”
Section: Introduction
confidence: 99%
“…Bilingual word representations could serve as a useful source of knowledge for problems in cross-lingual information retrieval (Levow, Oard, & Resnik, 2005; Vulić, De Smet, & Moens, 2013), statistical machine translation (Wu, Wang, & Zong, 2008), document classification (Ni, Sun, Hu, & Chen, 2011; Klementiev et al., 2012; Hermann & Blunsom, 2014b; Chandar, Lauly, Larochelle, Khapra, Ravindran, Raykar, & Saha, 2014; Vulić, De Smet, Tang, & Moens, 2015), bilingual lexicon extraction (Tamura, Watanabe, & Sumita, 2012; Vulić & Moens, 2013a), or knowledge transfer and annotation projection from resource-rich to resource-poor languages for a myriad of NLP tasks such as dependency parsing, POS tagging, semantic role labeling or selectional preferences (Yarowsky & Ngai, 2001; Padó & Lapata, 2009; Peirsman & Padó, 2010; Das & Petrov, 2011; Täckström, Das, Petrov, McDonald, & Nivre, 2013; Ganchev & Das, 2013; Tiedemann, Agić, & Nivre, 2014; Xiao & Guo, 2014). Other interesting application domains are machine translation (e.g., Zou, Socher, Cer, & Manning, 2013; Wu, Dong, Hu, Yu, He, Wu, Wang, & Liu, 2014; Zhang, Liu, Li, Zhou, & Zong, 2014) and cross-lingual information retrieval (e.g., ….…”
Section: Bilingual Word Embeddings
confidence: 99%
“…We may cluster the current work in three different groups: (1) models that rely on hard word alignments obtained from parallel data to constrain the learning of BWEs (Klementiev et al., 2012; Zou et al., 2013; Wu et al., 2014); (2) models that use the alignment of parallel data at the sentence level (Kočiský, Hermann, & Blunsom, 2014; Hermann & Blunsom, 2014a; Chandar et al., 2014; Shi, Liu, Liu, & Sun, 2015); (3) models that critically require readily available bilingual lexicons (Mikolov et al., 2013b; Faruqui & Dyer, 2014; Xiao & Guo, 2014). The main disadvantage of all these models is the limited availability of parallel data and bilingual lexicons, resources which are scarce and/or domain-restricted for plenty of language pairs.…”
Section: Bilingual Word Embeddings
confidence: 99%
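Among the lexicon-based approaches in group (3), the linear translation matrix of Mikolov et al. (2013b) is the simplest to sketch: given monolingual embeddings for seed lexicon pairs, a mapping W is fit by least squares so that source vectors land near their translations' vectors. The vectors below are synthetic toy data (an assumption for illustration), not embeddings from the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Hypothetical pre-trained embeddings for 10 seed lexicon pairs:
# X holds source-language vectors, Y the vectors of their translations.
X = rng.normal(size=(10, d))
W_true = rng.normal(size=(d, d))
Y = X @ W_true  # toy setup: target vectors are an exact linear image

# Fit the translation matrix: argmin_W ||X W - Y||_F
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A source word's vector can now be mapped into the target space,
# where its nearest neighbors serve as candidate translations.
print(np.allclose(X @ W, Y))  # → True (exact in this noise-free toy case)
```

With real embeddings the fit is only approximate, which is why later work adds constraints such as orthogonality; the quoted passage's criticism stands: a seed lexicon of reasonable size and domain coverage is still required.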