Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d15-1150

Part-of-speech Taggers for Low-resource Languages using CCA Features

Abstract: In this paper, we address the challenge of creating accurate and robust part-of-speech taggers for low-resource languages. We propose a method that leverages existing parallel data between the target language and a large set of resource-rich languages without ancillary resources such as tag dictionaries. Crucially, we use CCA to induce latent word representations that incorporate cross-genre distributional cues, as well as projected tags from a full array of resource-rich languages. We develop a probability-base…
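The abstract's core recipe, CCA over two views of each word type, can be sketched compactly. Below is a minimal, illustrative version assuming a context co-occurrence matrix and a projected-tag matrix as the two views; the regularized whitening-plus-SVD recipe and all names are generic stand-ins, not the authors' exact pipeline.

import numpy as np

def cca_embed(X, Y, k, reg=1e-3):
    """Return k-dimensional CCA projections of the rows of view X.

    X: (n_words, d1) view 1, e.g. context co-occurrence counts
    Y: (n_words, d2) view 2, e.g. counts of tags projected from
       resource-rich languages (hypothetical inputs)
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    # Regularized whitening (C^{-1/2}) of each view's covariance.
    def whiten(M):
        C = (M.T @ M) / M.shape[0] + reg * np.eye(M.shape[1])
        evals, evecs = np.linalg.eigh(C)
        return evecs @ np.diag(evals ** -0.5) @ evecs.T

    Wx, Wy = whiten(X), whiten(Y)
    # SVD of the whitened cross-covariance yields the canonical directions.
    Cxy = (X.T @ Y) / X.shape[0]
    U, _, _ = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    return X @ (Wx @ U[:, :k])  # one row of latent features per word

Each row of the result is a latent word representation that could then feed a supervised tagger, which is the general pattern the abstract describes.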

Cited by 12 publications (9 citation statements), published 2016–2024; references 19 publications.
“…Instead of projecting tag information via word alignment, the transfer in our model is driven by mapping multilingual embedding spaces. Kim et al. (2015) also use latent word representations for multilingual transfer. However, similarly to prior work, this representation is learned using parallel data.…”
Section: Related Work (mentioning)
confidence: 99%
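For contrast with alignment-based tag projection, here is a minimal sketch of the embedding-space mapping this citation describes, in the style of a least-squares translation matrix; the seed translation pairs and all names are placeholders, not any cited paper's exact code.

import numpy as np

def learn_mapping(src_vecs, tgt_vecs):
    """Least-squares linear map W with src_vecs @ W ~= tgt_vecs.

    src_vecs, tgt_vecs: (n_pairs, dim) embeddings of seed
    translation pairs from the two languages.
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

# Usage: map a source-language word into the target space, then apply
# a tagger trained on target-language embeddings to the mapped vector:
#   mapped = src_vec @ W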
“…There is an expansive body of research on learning multilingual word embeddings (Gouws et al., 2014; Faruqui and Dyer, 2014; Lu et al., 2015; Lauly et al., 2014; Luong et al., 2015). Previous work has shown their effectiveness across a wide range of multilingual transfer tasks including tagging (Kim et al., 2015), syntactic parsing (Xiao and Guo, 2014; Guo et al., 2015; Durrett et al., 2012), and machine translation (Zou et al., 2013; Mikolov et al., 2013b). However, these approaches commonly require parallel sentences or a bilingual lexicon to learn multilingual embeddings.…”
Section: Multilingual Word Embeddings (mentioning)
confidence: 99%
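One of the lexicon-based approaches this citation lists, CCA over a bilingual lexicon in the style of Faruqui and Dyer (2014), reduces to a few lines with scikit-learn; the lexicon embeddings below are stand-ins, and this is a sketch of the general idea rather than that paper's implementation.

from sklearn.cross_decomposition import CCA

def shared_space(en_vecs, de_vecs, k=40):
    """Project two monolingual embedding spaces into one shared space.

    en_vecs, de_vecs: (n_pairs, dim) embeddings of translation pairs
    drawn from a bilingual lexicon; k must not exceed min(n_pairs, dim).
    Returns the k-dimensional shared-space projections of both sides.
    """
    cca = CCA(n_components=k)
    en_shared, de_shared = cca.fit_transform(en_vecs, de_vecs)
    return en_shared, de_shared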
“…Turning to statistical and machine learning methods for POS tagging, these include various Hidden Markov model-based methods [9,20,73], maximum entropy-based methods [12,56,74,75,77], perceptron algorithm-based approaches [13,66,71], neural network-based approaches [11,14,33,38,59,60,80], Conditional Random Fields [34,35,37,43,44], Support Vector Machines [25,31,63,69], and other approaches including decision trees [61,62] and hybrid methods [19,36]. An overview of the POS tagging task can be found in [26,28].…”
Section: Related Work (mentioning)
confidence: 99%
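Of the classic methods this survey enumerates, the HMM family is the easiest to make concrete: decoding reduces to Viterbi search over log-probability tables. The toy decoder below only shows the decoding step; in a real tagger the tables would be estimated from a tagged corpus.

import numpy as np

def viterbi(obs, start, trans, emit):
    """Most probable tag sequence under an HMM, in log space.

    obs: list of word indices; start: (T,) log start probabilities;
    trans: (T, T) log transition probabilities trans[prev, cur];
    emit: (T, V) log emission probabilities.
    """
    T, n = len(start), len(obs)
    score = np.full((n, T), -np.inf)
    back = np.zeros((n, T), dtype=int)
    score[0] = start + emit[:, obs[0]]
    for i in range(1, n):
        cand = score[i - 1][:, None] + trans   # (prev_tag, cur_tag)
        back[i] = cand.argmax(axis=0)
        score[i] = cand.max(axis=0) + emit[:, obs[i]]
    # Backtrace from the best final tag.
    tags = [int(score[-1].argmax())]
    for i in range(n - 1, 0, -1):
        tags.append(int(back[i][tags[-1]]))
    return tags[::-1]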
“…To alleviate the problem of word sparsity, we also use task-specific latent continuous word representations, induced on 65 million unlabeled tweets with 1.3 billion tokens. We create three sets of word representations: CCA (Dhillon et al., 2012; Kim et al., 2015a), which is based on matrix factorization, and word2vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014), which are gradient-based. All word representation algorithms produce 50-dimensional word vectors for all words occurring at least 40 times in the data.…”
Section: Basic Features (mentioning)
confidence: 99%
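The reported setup (50-dimensional vectors, minimum frequency 40) maps directly onto, for example, gensim's word2vec API; the corpus file below is hypothetical, and this sketch is one plausible way to reproduce those cutoffs, not the cited paper's exact tooling.

from gensim.models import Word2Vec

# Each line of the (hypothetical) corpus file is one pre-tokenized tweet.
tweets = [line.split() for line in open("tweets.txt", encoding="utf-8")]

model = Word2Vec(
    sentences=tweets,
    vector_size=50,   # 50-dimensional vectors, as the citation reports
    min_count=40,     # keep only words occurring at least 40 times
    sg=1,             # skip-gram variant
    workers=4,
)
vec = model.wv["hello"]  # 50-dim vector for any sufficiently frequent word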
“…An obvious solution to the problem is to develop methods of utilizing a large amount of unlabeled data. One way is to induce word embeddings in a real-valued vector space from a large number of tweets (Kim et al., 2015a; Mikolov et al., 2013; Pennington et al., 2014). Task-specific embeddings induced on tweets have been shown to be more powerful than those created from out-of-domain texts (Owoputi et al., 2012; Anastasakos et al., 2014).…”
Section: Introduction (mentioning)
confidence: 99%