Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1309

Natural Language Processing with Small Feed-Forward Networks

Abstract: We show that small and shallow feedforward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
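
To make the kind of model described in the abstract concrete, below is a minimal sketch of a small, shallow feed-forward classifier: a few discrete features are embedded, concatenated, passed through a single ReLU hidden layer, and scored against a label set. The feature scheme and all dimensions here are illustrative assumptions, not the configurations reported in the paper.

```python
import numpy as np

# Minimal sketch of a small, shallow feed-forward model in the spirit of the
# abstract. Sizes below are illustrative assumptions, not the paper's settings.
VOCAB, EMB_DIM, N_FEATS, HIDDEN, N_LABELS = 5000, 16, 4, 64, 12

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(VOCAB, EMB_DIM))              # embedding table
W1 = rng.normal(scale=0.1, size=(N_FEATS * EMB_DIM, HIDDEN))  # hidden layer
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_LABELS))           # output layer
b2 = np.zeros(N_LABELS)

def predict(feature_ids):
    """feature_ids: N_FEATS integer feature indices for one decision."""
    x = E[feature_ids].reshape(-1)       # look up and concatenate embeddings
    h = np.maximum(0.0, x @ W1 + b1)     # single ReLU hidden layer
    logits = h @ W2 + b2                 # softmax omitted: argmax is the same
    return int(np.argmax(logits))

print(predict(rng.integers(0, VOCAB, size=N_FEATS)))
```

In a model of this shape the embedding table dominates the parameter count, which is why the quantization discussed in the citation statements below targets the embeddings.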

Cited by 21 publications (14 citation statements)
References 23 publications
“…We compared our model with the state-of-the-art preordering method proposed in (Nakagawa, 2015), which is hereafter referred to as BTG. We used its publicly available implementation, and trained it on the same 100k sentences as our model.…”
Section: Settings
confidence: 99%
“…Our work is also related to prior work using hard word clustering for NLP tasks (Botha et al., 2017; Brown et al., 1992). The primary difference is that we cluster words to minimize the task loss rather than doing so beforehand.…”
Section: Related Work
confidence: 89%
“…Several methods have been proposed for reducing the memory requirements of models that use word embeddings. One is based on quantization (Botha et al., 2017; Han et al., 2016), which changes the way parameters are stored. In particular, it seeks to find shared weights among embedding vectors and only keeps scale factors for each word.…”
Section: Related Work
confidence: 99%
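
The quantization idea paraphrased in the statement above (shared weights among embedding vectors plus a per-word scale factor) can be sketched as follows. This is a hedged reconstruction for illustration only; the codebook size, the 1-D k-means procedure, and the per-row scaling are assumptions, not the exact method of Botha et al. (2017) or Han et al. (2016).

```python
import numpy as np

# Illustrative sketch: store one scale factor per word plus a small shared
# codebook of scalar weights, instead of full-precision embedding rows.
# The procedure and sizes are assumptions, not the cited papers' exact method.

def quantize_embeddings(E, n_codes=16, iters=10):
    """Return per-word scales, uint8 codes, and the shared scalar codebook."""
    scales = np.linalg.norm(E, axis=1, keepdims=True) + 1e-8
    flat = (E / scales).reshape(-1)                 # scale-normalized values
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, n_codes))
    for _ in range(iters):                          # 1-D k-means on values
        codes = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(n_codes):
            mask = codes == k
            if mask.any():
                codebook[k] = flat[mask].mean()
    codes = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return scales, codes.reshape(E.shape).astype(np.uint8), codebook

def dequantize(scales, codes, codebook):
    return scales * codebook[codes]                 # approximate embeddings

E = np.random.default_rng(0).normal(size=(1000, 32))
scales, codes, codebook = quantize_embeddings(E)
print("mean abs reconstruction error:",
      float(np.abs(E - dequantize(scales, codes, codebook)).mean()))
```

Under these assumptions, storage drops from a 32-bit float per embedding value to a few bits per value plus one scale factor per word, which is the kind of memory/accuracy tradeoff the abstract refers to.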
“…The use of multi-view data has resulted in considerable success in various NLP problems. Combining different word representations at the character, token, or sub-word levels has proven to be helpful for dependency parsing (Botha et al. 2017; Andor et al. 2016), Part-of-Speech (POS) tagging (Plank, Søgaard, and Goldberg 2016), and other NLP tasks.…”
Section: Introduction
confidence: 99%