Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1309

Natural Language Processing with Small Feed-Forward Networks

Abstract: We show that small and shallow feedforward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
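
To make the kind of model described in the abstract concrete, below is a minimal sketch of a small, shallow feed-forward classifier: a few discrete features are embedded, concatenated, passed through a single ReLU hidden layer, and scored against a label set. The feature scheme and all dimensions here are illustrative assumptions, not the configurations reported in the paper.

```python
import numpy as np

# Minimal sketch of a small, shallow feed-forward model in the spirit of the
# abstract. Sizes below are illustrative assumptions, not the paper's settings.
VOCAB, EMB_DIM, N_FEATS, HIDDEN, N_LABELS = 5000, 16, 4, 64, 12

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(VOCAB, EMB_DIM))              # embedding table
W1 = rng.normal(scale=0.1, size=(N_FEATS * EMB_DIM, HIDDEN))  # hidden layer
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_LABELS))           # output layer
b2 = np.zeros(N_LABELS)

def predict(feature_ids):
    """feature_ids: N_FEATS integer feature indices for one decision."""
    x = E[feature_ids].reshape(-1)       # look up and concatenate embeddings
    h = np.maximum(0.0, x @ W1 + b1)     # single ReLU hidden layer
    logits = h @ W2 + b2                 # softmax omitted: argmax is the same
    return int(np.argmax(logits))

print(predict(rng.integers(0, VOCAB, size=N_FEATS)))
```

In a model of this shape the embedding table dominates the parameter count, which is why the quantization discussed in the citation statements below targets the embeddings.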

Cited by 21 publications (14 citation statements)
References 23 publications
“…We compared our model with the state-of-the-art preordering method proposed in (Nakagawa, 2015), which is hereafter referred to as BTG. We used its publicly available implementation, and trained it on the same 100k sentences as our model.…”
Section: Settings
confidence: 99%
“…Our work is also related to prior work using hard word clustering for NLP tasks (Botha et al., 2017; Brown et al., 1992). The primary difference is that we cluster words to minimize the task loss rather than doing so beforehand.…”
Section: Related Work
confidence: 89%
“…Several methods have been proposed for reducing the memory requirements of models that use word embeddings. One is based on quantization (Botha et al., 2017; Han et al., 2016), which changes the way parameters are stored. In particular, it seeks to find shared weights among embedding vectors and only keeps scale factors for each word.…”
Section: Related Work
confidence: 99%
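
The quantization idea paraphrased in the statement above (shared weights among embedding vectors plus a per-word scale factor) can be sketched as follows. This is a hedged reconstruction for illustration only; the codebook size, the 1-D k-means procedure, and the per-row scaling are assumptions, not the exact method of Botha et al. (2017) or Han et al. (2016).

```python
import numpy as np

# Illustrative sketch: store one scale factor per word plus a small shared
# codebook of scalar weights, instead of full-precision embedding rows.
# The procedure and sizes are assumptions, not the cited papers' exact method.

def quantize_embeddings(E, n_codes=16, iters=10):
    """Return per-word scales, uint8 codes, and the shared scalar codebook."""
    scales = np.linalg.norm(E, axis=1, keepdims=True) + 1e-8
    flat = (E / scales).reshape(-1)                 # scale-normalized values
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, n_codes))
    for _ in range(iters):                          # 1-D k-means on values
        codes = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(n_codes):
            mask = codes == k
            if mask.any():
                codebook[k] = flat[mask].mean()
    codes = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return scales, codes.reshape(E.shape).astype(np.uint8), codebook

def dequantize(scales, codes, codebook):
    return scales * codebook[codes]                 # approximate embeddings

E = np.random.default_rng(0).normal(size=(1000, 32))
scales, codes, codebook = quantize_embeddings(E)
print("mean abs reconstruction error:",
      float(np.abs(E - dequantize(scales, codes, codebook)).mean()))
```

Under these assumptions, storage drops from a 32-bit float per embedding value to a few bits per value plus one scale factor per word, which is the kind of memory/accuracy tradeoff the abstract refers to.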
“…The use of multi-view data has resulted in considerable success in various NLP problems. Combining different word representations at the character, token, or sub-word levels has proven to be helpful for dependency parsing (Botha et al. 2017; Andor et al. 2016), Part-of-Speech (POS) tagging (Plank, Søgaard, and Goldberg 2016), and other NLP tasks.…”
Section: Introduction
confidence: 99%