Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1128

Reusing Weights in Subword-Aware Neural Language Models

Abstract: We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multilayer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reus…
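As a rough illustration of the weight reuse the abstract describes, the sketch below ties a morpheme embedding matrix between the input layer and the output (softmax) layer of a small morpheme-aware language model: input word vectors are sums of morpheme embeddings, and the same word vectors are reused as the output projection. This is a minimal PyTorch sketch, not the authors' implementation; the vocabulary sizes, the word_to_morphemes mapping, and the class name TiedMorphemeLM are hypothetical.

# Minimal sketch (assumed names and sizes), not the paper's code.
import torch
import torch.nn as nn

MORPH_VOCAB, WORD_VOCAB, EMB, HID = 1000, 5000, 128, 256

class TiedMorphemeLM(nn.Module):
    def __init__(self, word_to_morphemes):
        super().__init__()
        # word_to_morphemes: LongTensor [WORD_VOCAB, max_morphemes],
        # padded with index 0 (reserved for a zero "pad" morpheme).
        self.register_buffer("word_to_morphemes", word_to_morphemes)
        self.morph_emb = nn.Embedding(MORPH_VOCAB, EMB, padding_idx=0)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        # Project hidden states back to embedding size so the same
        # word vectors can serve as the softmax weights.
        self.proj = nn.Linear(HID, EMB)
        self.out_bias = nn.Parameter(torch.zeros(WORD_VOCAB))

    def word_vectors(self):
        # Word vector = sum of its morpheme embeddings,
        # shared between input and output.
        return self.morph_emb(self.word_to_morphemes).sum(dim=1)  # [WORD_VOCAB, EMB]

    def forward(self, word_ids):
        wv = self.word_vectors()
        x = wv[word_ids]               # input word embeddings, [B, T, EMB]
        h, _ = self.rnn(x)             # [B, T, HID]
        logits = self.proj(h) @ wv.t() + self.out_bias  # reuse wv at output
        return logits                  # [B, T, WORD_VOCAB]

# Usage with random data:
#   mapping = torch.randint(0, MORPH_VOCAB, (WORD_VOCAB, 4))
#   model = TiedMorphemeLM(mapping)
#   logits = model(torch.randint(0, WORD_VOCAB, (2, 10)))

Tying the output layer to the same subword embeddings removes the separate output word-embedding matrix, which is where the parameter savings mentioned in the abstract would come from.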

Cited by 7 publications (17 citation statements); references 21 publications.
“…In addition to the above datasets, we also set up 6 common language modeling datasets: the English Penn Treebank (PTB) (Marcus et al., 1993) and 5 non-English datasets with rich morphology from the 2013 ACL Workshop on Machine Translation, which have been commonly used for evaluating character-aware NLMs (Botha and Blunsom, 2014; Kim et al., 2016; Bojanowski et al., 2017; Assylbekov and Takhanov, 2018). Since some of the previous work has tested their models on PTB, we also included PTB in our experiments.…”
Section: Experiments on Common Datasets (mentioning)
confidence: 99%
“…Bojanowski et al. (2017) trained word embeddings through skip-gram models with subword-level information, and used these word embeddings to initialize the word-embedding lookup table of a word-level language model. Assylbekov and Takhanov (2018) focused on reusing embeddings and weights in a character-aware language model. The input of their model is also the sum of the morpheme embeddings of the word.…”
Section: Experiments on Common Datasets (mentioning)
confidence: 99%