Structured Word Embedding for Low Memory Neural Network Language Model

Shi, Kaiyu; Yu, Kai

doi:10.21437/interspeech.2018-1057

Cited by 15 publications

(18 citation statements)

References 16 publications


“…However, running these models on edge-devices, faces memory and latency issues due to limitations of the hardware. Thus, there has been considerable interest towards research in reducing the memory footprint and faster inference speed for these models (Sainath et al., 2013; Acharya et al., 2019; Shi and Yu, 2018; Jegou et al., 2010; Chen et al., 2018; Winata et al., 2019).…”

Section: Introduction (mentioning)

confidence: 99%

“…Recently, there has been considerable work on compressing word-embedding matrices (Sainath et al., 2013; Acharya et al., 2019; Shi and Yu, 2018; Jegou et al., 2010; Chen et al., 2018; Winata et al., 2019). These techniques have proven to perform at par with the uncompressed models, but still suffer from a number of issues.…”

Section: Introduction (mentioning)

confidence: 99%

“…First, state-of-the-art embedding compression methods such as GroupReduce, Structured Embedding and Tensor Train Decomposition (Shi and Yu, 2018; Chen et al., 2018; Khrulkov et al., 2019; Shu and Nakayama, 2018), require multiple hyper-parameters to be fine-tuned to optimize performance on each dataset. These hyper-parameters influence the number of parameters in the model, and thus the compression rate.…”

Section: Introduction (mentioning)

confidence: 99%

“…Additionally, Chen et al. (2018) requires an additional optimization step for grouping words, and lacks end-to-end training through back-propagation. Shi and Yu (2018) also requires an additional step for performing k-means clustering for generating the quantization matrix. Thus, most of the current state-of-the-art systems are much more complicated to fine-tune for different NLP problems and data-sets.…”

Section: Introduction (mentioning)

confidence: 99%

“…Lastly, embedding compression models not based on linear SVD (Khrulkov et al., 2019; Shi and Yu, 2018) require the reconstruction of the entire embedding matrix or additional computations, when used at the output-layer. Thus during runtime, the model either uses the same amount of memory as the uncompressed model or pays a higher computation cost.…”

Section: Introduction (mentioning)

confidence: 99%
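
The last statement contrasts linear factorizations with methods that must rebuild the embedding matrix at the output layer. A minimal sketch of why a linear low-rank factorization avoids that rebuild is given below. The matrices `U` and `V`, the hidden state `h`, and all sizes are hypothetical toy values chosen for illustration; they are not taken from any of the cited papers.

```python
# Toy illustration (hypothetical values): with a linear factorization E ~= U @ V,
# output logits E @ h can be computed as U @ (V @ h) without materializing E.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in left]

# Assumed sizes: vocabulary 4, hidden size 3, rank 2.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]  # vocab x rank
V = [[0.5, 1.0, 0.0], [0.0, 2.0, 1.0]]                 # rank x hidden
h = [1.0, 2.0, 3.0]                                    # hidden state

# Route 1: rebuild the full vocab x hidden matrix, then project.
logits_full = matvec(matmul(U, V), h)

# Route 2: apply the two small factors in sequence; the full matrix is never stored.
logits_factored = matvec(U, matvec(V, h))

print(logits_full)
print(logits_factored)
```

Both routes give the same logits because matrix multiplication is associative. That shortcut is not available when a nonlinearity or a codebook lookup sits between the factors, which is the situation the quoted statement describes.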


Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition

Lioutas; Ahmad; Kumar; et al. 2020

Findings of the Association for Computational Linguistics: EMNLP 2020


Word-embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored. However, they consume a lot of memory which poses a challenge for edge deployment. Embedding matrices, typically, contain most of the parameters for language models and about a third for machine translation systems. In this paper, we propose Distilled Embedding, an (input/output) embedding compression method based on low-rank matrix decomposition and knowledge distillation. First, we initialize the weights of our decomposed matrices by learning to reconstruct the full pre-trained word-embedding and then fine-tune end-to-end, employing knowledge distillation on the factorized embedding. We conduct extensive experiments with various compression rates on machine translation and language modeling, using different data-sets with a shared word-embedding matrix for both embedding and vocabulary projection matrices. We show that the proposed technique is simple to replicate, with one fixed parameter controlling compression size, has higher BLEU score on translation and lower perplexity on language modeling compared to complex, difficult to tune state-of-the-art methods.
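
As a rough illustration of how a single parameter can control compression size in a low-rank decomposition, the sketch below counts parameters before and after factorizing a vocabulary-by-dimension embedding matrix into two factors of a chosen rank. It assumes the controlling parameter is the rank, which the abstract does not state explicitly. The function name and the sizes (10,000 words, 512 dimensions, rank 64) are hypothetical, not figures from the paper.

```python
def low_rank_param_counts(vocab_size, embed_dim, rank):
    """Return (full, factored, ratio) parameter counts for a rank-`rank` factorization.

    The full matrix has vocab_size * embed_dim entries. Factoring it into a
    (vocab_size x rank) matrix and a (rank x embed_dim) matrix stores
    rank * (vocab_size + embed_dim) entries instead.
    """
    full = vocab_size * embed_dim
    factored = rank * (vocab_size + embed_dim)
    return full, factored, full / factored

# Hypothetical sizes chosen only for illustration.
full, factored, ratio = low_rank_param_counts(vocab_size=10_000, embed_dim=512, rank=64)
print(f"full: {full}, factored: {factored}, compression: {ratio:.2f}x")
```

Lowering the rank shrinks the factored count linearly, so under this assumption the compression rate is set by that one value.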
