Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.23
Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation

Abstract: One of the reasons Transformer translation models are popular is that self-attention networks for context modelling can be easily parallelized at sequence level. However, the computational complexity of a self-attention network is O(n²), increasing quadratically with sequence length. By contrast, the complexity of LSTM-based approaches is only O(n). In practice, however, LSTMs are much slower to train than self-attention networks as they cannot be parallelized at sequence level: to model context, the current…
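To make the contrast in the abstract concrete, the following is a minimal NumPy sketch (not taken from the paper; all weight names and dimensions are illustrative) of the two context-modelling patterns: self-attention builds an (n × n) score matrix in one batched matrix product, so it costs O(n²) but is fully parallel over positions, while a plain recurrent cell does only O(n) work yet must wait for the previous hidden state at every step.

import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4                                   # toy sequence length and width
X = rng.normal(size=(n, d))                   # one toy input sequence

# Self-attention: every position attends to every other in one batched op.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)                 # (n, n) score matrix: quadratic in n
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attn_out = weights @ V                        # all positions computed at once

# Recurrent baseline: O(n) total work, but strictly sequential over steps.
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
rnn_out = []
for x_t in X:                                 # step t needs the finished h_{t-1}
    h = np.tanh(x_t @ W + h @ U)
    rnn_out.append(h)
rnn_out = np.stack(rnn_out)

Both snippets mix information across positions; the difference the abstract highlights is that the first is a single parallel matrix operation per sequence, while the second is an unavoidably serial loop.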

Cited by 7 publications (3 citation statements) | References: 18 publications

Citation statements (ordered by relevance):
“…Language Models. Language models are widely used in a variety of real-world applications, such as sentiment analysis [12], [50], [65], neural translation [75], [1], and question-answering [15], [30]. Modern language models use Transformer [70] as their backbone and contain billions of parameters, e.g., the minimal version of Stanford Alpaca [68] (an open-source alternative to OpenAI ChatGPT) contains 7 billion parameters.…”
Section: A. Language Models and Prompt-Tuning
confidence: 99%
“…For resource allocation, the presented ESMOML-RAA technique employed the HP-LSTM model. For enabling the LSTM to compute o_t in parallel, the HPLSTM utilizes a bag-of-words representation s_t of previous tokens for the computation of the gates and the hidden layer (HL) [24]:…”
Section: Resource Allocation Using the HP-LSTM Model
confidence: 99%
“…For resource allocation, the presented ESMOML-RAA technique employed the HP-LSTM model. For enabling the LSTM to compute o_t in parallel, the HPLSTM utilizes a bag-of-words representation s_t of previous tokens for the computation of gates and HL [24]: s_t = ∑_{k=1}^{t−1} i_k, whereas s_1 refers to the zero vector. The BoW representation is attained effectually using the cumulative sum function.…”
Section: The Proposed Model
confidence: 99%
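To illustrate the mechanism both excerpts describe, the following is a small NumPy sketch of the sequence-parallel part of an HPLSTM-style layer: the gates and the hidden representation are conditioned on the bag-of-words summary s_t of the preceding tokens (one shifted cumulative sum, with s_1 the zero vector) instead of on the previous hidden state, so they can be evaluated for all timesteps at once. The function name, weight shapes, and activations are illustrative assumptions, not the paper's exact parameterization.

import numpy as np

def hplstm_parallel_part(I, W_gates, W_hidden):
    # I: (n, d) input token representations for one sequence.
    # W_gates, W_hidden: illustrative projection matrices of shape (2d, d).
    n, d = I.shape
    # Bag-of-words summary of *previous* tokens via one shifted cumulative sum:
    # s_1 is the zero vector, s_t = i_1 + ... + i_{t-1} for t > 1.
    S = np.vstack([np.zeros((1, d)), np.cumsum(I, axis=0)[:-1]])
    feats = np.concatenate([I, S], axis=-1)            # (n, 2d)
    gates = 1.0 / (1.0 + np.exp(-(feats @ W_gates)))   # sigmoid gates, all steps at once
    hidden = np.tanh(feats @ W_hidden)                 # candidate hidden, all steps at once
    return S, gates, hidden

# Toy usage: no per-step loop is needed for these quantities.
rng = np.random.default_rng(0)
n, d = 5, 8
I = rng.normal(size=(n, d))
S, gates, hidden = hplstm_parallel_part(
    I, rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)))
assert np.allclose(S[0], 0.0)                          # s_1 is the zero vector

In the full model a lightweight element-wise update over the cell state still runs step by step, but the heavy projections above no longer depend on the previous timestep, which is what allows the decoder to be highly parallelized.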