2022
DOI: 10.1145/3499757
E-BATCH: Energy-Efficient and High-Throughput RNN Batching

Abstract: Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the batched input sequences may vastly differ in length. Schemes that dynamically update the batch every few time-steps avoid padding. However, they require executing different RNN layers in a short time span, decreasing energy efficiency. Hence, we propose E-BATCH, a l…
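To make the padding problem concrete, here is a minimal sketch (not from the paper; the sequence lengths are illustrative assumptions) comparing the wasted work of a statically padded batch against a scheme that re-forms the batch at every time-step:

```python
# Illustrative sketch: wasted work from padding when batching
# variable-length RNN requests. Sequence lengths are hypothetical.

seq_lens = [12, 87, 35, 60]  # lengths of four batched requests (assumed)

# Static batching: every sequence is padded to the longest one, so each
# time-step processes the full batch even after short requests finish.
max_len = max(seq_lens)
static_work = max_len * len(seq_lens)   # time-steps x batch slots
useful_work = sum(seq_lens)             # time-steps that carry real data
print(f"padding overhead: {1 - useful_work / static_work:.0%}")  # ~44%

# Dynamic batching: the batch is re-formed each time-step, so finished
# requests drop out and no padded slots are computed.
dynamic_work = sum(1 for t in range(max_len) for l in seq_lens if t < l)
assert dynamic_work == useful_work  # no wasted slots
```

The trade-off the abstract points to is that re-forming the batch this often forces different RNN layers to execute in a short time span, which is what costs energy efficiency.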

Cited by 7 publications (23 citation statements). References 24 publications.
“…As depicted, each LSTM gate performs two matrix-vector multiplications (MVMs), which ultimately decide how to update the cell state (c_t) and how to generate the hidden output vector (h_t) that is recurrently fed to the following time step. Two kinds of dependencies exist in these computations. The figure shows E-PUR's [21] speedup running EESEN [8] for a range of MAC units. Due to the scalability issue, it does not achieve a performance improvement proportional to the increase in resources.…”
Section: RNN Background (mentioning, confidence: 99%)
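For reference, the cell update the statement describes follows the standard LSTM formulation (not quoted from the citing paper); each gate applies one MVM to the input x_t and one to the recurrent state h_{t-1}, and it is the dependence on h_{t-1} and c_{t-1} that serializes the time-steps:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```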
“…For instance, NPUs [24,28] have the parallel multiply-accumulate (MAC) stage at the heart of their pipeline and are not optimized for cases where the serial part becomes the performance bottleneck for some models. On the other hand, customized accelerators [21,29] use a relatively small resource budget, which causes a large delay for MVMs; they therefore overlap the remaining LSTM computation that must run sequentially. However, when using more MACs, the issue of efficiently handling the LSTM's dependencies still remains.…”
Section: Challenges and Opportunities (mentioning, confidence: 99%)
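The scaling behavior described above is essentially Amdahl's law across time-steps: adding MAC units shrinks only the MVM portion of each step, while the element-wise gate operations and the recurrent dependency stay serial. A rough model of this, where all cycle counts are illustrative assumptions rather than measurements from E-PUR or EESEN:

```python
# Rough Amdahl-style model of LSTM step latency on an accelerator.
# Cycle counts are illustrative assumptions, not measured values.

MVM_WORK = 4 * 2 * 1024 * 1024  # MACs per step: 4 gates x 2 MVMs x 1024x1024
SERIAL_CYCLES = 4096            # element-wise gate ops that stay sequential

for macs in (256, 1024, 4096, 16384):
    step_cycles = MVM_WORK / macs + SERIAL_CYCLES
    print(f"{macs:6d} MACs -> {step_cycles:10.0f} cycles/step")
# Speedup flattens: once MVM_WORK / macs approaches SERIAL_CYCLES, extra
# MAC units no longer help, matching the citing paper's observation.
```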
“…Meanwhile, based on the learning level set of the mean simulator response, several new schemes were developed, including Multi-Level Batching (MLB), Ratchet Batching (RB), Adaptive Batched Stepwise Uncertainty Reduction (ABSUR), Adaptive Design with Stepwise Allocation (ADSA), and Deterministic Design with Stepwise Allocation (DDSA). MLB, RB, and ABSUR determine the sequential design inputs and the respective number of replicates simultaneously, whereas ADSA and DDSA do so sequentially [3]. In quantitative applications to many financial instances, such as Bermudan option pricing via Monte Carlo regression, the method showed significant computational speed and a low distortion rate.…”
Section: Introduction (mentioning, confidence: 99%)