Proceedings of the Fourteenth EuroSys Conference 2019
DOI: 10.1145/3302424.3303949
GRNN

Abstract: Recurrent neural networks (RNNs) have gained significant attention due to their effectiveness in modeling sequential data, such as text and voice signals. However, due to complex data dependencies and limited parallelism, current inference libraries for RNNs on GPUs produce either high latency or poor scalability, leading to inefficient resource utilization. Consequently, companies like Microsoft and Facebook use CPUs to serve RNN models. This work demonstrates the root causes of the unsatisfactory performance…
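To make the abstract's point about data dependencies concrete, here is a minimal sketch (illustrative only, not GRNN's actual kernels): in an Elman-style RNN cell, each hidden state h_t depends on h_{t-1}, so the time-step loop is inherently serial, which limits GPU parallelism when the batch size is small.

```python
import numpy as np

# Minimal Elman-style RNN cell. The point: each iteration reads the h
# produced by the previous iteration, so the time steps cannot be
# computed in parallel. All sizes and weights here are arbitrary.

rng = np.random.default_rng(0)
hidden, inp, steps = 4, 3, 5

W_x = rng.standard_normal((hidden, inp)) * 0.1   # input-to-hidden weights
W_h = rng.standard_normal((hidden, hidden)) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden)

xs = rng.standard_normal((steps, inp))  # input sequence
h = np.zeros(hidden)                    # initial hidden state

for t in range(steps):
    # h on the right-hand side is the previous step's output:
    # this recurrence serializes the loop across time steps.
    h = np.tanh(W_x @ xs[t] + W_h @ h + b)

print(h.shape)  # (4,)
```

With a batch of one (common under latency SLAs), each step is a small matrix-vector product, which is exactly the low-parallelism regime the abstract describes.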

Cited by 36 publications (12 citation statements)
References 19 publications
“…We compare SHARP against the state-of-the-art GPU, FPGA and ASIC implementations, i.e. cuDNN [20], GRNN [23],…”
Section: Results
confidence: 99%
“…To meet the requirements of real-time inference at large scale, a high-performance and energy efficient accelerator for RNN is highly desired. However, two reasons make it very difficult to accomplish efficient RNN computation by CPUs or GPUs in parallel [21,22]: (1) recurrent behaviour of RNN architecture which imposes several data-dependencies, (2) limited parallel tasks due to the enforced low batch size by Service-Level Agreements (SLAs) in the inference evaluation [23,24]…”
Section: Introduction
confidence: 99%
“…While CCA provides strong data security and enables confidential computing on next-generation Arm devices, the support for GPUs [4], [21], [69], which are widely used to accelerate the general-, high-performance, and artificial intelligence computing scenarios [15], [18], [34], [45], [55], is only recently proposed. However, such support, called RME Device Assignment (RME-DA) [25], is currently a high-level concept without completed hardware implementation.…”
Section: Introduction
confidence: 99%