2020
DOI: 10.1109/tc.2020.2984496

MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks
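
For orientation, the operation MViD accelerates is sparse matrix-vector multiplication (SpMV) over RNN weight matrices. A minimal CSR-format reference in Python, purely illustrative and not a depiction of the paper's in-DRAM datapath:

    # Sparse matrix-vector multiplication in CSR format: y = A @ x.
    # Illustrative host-side reference, not MViD's in-DRAM design.
    def spmv_csr(values, col_idx, row_ptr, x):
        y = [0.0] * (len(row_ptr) - 1)
        for row in range(len(y)):
            # Accumulate only the nonzeros stored for this row.
            for k in range(row_ptr[row], row_ptr[row + 1]):
                y[row] += values[k] * x[col_idx[k]]
        return y

    # 3x3 example: [[2,0,0],[0,0,3],[4,5,0]] times [1,2,3].
    print(spmv_csr([2, 3, 4, 5], [0, 2, 0, 1], [0, 1, 2, 4], [1, 2, 3]))
    # -> [2.0, 9.0, 14.0]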

Cited by 22 publications (22 citation statements) · References 33 publications
“…There are also designs placing accelerators at bank (group) level to further exploit the inherent parallelism in DRAM devices [10,16,[26][27][28]31]. However, these designs are mostly used for elementwise or multiply-and-accumulate operations because they require all input operands to sit within a specific bank (group).…”
Section: Related Work
confidence: 99%
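
The colocation constraint described above can be made concrete with a toy model: a bank-level unit may only combine operands whose addresses map to its own bank. A hypothetical Python sketch (the address-to-bank mapping and the bank_mac interface are assumptions, not taken from any cited design):

    NUM_BANKS = 8

    def bank_of(addr):
        # Toy address-to-bank mapping: low-order interleaving (assumed).
        return addr % NUM_BANKS

    def bank_mac(mem, addr_a, addr_b, bank):
        # A bank-level unit can only read operands resident in its own bank.
        if bank_of(addr_a) != bank or bank_of(addr_b) != bank:
            raise ValueError("operands must sit within this bank")
        return mem[addr_a] * mem[addr_b]

    mem = {16: 3.0, 24: 2.0, 17: 5.0}
    print(bank_mac(mem, 16, 24, bank=0))   # both map to bank 0 -> 6.0
    # bank_mac(mem, 16, 17, bank=0)        # raises: address 17 maps to bank 1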
“…In addition, as TPU v4 [28] reuses the hardware design of TPU v3 except for several components such as on-chip memory capacity, on-chip interconnect, and DMA, the VU of TPU v4 has the same structure as that of TPU v3. There have been processing-near-DRAM studies [10,14,31] that provide high off-chip memory bandwidth during inference. Because [10,14] use dataflow architectures such as Eyeriss v1 [7] and the systolic array, they still do not process DW-CONV efficiently.…”
Section: Related Work
confidence: 99%
“…Because [10,14] use dataflow architectures such as Eyeriss v1 [7] and the systolic array, they still do not process DW-CONV efficiently. In contrast, [31] has advantages for memory-intensive operations but weaknesses for compute-intensive ST-CONV operations. Prior works supporting both ST- and DW-CONV: previous architectural solutions have mainly been proposed to process both ST- and DW-CONV in an MU.…”
Section: Related Work
confidence: 99%
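
The claim that dataflow architectures handle DW-CONV poorly follows from arithmetic intensity: a depthwise layer performs far fewer MACs per byte of traffic than a standard one, so it is memory-bound. A back-of-the-envelope Python sketch with assumed MobileNet-like layer shapes:

    def conv_intensity(H, W, Cin, Cout, K, depthwise, bytes_per_elem=1):
        # MAC count and (weights + input + output) traffic for one conv layer.
        if depthwise:
            macs = H * W * Cin * K * K          # one KxK filter per channel
            weights = Cin * K * K
            Cout = Cin
        else:
            macs = H * W * Cin * Cout * K * K
            weights = Cin * Cout * K * K
        traffic = (weights + H * W * Cin + H * W * Cout) * bytes_per_elem
        return macs / traffic  # arithmetic intensity: MACs per byte

    # Assumed shapes: 56x56 feature map, 128 channels, 3x3 kernels.
    print("ST-CONV:", conv_intensity(56, 56, 128, 128, 3, depthwise=False))
    print("DW-CONV:", conv_intensity(56, 56, 128, 128, 3, depthwise=True))

Under these assumed shapes, the standard layer performs roughly two orders of magnitude more MACs per byte than the depthwise layer, which is why memory-side designs such as [31] suit DW-CONV while compute-dense arrays sit underutilized on it.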
“…Contrary to many PIM works that perform MAC operations within DRAM [34], [53], [65], we leave the MAC operations entirely to the host NPU. Instead, GradPIM performs the parameter update phase, following the observations from Section II.…”
Section: GradPIM
confidence: 99%
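
GradPIM's division of labor (the host NPU keeps all MAC-heavy gradient computation; memory-side logic applies the element-wise parameter update) can be sketched as below. The SGD-with-momentum rule and both function names are assumptions for illustration, not GradPIM's actual interface:

    def host_npu_backward(params, batch):
        # Stand-in for the NPU: all MAC-heavy gradient work stays here.
        return [0.01 * p for p in params]  # fake gradients for illustration

    def pim_parameter_update(params, grads, velocity, lr=0.1, momentum=0.9):
        # Stand-in for the memory-side update phase: a simple element-wise
        # SGD-with-momentum step (assumed optimizer, illustration only).
        for i in range(len(params)):
            velocity[i] = momentum * velocity[i] - lr * grads[i]
            params[i] += velocity[i]

    params = [1.0, -2.0, 0.5]
    velocity = [0.0] * len(params)
    grads = host_npu_backward(params, batch=None)
    pim_parameter_update(params, grads, velocity)
    print(params)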
“…To consider the power budget, we first estimated the maximum power of a DRAM channel as done by [53], by performing sequential reads while keeping the tFAW and tRRD constraints. Then we scaled tFAW and tRRD so that performing consecutive PIM operations would yield the same maximum power.…”
Section: Timing Considerations
confidence: 99%
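
The power-budgeting step quoted above can be reproduced numerically: cap the activation rate with tFAW/tRRD to get the sequential-read power ceiling, then stretch those timings in proportion to the PIM operation's energy so consecutive PIM operations draw the same peak power. All energies and timings below are assumed values, chosen only to show the scaling:

    # Hypothetical numbers; illustrative of the scaling logic only.
    E_read = 5.0    # nJ per read burst (assumed)
    E_pim  = 12.0   # nJ per PIM operation (assumed)
    tFAW_ns = 30.0  # four-activate window (assumed)
    tRRD_ns = 7.5   # row-to-row activate delay (assumed)

    # Peak power under sequential reads: 4 activations per tFAW window.
    P_max = 4 * E_read / tFAW_ns   # nJ/ns == W

    # Stretch timings so back-to-back PIM ops hit the same power ceiling.
    scale = E_pim / E_read
    tFAW_pim = tFAW_ns * scale
    tRRD_pim = tRRD_ns * scale

    P_pim = 4 * E_pim / tFAW_pim
    assert abs(P_pim - P_max) < 1e-9  # same peak power by construction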