2021 Symposium on VLSI Circuits
DOI: 10.23919/vlsicircuits52068.2021.9492504

OmniDRL: A 29.3 TFLOPS/W Deep Reinforcement Learning Processor with Dual-mode Weight Compression and On-chip Sparse Weight Transposer

Cited by 14 publications (21 citation statements). References 1 publication.
“…For more evaluation, we compare LearningGroup against the state-ofthe-art DNN training accelerators optimized for processing sparse data, such as EagerPruning [21], Procrustes [29], SparseTrain [22], and OmniDRL [35]. Since each accelerator has a different number of processing units and operating frequency, we compare how well each accelerator exploits the sparsity by measuring the speedup achieved in sparse data over the dense case.…”
Section: Comparison With Sparse Training Accelerators (mentioning)
confidence: 99%
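For context on the comparison metric quoted above, the speedup-over-dense measure is simply the ratio of dense runtime to sparse runtime on the same workload. A minimal sketch with placeholder numbers (not measured values from the cited papers):

```python
# Sketch of the sparsity-speedup metric described above: how much faster an
# accelerator runs a pruned (sparse) workload than the equivalent dense one.
# The runtimes are placeholders, not results from LearningGroup or OmniDRL.
def sparsity_speedup(dense_runtime_s: float, sparse_runtime_s: float) -> float:
    return dense_runtime_s / sparse_runtime_s

print(f"{sparsity_speedup(dense_runtime_s=1.00, sparse_runtime_s=0.40):.2f}x")  # 2.50x
```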
“…Deep Reinforcement Learning Accelerator There have been an increasing number of works on accelerating deep reinforcement learning while they only focus on single-agent domain [35]-[38]. Kim et al [37] compress the input activations using the top three frequently used exponent values rather than pruning the weights.…”
Section: Related Work (mentioning)
confidence: 99%
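As a rough illustration of the exponent-based activation compression described above: pick the three most frequent FP32 exponent values and replace matching exponents with a short index. The function below is a hypothetical sketch, not the exact scheme from [37]; the name `top3_exponent_compress` and the bit accounting are assumptions.

```python
import numpy as np

def top3_exponent_compress(activations: np.ndarray):
    """Hypothetical sketch: find the three most frequent FP32 exponent values and
    estimate the footprint if matching activations store a 2-bit exponent index
    (plus sign and mantissa) while the rest stay as full 32-bit words."""
    bits = activations.astype(np.float32).view(np.uint32)
    exponents = (bits >> 23) & 0xFF                 # biased 8-bit exponent field
    values, counts = np.unique(exponents, return_counts=True)
    top3 = values[np.argsort(counts)[-3:]]          # three most frequent exponents
    hit = np.isin(exponents, top3)                  # activations with a top-3 exponent
    compressed_bits = hit.sum() * (1 + 2 + 23) + (~hit).sum() * 32
    return top3, compressed_bits / (activations.size * 32)
```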
“…Another paper [30] added a weight transpose-reading unit instead of modifying the main PE array. This additional unit generated the transposed weights before they were fetched to the PE array.…”
Section: B Architecture-level Solution (mentioning)
confidence: 99%
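For background on why a transposed weight read matters: the forward pass multiplies activations by W, while back-propagation multiplies the upstream gradient by W^T, so training hardware must stream the same weights in both orientations. A minimal NumPy sketch (shapes and names are illustrative, not the accelerator's dataflow):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)   # weights, shape (out_dim, in_dim)
x = rng.standard_normal(8).astype(np.float32)        # input activation
y = W @ x                                            # forward pass reads W row-wise
grad_y = rng.standard_normal(4).astype(np.float32)   # upstream gradient dL/dy
grad_x = W.T @ grad_y                                # backward pass needs W^T (column-wise read)
```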
“…Weight transposer suggested by [30] adopted hierarchical transpose-read and reduced memory access by excluding the fetch of pruned weights. Even though the weight transposer unit showed fast weight decoding speed, it can rather induce low area efficiency due to the additional large register file array.…”
Section: B Architecture-level Solution (mentioning)
confidence: 99%
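To make the pruned-weight-skipping idea concrete, the sketch below keeps a pruned weight matrix in compressed sparse form and derives a column-oriented view for the transposed (backward) read, so zero entries are never fetched. It uses SciPy's CSR/CSC containers purely as an analogy for a transpose-read; it is not the OmniDRL hardware scheme.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Pruned weight matrix: zeros are weights removed by pruning.
W = np.array([[0.0, 1.5, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0]], dtype=np.float32)

W_csr = csr_matrix(W)       # forward: iterate non-zeros row by row
W_csc = W_csr.tocsc()       # backward: iterate non-zeros column by column (rows of W^T)

grad_y = np.array([1.0, 2.0, 3.0], dtype=np.float32)
grad_x = W_csc.T @ grad_y   # dL/dx = W^T dL/dy, touching only the stored non-zeros
```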