Proceedings of the 26th Asia and South Pacific Design Automation Conference 2021
DOI: 10.1145/3394885.3431659
A 0.57-GOPS/DSP Object Detection PIM Accelerator on FPGA

Cited by 11 publications (8 citation statements) | References 1 publication
“…Better algorithms [24] [2] are designed to use resources more efficiently, improve model performance, and optimize deployment in real-world environments. Also, ultra-low-power AI chips [10] and accelerators [15] have been proposed to support always-on ML capability for an extended period on a battery. However, a joint design of hardware and algorithm [30] [13] is required to squeeze out performance, since TinyML delivers ML solutions to constrained devices with limited resources.…”
Section: Related Work
confidence: 99%
“…Much work has explored making DNNs lightweight, such as network pruning [31], [32], knowledge distillation [33], [34], and quantization [35], [36]. DNNs that run with low-precision operations during inference offer power and memory advantages over full precision, which also benefits low-bit-width artificial intelligence chip design [37], [38]. The main idea of quantization is to map full-precision floating-point numbers to lower precision (8-bit or lower) through a quantizer, significantly reducing the amount of floating-point operations (FLOPs) in matrix multiplication.…”
Section: Total Direct Effect
confidence: 99%
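The quantization idea quoted above can be illustrated with a minimal NumPy sketch. The uniform symmetric per-tensor quantizer below is a generic example for illustration only; it is not the quantizer used in the cited works [35], [36] or in the PIM accelerator, and the matrix sizes are arbitrary.

    import numpy as np

    def quantize_int8(x):
        # Uniform symmetric per-tensor quantization to signed 8-bit integers.
        # Returns the integer tensor plus the scale needed to dequantize
        # (x_float ~= x_int * scale).
        qmax = 127
        scale = np.max(np.abs(x)) / qmax
        x_int = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
        return x_int, scale

    # Quantize a weight matrix and an activation matrix, perform the matrix
    # multiply in integer arithmetic, and dequantize only the accumulated result.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((64, 64)).astype(np.float32)
    a = rng.standard_normal((64, 64)).astype(np.float32)
    w_q, w_s = quantize_int8(w)
    a_q, a_s = quantize_int8(a)
    y_int = w_q.astype(np.int32) @ a_q.astype(np.int32)   # integer MACs only
    y = y_int * (w_s * a_s)                                # single dequantization
    print(np.max(np.abs(y - w @ a)))                       # quantization error

Replacing floating-point MACs with integer MACs in this way is what allows low-bit-width hardware (DSPs, PIM arrays) to carry the bulk of the computation.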
“…In the PIM architecture, the quantized network is pre-loaded into BRAM, and intermediate data access to/from off-chip DRAM is entirely eliminated during inference. This paper implements part of CA-SpaceNet on the PIM accelerator proposed by Jiao et al. [37], which is demonstrated in Fig. 5(b).…”
Section: E. Network Quantization
confidence: 99%
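The design choice quoted above, pre-loading all quantized weights into on-chip BRAM so that no off-chip DRAM traffic occurs during inference, can be sanity-checked with simple arithmetic. The sketch below uses made-up layer sizes, bit width, and BRAM capacity; none of these figures come from the cited paper.

    # Rough check of whether a quantized model fits in on-chip BRAM so that all
    # weights can be pre-loaded and off-chip DRAM access avoided at inference.
    # Layer sizes, bit width, and BRAM budget are illustrative assumptions.
    layer_params = {                 # number of weights per layer (example values)
        "conv1": 64 * 3 * 3 * 3,
        "conv2": 128 * 64 * 3 * 3,
        "fc":    256 * 128,
    }
    bits_per_weight = 4              # low-bit quantized weights (assumption)
    bram_kbits = 36 * 300            # e.g. 300 BRAM36K blocks available (assumption)

    total_kbits = sum(layer_params.values()) * bits_per_weight / 1024
    print(f"weights: {total_kbits:.0f} Kb, BRAM budget: {bram_kbits} Kb, "
          f"fits: {total_kbits <= bram_kbits}")

If the quantized weights exceed the BRAM budget, either the bit width must be lowered further or part of the network has to remain in off-chip memory, reintroducing DRAM traffic.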
“…It is about the proliferation of hardware, progress on algorithms, an emerging ecosystem, and transformative applications. Ultra-low-power devices have been designed for always-on applications [7] [11]. Various algorithms have been proposed to fully exploit ML models on these devices without compromising performance [22] [17].…”
Section: Related Work
confidence: 99%