2022
DOI: 10.1145/3527457
Memory-Throughput Trade-off for CNN-Based Applications at the Edge

Abstract: Many modern applications require execution of Convolutional Neural Networks (CNNs) on edge devices, such as mobile phones or embedded platforms. This can be challenging, as state-of-the-art CNNs are memory-costly, whereas the memory budget of edge devices is highly limited. To address this challenge, a variety of CNN memory reduction methodologies have been proposed. Typically, the memory of a CNN is reduced using methodologies such as pruning and quantization. These methodologies reduce the number or preci…
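The two memory reduction methodologies the abstract names, pruning and quantization, can be illustrated with a minimal NumPy sketch. This is not the paper's method: the symmetric int8 scheme, the magnitude-based pruning criterion, and the function names are illustrative assumptions.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization: float32 weights -> int8 + scale.

    Storing int8 instead of float32 cuts weight memory by 4x.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)

q, scale = quantize_int8(w)
print(w.nbytes, q.nbytes)  # 8192 2048: int8 storage is 4x smaller
# Round-trip error is bounded by half a quantization step:
print(np.max(np.abs(w - q.astype(np.float32) * scale)) <= scale)

w_sparse = prune_by_magnitude(w, sparsity=0.5)
print(np.mean(w_sparse == 0))  # roughly half the weights are now zero
```

Pruned weights only save memory once stored in a sparse format (e.g., CSR); the dense array above has the same footprint, which is one reason pruning and quantization are usually combined with a suitable storage scheme.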

Cited by 6 publications (5 citation statements)
References 21 publications
“…Since we target a mobile application running the classification locally on the devices, model performance and resource requirements are important. This trade-off currently gains much attention in industrial applications, e.g., in [53]. The approaches to pruning or quantizing the models are state of the art as, e.g., described in [14,15].…”
Section: Related Work
confidence: 99%
“…Quantization and pruning, as explained above, are therefore essential for an efficient implementation. At the same time, the high throughput required by certain applications is very hard to achieve with general purpose hardware, such as CPUs and GPUs, because of limitations in both memory and computation bandwidth, which appeal more to the average case [ 51 ]. While several integrated custom accelerators exist, they cannot be easily reconfigured to compute with arbitrary precision, reducing the hardware utilization and therefore performance.…”
Section: CNN and Quantization Background
confidence: 99%
“…1 in different colors/patterns, and their overlap caused by 3x3 convolutions is marked in purple/crossed. FFMT was first employed for reducing peak memory usage in [9], but their path discovery requires partially manual user effort. Other works that use FFMT with automated path discovery are [5,10,19,25,26].…”
[Flattened table residue from the citing paper: [32]: RAM reduction; Full Distributed Inference [30]: RAM and ROM reduction; Partly Manual Tiling [5,9]: RAM reduction; Automated Tiling [6,10,19,23-26]: RAM reduction.]
Section: Related Work
confidence: 99%
“…FFMT was first employed for reducing peak memory usage in [9], but their path discovery requires partially manual user effort. Other works that use FFMT with automated path discovery are [5,10,19,25,26]. FFMT along with tiling in the depthwise dimension for single layers without operator fusion was explored in [6,23].…”
Section: Related Work
confidence: 99%
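The fused feature-map tiling (FFMT) idea these citation statements discuss, computing stacked convolutions tile by tile so that only a halo-extended tile, rather than the full intermediate feature map, is live at once, can be sketched as follows. This is a simplified single-channel illustration with 'valid' 3x3 convolutions; the helper names and tile size are assumptions, not the cited works' implementations. The 2-pixel halo per side is exactly the tile overlap caused by two stacked 3x3 convolutions that the first snippet mentions.

```python
import numpy as np

def conv3x3(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 3x3 convolution on a 2D map: output shrinks by 2 per axis."""
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((h, w), dtype=x.dtype)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * x[i:i + h, j:j + w]
    return out

def fused_tiled(x, k1, k2, tile=8):
    """Fused feature-map tiling over two stacked 3x3 convs.

    Each output tile needs only a small input slice with a 2-pixel halo
    on each side, so the tile-sized intermediate is the peak buffer,
    not the full intermediate feature map.
    """
    H, W = x.shape[0] - 4, x.shape[1] - 4  # output size after two valid 3x3s
    out = np.zeros((H, W), dtype=x.dtype)
    for r in range(0, H, tile):
        for c in range(0, W, tile):
            th, tw = min(tile, H - r), min(tile, W - c)
            xs = x[r:r + th + 4, c:c + tw + 4]  # tile + halo from both convs
            out[r:r + th, c:c + tw] = conv3x3(conv3x3(xs, k1), k2)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((36, 36)).astype(np.float32)
k1, k2 = rng.standard_normal((2, 3, 3)).astype(np.float32)

full = conv3x3(conv3x3(x, k1), k2)      # keeps a full 34x34 intermediate live
tiled = fused_tiled(x, k1, k2, tile=8)  # peak intermediate is only 10x10
print(np.allclose(full, tiled, atol=1e-4))  # same result, lower peak memory
```

The memory-throughput trade-off in the paper's title shows up here too: the halo regions are recomputed by neighboring tiles, so smaller tiles lower peak memory at the cost of redundant computation.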