2023
DOI: 10.1007/s11227-023-05050-4

Performance–energy trade-offs of deep learning convolution algorithms on ARM processors

Abstract: In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet…

Cited by 3 publications (3 citation statements)
References 24 publications
“…2) Convolution operation: Fixed iteration count loops are a key mechanism for implementing the convolution operation [21] in convolutional neural networks (CNNs) [22], [23]. It involves looping over the elements of the input feature map and the convolution kernel, performing multiplication and accumulation operations to generate the output feature map.…”
Section: Application Of Fixed Iteration Count Loops
confidence: 99%
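The citation statement above describes direct convolution as nested fixed-bound loops performing multiply-accumulate operations. A minimal sketch of that idea, in pure Python (the function and variable names `conv2d`, `x`, `w` are illustrative, not taken from the cited papers):

```python
# Direct 2-D convolution (cross-correlation, as in CNN frameworks) expressed
# as fixed iteration count loops: every loop bound is known before the loops
# start, which is the property the citing paper highlights.

def conv2d(x, w):
    """Valid-mode 2-D convolution of input feature map x with kernel w."""
    H, W = len(x), len(x[0])
    KH, KW = len(w), len(w[0])
    OH, OW = H - KH + 1, W - KW + 1            # output spatial dimensions
    y = [[0.0] * OW for _ in range(OH)]
    for i in range(OH):                        # fixed trip counts on
        for j in range(OW):                    # all four loops
            acc = 0.0
            for ki in range(KH):
                for kj in range(KW):
                    # multiply-accumulate over the kernel window
                    acc += x[i + ki][j + kj] * w[ki][kj]
            y[i][j] = acc
    return y

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, 1]]
print(conv2d(x, w))   # [[6.0, 8.0], [12.0, 14.0]]
```

Production kernels (such as those evaluated in the paper) block and vectorize these loops, but the loop nest with known trip counts is the same underlying structure.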
“…However, modern convolutional and capsule neural networks use small filters more often than the traditionally used large filters computed via the FFT approach. Winograd's minimal filtering algorithm [1, 12-15], which has recently gained significant popularity, is widely regarded as well-suited for such scenarios. This approach is particularly efficient when employing small filters and tile sizes.…”
Section: Introduction
confidence: 99%
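The minimal filtering idea referenced above can be illustrated with the smallest 1-D instance, F(2,3): two outputs of a 3-tap filter computed with four multiplications instead of the six a sliding dot product needs. This is a sketch following the standard F(2,3) formulas (the names `winograd_f23`, `d`, `g` are illustrative assumptions, not code from the cited work):

```python
# Winograd's minimal filtering algorithm F(2,3): compute
#   y[i] = sum_k d[i+k] * g[k]  for i in {0, 1}
# with 4 multiplications (m1..m4) instead of 6.

def winograd_f23(d, g):
    """Two filter outputs from a 4-element input tile d and 3-tap filter g."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f23(d, g):
    """Reference: naive sliding dot product (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, 1.0, -1.0]
print(winograd_f23(d, g), direct_f23(d, g))   # same result from both
```

In 2-D CNN kernels the same transform is applied along both spatial axes (e.g. F(2x2,3x3)), and the filter-side factors involving g can be precomputed once per layer, which is why the approach pays off for the small 3x3 filters the quote mentions.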
“…Similarly, Qasaimeh et al. [43] compared the performance of three hardware accelerators for embedded vision applications. The performance of ARM processors on deep learning workloads was also investigated by Dolz et al. [44], who studied the ARM Cortex-A57 and Cortex-A78AE CPUs among other processors.…”
confidence: 99%