The explosive computation and memory requirements of convolutional neural networks (CNNs) hinder their deployment on resource-constrained devices. Conventional CNNs perform identical, fully parallelized computations even on redundant pixels, so exploiting the varying saliency of features in an image is essential for higher energy efficiency and broader adoption. This paper proposes a novel channel and spatial gating network (CSGN) that adaptively selects vital channels and generates spatial-wise execution masks. A CSGN can be characterized as a dynamic, channel- and spatial-aware gating module that maximally exploits opportunistic sparsity. Extensive experiments were conducted on the CIFAR-10 and ImageNet datasets with ResNet backbones. The results show that the proposed architecture reduces the number of multiply-accumulate (MAC) operations by 1.97–11.78× on CIFAR-10 and 1.37–13.12× on ImageNet, with negligible accuracy degradation at inference compared with the baseline architectures.
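The abstract does not spell out the gating architecture, but the idea of pairing a channel gate with a spatial execution mask can be illustrated with a short sketch. The PyTorch module below is a minimal, hypothetical approximation, not the paper's exact CSGN: the channel gate is a squeeze-and-excitation-style bottleneck, the spatial gate is a 1×1 convolution, and the layer choices, names, and threshold are all assumptions for illustration.

```python
# Hypothetical channel + spatial gating block (illustrative, not the paper's CSGN).
import torch
import torch.nn as nn

class ChannelSpatialGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel gate: global average pool -> bottleneck MLP -> per-channel scores.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial gate: 1x1 conv produces a per-pixel execution score map.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        c_mask = (self.channel_gate(x) > threshold).float()  # (N, C, 1, 1)
        s_mask = (self.spatial_gate(x) > threshold).float()  # (N, 1, H, W)
        # Zeroed channels/pixels mark MACs that a sparsity-aware runtime could skip.
        return x * c_mask * s_mask

x = torch.randn(1, 64, 32, 32)
y = ChannelSpatialGate(64)(x)  # masked activations, same shape as x
```

Hard thresholding is non-differentiable, so dynamic gating modules of this kind are typically trained with Gumbel-softmax or straight-through estimators; the zeroed channels and pixels only yield real MAC savings when the runtime or hardware can skip masked computation.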
Many deep learning frameworks use GEneral Matrix Multiplication (GEMM)-based convolution to accelerate CNN execution. GEMM-based convolution is fast, but it requires a data-conversion step called lowering (i.e., im2col), which incurs significant memory overhead and degrades performance. This paper proposes a novel hardware mechanism, called the On-the-fly Lowering Engine (OLE), to offload the lowering overheads from GEMM-based convolution and eliminate them. With OLE, the lowered matrix is neither pre-computed nor stored in main memory. Instead, a hardware engine generates the lowered matrix on-the-fly from the original input matrix, reducing memory footprint and bandwidth requirements. Furthermore, hardware offloading eliminates the CPU cycles spent on lowering and overlaps computation with lowering to hide the performance overhead. Our evaluation shows that OLE reduces the memory footprint of convolutional-layer inputs to as little as 1/12.5× and the overall memory footprint by up to 33.5%. Moreover, OLE reduces the execution time of convolutional layers by 57.7% on average, yielding an average speedup of 2.3× for representative CNN models.
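To make the lowering overhead concrete, here is a minimal NumPy sketch of im2col for a single-channel input; the function name and layout are illustrative, not OLE's interface. Each k×k window is copied into its own row, so convolution becomes a single matrix multiply, but overlapping windows replicate pixels, which is exactly the footprint that OLE avoids by generating these rows on-the-fly in hardware.

```python
# Minimal im2col sketch for a single-channel input (illustrative layout).
import numpy as np

def im2col(x: np.ndarray, k: int, stride: int = 1) -> np.ndarray:
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    # One row per output pixel, each holding a flattened k x k patch.
    cols = np.empty((out_h * out_w, k * k), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            cols[i * out_w + j] = patch.ravel()
    return cols

# Convolution as GEMM: lowered input (out_h*out_w, k*k) times flattened kernel (k*k,).
x = np.arange(25, dtype=np.float32).reshape(5, 5)
kernel = np.ones((3, 3), dtype=np.float32)
y = im2col(x, k=3) @ kernel.ravel()
print(y.reshape(3, 3))  # the 3x3 output feature map
```

For a 3×3 kernel with stride 1, each interior pixel is replicated across nine rows of the lowered matrix, which is why the lowered input can be an order of magnitude larger than the original.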