2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01156
Building Efficient Deep Neural Networks With Unitary Group Convolutions

Abstract: We propose unitary group convolutions (UGConvs), a building block for CNNs that composes a group convolution with unitary transforms in feature space to learn a richer set of representations than group convolution alone. UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e. ShuffleNet [29]) and block-circulant networks (i.e. CirCNN [6]), and provide unifying insights that lead to a deeper understanding of each technique. We experimentally demonstrate that dense unitary transforms c…
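The composition the abstract describes can be sketched in a few lines. The following is an illustrative NumPy mock-up, not the authors' implementation: the function names, the `(C, H, W)` tensor layout, and the choice of a normalized Hadamard matrix as the unitary transform are assumptions; the sketch only shows the pattern of a dense unitary mixing followed by a block-diagonal (group) 1×1 convolution.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2),
    scaled so the result is orthonormal (unitary over the reals)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def ugconv_1x1(x, weights, groups):
    """UGConv-style 1x1 layer sketch: a dense unitary transform across
    channels, then a group convolution (block-diagonal 1x1 conv).
    x: (C, H, W); weights: one (C/groups, C/groups) matrix per group."""
    C = x.shape[0]
    x = np.einsum('ij,jhw->ihw', hadamard(C), x)  # dense unitary mixing
    g = C // groups
    out = np.empty_like(x)
    for k in range(groups):
        sl = slice(k * g, (k + 1) * g)
        out[sl] = np.einsum('ij,jhw->ihw', weights[k], x[sl])  # per-group 1x1 conv
    return out
```

Because the Hadamard transform mixes all channels before the group convolution, each group sees information from every input channel, which is the property channel shuffling also provides.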

Cited by 41 publications (55 citation statements)
References 22 publications
“…For ResNet-18, we can reduce parameters by 83% and FLOPs by 95% with an increase of 14.5% in test error. This is not promising given the high test error, but the performance is on par with [50], even though they have a higher budget in both the training and inference phases: their ResNet-18 example is trained from scratch, and the Hadamard transform adds overhead. We also compare ResNet-34 to [32], which uses low-rank approximation to perform GConv pruning.…”
Section: Results
confidence: 99%
“…Additionally, it cannot deal with 1 × 1 group convolutions, which is critical since recent efficient CNNs rely heavily on them [10, 36, 49]. [50] applies a block Hadamard transform, which is more efficient but still requires extra computation. On the other hand, permuting channels is a much simpler way to mingle groups (Figure 3b), since it requires neither additional FMAs nor parameters.…”
Section: Background and Related Work
confidence: 99%
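The channel-permutation alternative this excerpt refers to (ShuffleNet-style channel shuffling) is a fixed interleaving across groups that costs no FMAs and no parameters. A minimal NumPy sketch, assuming a `(C, H, W)` layout (not code from either cited paper):

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle: reshape channels into
    (groups, C/groups), transpose, and flatten back, so each output
    group draws one channel from every input group. Pure data movement:
    no multiply-adds, no learned parameters."""
    C, H, W = x.shape
    return (x.reshape(groups, C // groups, H, W)
             .transpose(1, 0, 2, 3)
             .reshape(C, H, W))
```

For example, with six channels and two groups, channel order `[0, 1, 2, 3, 4, 5]` becomes `[0, 3, 1, 4, 2, 5]`, interleaving the two groups.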
“…The second direction is to enhance hardware implementation efficiency by deriving an effective tradeoff between accuracy and pruning rate, e.g., energy-aware pruning [17] and structure-aware pruning [18], [10]. FPGA hardware accelerators [19], [20] have also been investigated to accommodate pruned CNNs by leveraging the reconfigurability of on-chip resources. Recently, the authors of [14] developed a systematic weight pruning framework based on the powerful optimization tool ADMM (Alternating Direction Method of Multipliers) [21].…”
Section: B. CNN Weight Pruning
confidence: 99%
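The ADMM-based pruning this excerpt mentions alternates an unconstrained weight update with a Euclidean projection onto the sparsity constraint and a dual-variable update. The sketch below is a toy illustration of that alternation only, not the framework from [14]: the quadratic stay-close-to-original W-update, the `project_topk` helper, and all hyperparameters are assumptions for demonstration.

```python
import numpy as np

def project_topk(W, k):
    """Euclidean projection onto {matrices with at most k nonzeros}:
    keep the k largest-magnitude entries, zero the rest."""
    flat = np.abs(W).ravel()
    if k >= flat.size:
        return W.copy()
    thresh = np.partition(flat, -k)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def admm_prune(W0, k, rho=1.0, steps=50):
    """Toy ADMM loop for magnitude pruning. In place of a real training
    loss, the W-update minimizes ||W - W0||^2 + (rho/2)||W - Z + U||^2,
    which has the closed form below; Z is the sparse copy, U the dual."""
    W, Z, U = W0.copy(), W0.copy(), np.zeros_like(W0)
    for _ in range(steps):
        W = (W0 + rho * (Z - U)) / (1.0 + rho)  # proximal W-update
        Z = project_topk(W + U, k)              # projection Z-update
        U = U + W - Z                           # dual update
    return Z
```

The appeal of the decomposition is that the hard combinatorial constraint is isolated in the Z-update, where the projection is cheap, while the W-update stays a smooth optimization problem.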