Deep neural networks have achieved remarkable progress on a wide range of intelligence tasks. However, their massive computation and storage requirements limit deployment on resource-constrained devices. Although channel pruning has been widely used to compress models, it is difficult to reach very high compression ratios with such a coarse-grained pruning structure without significant performance degradation. In this article, we propose an acceleration-aware fine-grained channel pruning (AFCP) framework for accelerating neural networks, which optimizes trainable gate parameters by estimating residual errors between pruned and original channels together with hardware characteristics. Our fine-grained design operates at both the algorithm and structure levels. Unlike existing methods that rely on a single pre-defined pruning criterion, AFCP explicitly considers both a zero-out criterion and a similarity criterion for each channel and adaptively selects the suitable one via residual gate parameters. At the structure level, AFCP adopts a fine-grained channel pruning strategy for residual neural networks together with a decomposition-based structure, which further enlarges the pruning optimization space. Moreover, instead of relying on theoretical computation costs such as FLOPs, we propose a hardware predictor that bridges the gap between realistic acceleration and the pruning procedure to guide the learning of pruning, improving the efficiency of pruned models when deployed on accelerators. Extensive evaluation results demonstrate that AFCP outperforms state-of-the-art methods and achieves a favorable balance between model performance and computation cost.
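
To make the criterion-selection idea concrete, the following is a minimal PyTorch-style sketch of how per-channel residual gates could weigh a zero-out criterion against a similarity criterion; the class name, gate parameterization, and similarity measure are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class FineGrainedChannelGate(nn.Module):
    """Illustrative per-channel gate: each channel is softly kept, zeroed out,
    or replaced by its most similar channel. All names and the exact
    parameterization are assumptions for exposition only."""

    def __init__(self, num_channels: int):
        super().__init__()
        # Three trainable logits per channel: keep / zero-out / similar.
        self.gate_logits = nn.Parameter(torch.zeros(num_channels, 3))

    def forward(self, x: torch.Tensor):
        n, c, h, w = x.shape
        # For each channel, find its most similar channel (cosine similarity
        # of flattened responses) as a simple stand-in for the 'similar' criterion.
        flat = x.permute(1, 0, 2, 3).reshape(c, -1)
        flat = nn.functional.normalize(flat, dim=1)
        sim = flat @ flat.t()
        sim.fill_diagonal_(-1.0)            # exclude self-matches
        nearest = sim.argmax(dim=1)         # (C,) index of the most similar channel
        similar_x = x[:, nearest]           # responses substituted channel-by-channel

        probs = torch.softmax(self.gate_logits, dim=-1)   # (C, 3)
        keep = probs[:, 0].view(1, c, 1, 1)
        simw = probs[:, 2].view(1, c, 1, 1)
        # probs[:, 1] is the zero-out weight; that branch contributes nothing to the sum.
        pruned = keep * x + simw * similar_x

        # Residual error between pruned and original channels, which a training
        # objective can penalize while learning the gate parameters.
        residual_error = (pruned - x).pow(2).mean()
        return pruned, residual_error
```

In such a sketch, the learned gate weights indicate, per channel, whether zeroing it out or substituting a similar channel introduces less residual error; a hardware-aware term (for example, predicted latency rather than FLOPs) could then be added to the loss to steer which channels are ultimately removed.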