2019 | DOI: 10.1007/s11432-018-9944-x
Snapshot boosting: a fast ensemble framework for deep neural networks

Abstract: Boosting has been proven effective at improving the generalization of machine learning models in many fields. It can produce high-diversity base learners and yield an accurate ensemble model by combining a sufficient number of weak learners. However, it is rarely used in deep learning because of the high training budget of neural networks. Another method, snapshot ensemble, can significantly reduce the training budget, but it is hard to balance the tradeoff between training costs and diversity…

Cited by 43 publications (14 citation statements) · References 22 publications
“…The core of snapshot ensemble is an optimization process, which visits multiple local minima before converging to the final minimum. Saving the model parameters derived from these different local minima is equivalent to taking a snapshot of the model at these different local minima [24,25].…”
Section: Principles Of Snapshot Ensemble
Mentioning confidence: 99%
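The statement above describes the snapshot-ensemble training loop: a cyclic learning rate drives the model into several local minima, and the parameters are saved at each of them. Below is a minimal sketch of that idea, assuming a generic PyTorch model, data loader, and loss function; the names `model`, `loader`, `n_cycles`, and the cosine schedule are illustrative choices, not the cited paper's exact code.

```python
# Sketch: cyclic (cosine-annealed) learning rate with a snapshot at the end of each cycle.
import copy
import math
import torch

def snapshot_train(model, loader, loss_fn, epochs=60, n_cycles=6, lr_max=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr_max, momentum=0.9)
    epochs_per_cycle = epochs // n_cycles
    snapshots = []
    for epoch in range(epochs):
        # Cosine-annealed learning rate that restarts at the start of every cycle.
        t = (epoch % epochs_per_cycle) / epochs_per_cycle
        lr = 0.5 * lr_max * (1 + math.cos(math.pi * t))
        for g in opt.param_groups:
            g["lr"] = lr
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # End of a cycle: the model sits near a local minimum, so take a snapshot.
        if (epoch + 1) % epochs_per_cycle == 0:
            snapshots.append(copy.deepcopy(model.state_dict()))
    return snapshots
```

The saved `snapshots` are the base learners that are later combined into the ensemble.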
“…Such methods are sometimes seen as regularization approaches and can work in coordination with our proposed method. Recently, the checkpoint ensemble has become increasingly popular as it improves predictors "for free" [Huang et al., 2017a, Zhang et al., 2020]. It was termed "Horizontal Voting" in [Xie et al., 2013], where the outputs of the checkpoints are straightforwardly ensembled as the final prediction.…”
Section: Related Work
Mentioning confidence: 99%
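The "Horizontal Voting" described above simply averages the predictions of the saved checkpoints at test time. A minimal sketch follows, assuming checkpoints produced by a loop like the one above; `make_model` (a factory returning a fresh model instance) and `snapshots` are hypothetical names for illustration, not the authors' API.

```python
# Sketch: ensemble prediction by averaging the softmax outputs of saved checkpoints.
import torch

@torch.no_grad()
def ensemble_predict(make_model, snapshots, x):
    probs = None
    for state in snapshots:
        model = make_model()
        model.load_state_dict(state)
        model.eval()
        p = torch.softmax(model(x), dim=1)
        probs = p if probs is None else probs + p
    return probs / len(snapshots)  # averaged class probabilities
```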
“…Similarly, another method, FGE (Fast Geometric Ensembling) [Garipov et al., 2018], copies a trained model and further fine-tunes it with a cyclical learning rate, saving checkpoints and ensembling them with the trained model. More recently, [Zhang et al., 2020] proposed Snapshot Boosting, which modifies the learning-rate restarting rules and sets different sample weights during each training stage to further enhance the diversity of checkpoints. Although Snapshot Boosting uses training-sample weights, these weights are updated only after the learning rate is restarted, and each update begins from the initialization.…”
Section: Related Work
Mentioning confidence: 99%
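The key point in the excerpt is that Snapshot Boosting re-weights training samples only at learning-rate restarts, i.e. once per training stage. Below is a hedged sketch of that structure: the AdaBoost-style exponential re-weighting rule and the full-batch update are placeholders chosen for brevity, not the paper's exact update, and `loss_fn` is assumed to return per-sample losses (e.g. `nn.CrossEntropyLoss(reduction="none")`).

```python
# Sketch: stage-wise training with sample weights updated only after each learning-rate restart.
import copy
import torch

def snapshot_boosting(model, xs, ys, loss_fn, n_stages=5, epochs_per_stage=10, lr=0.1):
    n = xs.size(0)
    weights = torch.full((n,), 1.0 / n)  # uniform sample weights at the start
    snapshots = []
    for stage in range(n_stages):
        opt = torch.optim.SGD(model.parameters(), lr=lr)  # learning-rate restart
        for _ in range(epochs_per_stage):
            opt.zero_grad()
            losses = loss_fn(model(xs), ys)        # per-sample losses
            (weights * losses).sum().backward()    # weighted training objective
            opt.step()
        snapshots.append(copy.deepcopy(model.state_dict()))
        # Weights are updated only here, after the restart, from the current snapshot.
        with torch.no_grad():
            wrong = (model(xs).argmax(dim=1) != ys).float()
        weights = weights * torch.exp(wrong)       # up-weight misclassified samples
        weights = weights / weights.sum()
    return snapshots
```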
“…Meanwhile, to improve the self-supervising quality of the distilled knowledge, we propose to generate a more powerful teacher by ensembling multi-scale knowledge from the students. Diversity, a key factor in constructing an effective ensemble teacher [33,34], is enhanced by optimizing individual reception-aware graph knowledge for each student. For better self-supervising quantity, each student can get sufficient supervision from the reception-aware graph knowledge, the task-specific knowledge, and the rich distilled knowledge (soft labels) from a powerful teacher model.…”
Section: Introduction
Mentioning confidence: 99%
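At a high level, the excerpt builds an ensemble teacher from the students' own outputs and distills each student toward it. The sketch below shows only that generic pattern (average the student logits to form soft labels, then apply a temperature-scaled KL distillation loss); it does not attempt the cited paper's reception-aware graph knowledge, and the function name and temperature are illustrative assumptions.

```python
# Sketch: ensemble-teacher distillation from multiple student outputs.
import torch
import torch.nn.functional as F

def ensemble_distill_loss(student_logits, T=3.0):
    """student_logits: list of [batch, classes] tensors, one per student."""
    teacher = torch.stack(student_logits, dim=0).mean(dim=0).detach()  # ensemble teacher logits
    teacher_prob = F.softmax(teacher / T, dim=1)
    loss = 0.0
    for logits in student_logits:
        log_p = F.log_softmax(logits / T, dim=1)
        loss = loss + F.kl_div(log_p, teacher_prob, reduction="batchmean") * (T * T)
    return loss / len(student_logits)
```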