Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning

Bakhtiarnia, Arian; Zhang, Qi; Iosifidis, Alexandros

doi:10.1109/ijcnn52387.2021.9533875

Cited by 9 publications

(12 citation statements)

References 17 publications


“…Curriculum learning is very sensitive to the choice of scoring and pacing functions and their hyper-parameters [18]. It should be noted that as opposed to human learning, sometimes the opposite approach of starting the training from the hardest examples, called anti-curriculum, works best for DNNs [18], [19].…”

Section: Curriculum Learning (mentioning)

confidence: 99%
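The scoring and pacing functions mentioned in the statement above can be illustrated with a minimal sketch. It is not taken from the cited papers: the names `curriculum_order`, `linear_pacing`, `available_examples`, the linear schedule, and the `start_frac` parameter are all assumptions made for illustration. A scoring function assigns each example a difficulty, a pacing function decides how much of the sorted data is available at a given training step, and anti-curriculum simply reverses the order.

```python
import math


def curriculum_order(scores, anti=False):
    """Return example indices sorted easiest-first (hardest-first if anti=True).

    `scores` holds one difficulty value per example; higher means harder.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=anti)


def linear_pacing(step, total_steps, n_examples, start_frac=0.2):
    """Hypothetical pacing function: the available fraction of the data
    grows linearly from `start_frac` at step 0 to 1.0 at `total_steps`."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    return max(1, math.ceil(frac * n_examples))


def available_examples(scores, step, total_steps, anti=False, start_frac=0.2):
    """Indices of the examples that may be sampled at the given step."""
    order = curriculum_order(scores, anti=anti)
    return order[:linear_pacing(step, total_steps, len(scores), start_frac)]
```

Under these assumptions, changing `start_frac` or swapping the linear schedule for another one changes which examples the model sees early on, which is one way to picture the hyper-parameter sensitivity the statement describes.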

“…First, in our method, each iteration contains only images of a particular difficulty, whereas typically there are a mixture of difficulties in each iteration. Second, the pacing of curriculum learning is usually much faster, and the most difficult examples are introduced after only a handful of epochs [18], [19], whereas in our method,…”

Section: Curriculum Pre-training (mentioning)

confidence: 99%


Crowd Counting on Heavily Compressed Images with Curriculum Pre-Training

Bakhtiarnia, Zhang, Iosifidis

2022

Preprint

Self Cite


The JPEG image compression algorithm is a widely used technique for image size reduction in edge and cloud computing settings. However, applying such lossy compression to images processed by deep neural networks can lead to significant accuracy degradation. Inspired by the curriculum learning paradigm, we present a novel training approach called curriculum pre-training (CPT) for crowd counting on compressed images, which alleviates the drop in accuracy resulting from lossy compression. We verify the effectiveness of our approach by extensive experiments on three crowd counting datasets, two crowd counting DNN models and various levels of compression. Our proposed training method is not overly sensitive to hyperparameters, and reduces the error, particularly for heavily compressed images, by up to 19.70%.


“…However, since there are multiple outputs, and thus multiple loss signals in a multi-exit architecture, its training is not as straightforward. Three different approaches for training multi-exit architectures exist in the literature [12], [15], [18]. In the first approach, called end-to-end training, the loss signals of all exits are combined and backpropagated through the network at the same time.…”

Section: A Multi-exit Architectures (mentioning)

confidence: 99%
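The end-to-end approach described in the statement above, where the loss signals of all exits are combined, can be sketched as follows. This is an illustrative assumption rather than the cited authors' implementation: the function names, the use of cross-entropy, and the weighted sum with uniform default weights are all hypothetical choices. In a deep learning framework, the single combined scalar would be backpropagated once through the whole network.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example from raw logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def combined_exit_loss(exit_logits, target, weights=None):
    """Weighted sum of the per-exit losses for one example.

    `exit_logits` holds one logit vector per exit (early exits first,
    final exit last). `weights` is a hypothetical per-exit weighting;
    it defaults to 1.0 for every exit.
    """
    if weights is None:
        weights = [1.0] * len(exit_logits)
    if len(weights) != len(exit_logits):
        raise ValueError("need exactly one weight per exit")
    return sum(w * cross_entropy(l, target) for w, l in zip(weights, exit_logits))
```

Setting a weight to zero removes that exit's loss signal from the combined objective, which makes it easy to compare this scheme with training only a subset of exits.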

“…Various deep learning models for audio classification exist in the literature, including models that are commonly used for image classification, namely ResNet [25], DenseNet [26] and Inception [27], which have been shown to be quite effective for audio classification as well [28]. Conveniently, the same three networks have previously been used as backbone networks when investigating early exiting for image classification [15]. Therefore we use these backbone networks for both image and audio classification in our experiments.…”

Section: Audio Classification (mentioning)

confidence: 99%

“…However, the low-overhead constraint makes it quite challenging to achieve a high accuracy since the early exit branches have significantly less trainable parameters compared to the rest of the network. Several approaches for increasing the accuracy of early exits such as knowledge distillation [14] and curriculum learning [15] have been suggested. In this paper, we propose a novel architecture in order to obtain more accurate early exits.…”

Section: Introduction (mentioning)

confidence: 99%


Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead

Bakhtiarnia, Zhang, Iosifidis

2021

Preprint

Self Cite


Deploying deep learning models in time-critical applications with limited computational resources, for instance in edge computing systems and IoT networks, is a challenging task that often relies on dynamic inference methods such as early exiting. In this paper, we introduce a novel architecture for early exiting based on the vision transformer architecture, as well as a fine-tuning strategy that significantly increase the accuracy of early exit branches compared to conventional approaches while introducing less overhead. Through extensive experiments on image and audio classification as well as audiovisual crowd counting, we show that our method works for both classification and regression problems, and in both single- and multi-modal settings. Additionally, we introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis, that can lead to a more fine-grained dynamic inference.
