Adversarial Network Compression

Belagiannis, Vasileios; Farshad, Azade; Galasso, Fabio

doi:10.1007/978-3-030-11018-5_37

Cited by 45 publications

(39 citation statements)

References 57 publications

“…Many papers extended this approach in different directions, such as disentangling semantic concepts [37], network compression [38] [39] [40], feature augmentation [41], image to image translation [42], and explored different losses [43] and other tricks to improve performance and stability [44] [45]. Our work relates to this body of work, as the hallucination network of our model tries to generate features from the missing modality feature space through adversarial learning.…”

Section: Adversarial Learning (mentioning)

confidence: 99%

“…For that we need a student-teacher adversarial framework. This has an interesting parallel in adversarial network compression [38], where the performance of a fully supervised small network can be boosted by adversarial training against a high-capacity (and better performing) teacher net. In [38], it is also observed that the student can surpass the teacher in some occasions.…”

Section: Standard Supervised Learning Has Limitations In Extracting I (mentioning)

confidence: 99%

Learning with Privileged Information via Adversarial Discriminative Modality Distillation

Garcia; Morerio; Murino (2020). IEEE Trans. Pattern Anal. Mach. Intell.

Heterogeneous data modalities can provide complementary cues for several tasks, usually leading to more robust algorithms and better performance. However, while training data can be accurately collected to include a variety of sensory modalities, it is often the case that not all of them are available in real life (testing) scenarios, where a model has to be deployed. This raises the challenge of how to extract information from multimodal data in the training stage, in a form that can be exploited at test time, considering limitations such as noisy or missing modalities. This paper presents a new approach in this direction for RGB-D vision tasks, developed within the adversarial learning and privileged information frameworks. We consider the practical case of learning representations from depth and RGB videos, while relying only on RGB data at test time. We propose a new approach to train a hallucination network that learns to distill depth information via adversarial learning, resulting in a clean approach without several losses to balance or hyperparameters. We report state-of-the-art results for object classification on the NYUD dataset, and video action recognition on the largest multimodal dataset available for this task, the NTU RGB+D, as well as on the Northwestern-UCLA.

“…Knowledge Distillation: Knowledge distillation (Ba and Caruana 2014) is used to transfer knowledge from teacher network to student network by the output before the softmax function (logits) or after it (soft targets), which has been popularized by (Hinton, Vinyals, and Dean 2015). As it is hard for student network with small capacity to mimic the outputs of teacher network, several researches (Belagiannis, Farshad, and Galasso 2018;Xu, Hsu, and Huang 2018) focused on using adversarial networks to replace the manually designed metric such as L1/L2 loss or KL divergence.…”

Section: Related Work (mentioning)

confidence: 99%
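
The hand-designed distillation metrics named in the statement above (L2 on logits, KL divergence on soft targets) can be illustrated with a minimal sketch. The function names, the temperature value, and the example logits are assumptions chosen for illustration; they are not taken from any of the cited papers.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def l2_logit_loss(teacher_logits, student_logits):
    """Mean squared error between pre-softmax outputs (logit matching)."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n

def kl_soft_target_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) computed on temperature-softened probabilities."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for a three-class problem (illustrative values only).
teacher = [6.0, 2.0, -1.0]
student = [3.0, 2.5, 0.0]

print("L2 on logits:", l2_logit_loss(teacher, student))
print("KL on soft targets:", kl_soft_target_loss(teacher, student))
```

Both losses are zero when the student reproduces the teacher exactly, which is the fixed target that the adversarial approaches mentioned in the statement aim to replace with a learned one.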

“…In recent years, many researchers resort to process-oriented methods, and many kinds of knowledge representation algorithms have been proposed (Zagoruyko and Komodakis 2016;Yim et al 2017). Empirically, the loss learned by adversarial training usually has advantages over the predetermined one in the student-teacher strategy, (Belagiannis, Farshad, and Galasso 2018) and (Xu, Hsu, and Huang 2018) proposed the GAN-based distillation approaches by introducing the discriminator to match the output distribution between teacher and student.…”

Section: Introduction (mentioning)

confidence: 99%
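
The idea of a discriminator that matches teacher and student output distributions can be sketched with the generic GAN objective. This is only an illustration under stated assumptions: the linear discriminator, its weights, and the example outputs are hypothetical, and the cited papers may use different architectures and loss terms.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator(output, weights, bias):
    """Hypothetical linear discriminator: probability that `output` came from the teacher."""
    score = sum(w * o for w, o in zip(weights, output)) + bias
    return sigmoid(score)

def discriminator_loss(teacher_out, student_out, weights, bias):
    """Binary cross-entropy with teacher outputs labelled 1 and student outputs labelled 0."""
    d_teacher = discriminator(teacher_out, weights, bias)
    d_student = discriminator(student_out, weights, bias)
    return -(math.log(d_teacher) + math.log(1.0 - d_student))

def student_adversarial_loss(student_out, weights, bias):
    """Low when the discriminator mistakes the student output for a teacher output."""
    return -math.log(discriminator(student_out, weights, bias))

# Illustrative values only: a discriminator that looks at the first output dimension.
weights, bias = [1.0, 0.0, 0.0], 0.0
teacher_out = [2.0, 0.0, 0.0]
student_far = [-2.0, 0.0, 0.0]   # easy to tell apart from the teacher
student_near = [1.9, 0.0, 0.0]   # close to the teacher output

print(student_adversarial_loss(student_far, weights, bias))
print(student_adversarial_loss(student_near, weights, bias))
```

In a full training loop the two losses would be minimized alternately, one updating the discriminator and the other updating the student; that loop is omitted here to keep the sketch short.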

Hierarchical Knowledge Squeezed Adversarial Network Compression

Shu; Xie; Qu; et al. (2020). AAAI

Deep network compression has been achieved notable progress via knowledge distillation, where a teacher-student learning manner is adopted by using predetermined loss. Recently, more focuses have been transferred to employ the adversarial training to minimize the discrepancy between distributions of output from two networks. However, they always emphasize on result-oriented learning while neglecting the scheme of process-oriented learning, leading to the loss of rich information contained in the whole network pipeline. Whereas in other (non GAN-based) process-oriented methods, the knowledge have usually been transferred in a redundant manner. Observing that, the small network can not perfectly mimic a large one due to the huge gap of network scale, we propose a knowledge transfer method, involving effective intermediate supervision, under the adversarial training framework to learn the student network. Different from the other intermediate supervision methods, we design the knowledge representation in a compact form by introducing a task-driven attention mechanism. Meanwhile, to improve the representation capability of the attention-based method, a hierarchical structure is utilized so that powerful but highly squeezed knowledge is realized and the knowledge from teacher network could accommodate the size of student network. Extensive experimental results on three typical benchmark datasets, i.e., CIFAR-10, CIFAR-100, and ImageNet, demonstrate that our method achieves highly superior performances against state-of-the-art methods.

“…GANs are used in a lot of diverse applications in which generative models are involved. These include learning of data representations [19], semantic segmentation [20], teacher-student network compression [21], defending adversarial examples [22], [23], [24], and reinforcement learning [25]. The generation of training and validation material for autonomous driving systems is another use case of generative models.…”

Section: A Generative Adversarial Network (mentioning)

confidence: 99%

On Low-Bitrate Image Compression for Distributed Automotive Perception: Higher Peak SNR Does Not Mean Better Semantic Segmentation

Löhdefink; Bär; Schmidt; et al. (2019). 2019 IEEE Intelligent Vehicles Symposium (IV)

The high amount of sensors required for autonomous driving poses enormous challenges on the capacity of automotive bus systems. There is a need to understand tradeoffs between bitrate and perception performance. In this paper, we compare the image compression standards JPEG, JPEG2000, and WebP to a modern encoder/decoder image compression approach based on generative adversarial networks (GANs). We evaluate both the pure compression performance using typical metrics such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and others, but also the performance of a subsequent perception function, namely a semantic segmentation (characterized by the mean intersection over union (mIoU) measure). Not surprisingly, for all investigated compression methods, a higher bitrate means better results in all investigated quality metrics. Interestingly, however, we show that the semantic segmentation mIoU of the GAN autoencoder in the highly relevant low-bitrate regime (at 0.0625 bit/pixel) is better by 3.9 % absolute than JPEG2000, although the latter still is considerably better in terms of PSNR (5.91 dB difference). This effect can greatly be enlarged by training the semantic segmentation model with images originating from the decoder, so that the mIoU using the segmentation model trained by GAN reconstructions exceeds the use of the model trained with original images by almost 20 % absolute. We conclude that distributed perception in future autonomous driving will most probably not provide a solution to the automotive bus capacity bottleneck by using standard compression schemes such as JPEG2000, but requires modern coding approaches, with the GAN encoder/decoder method being a promising candidate.
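
The two kinds of metric contrasted in the abstract above, a pixel-level one (PSNR) and a task-level one (mIoU), can be computed as in the following minimal sketch. It assumes flattened lists of pixel values and class labels; the function names and the convention of skipping classes absent from both maps are assumptions, not details from the paper.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mean_iou(prediction, ground_truth, num_classes):
    """Mean intersection over union, averaged over classes present in either label map."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(prediction, ground_truth) if p == c and g == c)
        union = sum(1 for p, g in zip(prediction, ground_truth) if p == c or g == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Illustrative toy data: four pixels, two classes.
print(psnr([10, 20, 30, 40], [12, 18, 30, 41]))
print(mean_iou([0, 1, 1, 0], [0, 1, 0, 0], num_classes=2))
```

Because PSNR depends only on pixel error while mIoU depends on the downstream segmentation, the two can rank compression methods differently, which is the effect the abstract reports.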
