2021
DOI: 10.48550/arxiv.2111.10988
Preprint

Local-Selective Feature Distillation for Single Image Super-Resolution

Abstract: Recent improvements in convolutional neural network (CNN)-based single image super-resolution (SISR) methods rely heavily on fabricating network architectures, rather than finding a suitable training algorithm other than simply minimizing the regression loss. Adapting knowledge distillation (KD) can open a way for bringing further improvement for SISR, and it is also beneficial in terms of model efficiency. KD is a model compression method that improves the performance of Deep Neural Networks (DNNs) without us…

Cited by 3 publications (16 citation statements) · References 43 publications

“…However, in generative tasks that produce an image (such as super-resolution and demosaicing), directly applying distillation techniques is challenging since they do not have a notion of soft labels. Consequently, many super-resolution networks distill the teacher network's features [55], [56], [57] using an additional tool, such as a regressor, to address dimension mismatches between the teacher and student. Gao et al. [55] transfer a first-order statistical map (e.g., the average, maximum, or minimum value) of intermediate features, while FAKD [56] proposed a spatial-affinity-based feature distillation to exploit rich high-dimensional statistical information.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
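To make the regressor idea in the statement above concrete, here is a minimal sketch (not the cited papers' code): a 1 × 1 convolutional regressor maps student features to the teacher's channel dimension, and a FAKD-style spatial-affinity matrix is compared between the two. The module names, shapes, and equal loss weighting are illustrative assumptions.

```python
# Hedged sketch: feature distillation with a regressor for channel mismatch
# plus a FAKD-style spatial-affinity term. Names/shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def spatial_affinity(feat):
    # feat: (B, C, H, W) -> pairwise similarity between spatial positions, (B, HW, HW)
    b, c, h, w = feat.shape
    flat = F.normalize(feat.view(b, c, h * w), dim=1)   # unit-norm channel vectors
    return torch.bmm(flat.transpose(1, 2), flat)        # cosine affinity over locations

class FeatureDistiller(nn.Module):
    def __init__(self, student_ch, teacher_ch):
        super().__init__()
        # 1x1 regressor maps student features to the teacher's channel dimension
        self.regressor = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        mapped = self.regressor(student_feat)
        l_feat = F.l1_loss(mapped, teacher_feat)                                   # direct feature matching
        l_aff = F.l1_loss(spatial_affinity(mapped), spatial_affinity(teacher_feat))  # affinity matching
        return l_feat + l_aff
```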
“…Gao et al. [55] transfer a first-order statistical map (e.g., the average, maximum, or minimum value) of intermediate features, while FAKD [56] proposed a spatial-affinity-based feature distillation to exploit rich high-dimensional statistical information. LSFD [57] introduced a deeper regressor comprising 3 × 3 convolution layers to achieve a larger receptive field, together with an attention method based on the teacher-student difference that selectively focuses on vulnerable pixel locations. PISR [58] added an encoder that leverages privileged information from the ground truth and transfers knowledge through feature distillation.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
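Below is a hedged sketch of the LSFD-style ingredients described in this statement: a deeper regressor built from stacked 3 × 3 convolutions and a per-pixel weighting derived from the teacher-student discrepancy. The exact architecture, hidden width, and weighting scheme here are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: deeper 3x3-conv regressor and difference-based per-pixel attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSelectiveDistiller(nn.Module):
    def __init__(self, student_ch, teacher_ch, hidden_ch=64):
        super().__init__()
        # stacked 3x3 convolutions enlarge the regressor's receptive field
        self.regressor = nn.Sequential(
            nn.Conv2d(student_ch, hidden_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, teacher_ch, kernel_size=3, padding=1),
        )

    def forward(self, student_feat, teacher_feat):
        mapped = self.regressor(student_feat)
        # per-pixel teacher-student discrepancy -> weights that emphasize
        # "vulnerable" locations where the student deviates most
        diff = (mapped - teacher_feat).abs().mean(dim=1, keepdim=True)   # (B, 1, H, W)
        attn = torch.softmax(diff.flatten(2), dim=-1).view_as(diff)      # weights sum to 1 per sample
        return (attn * (mapped - teacher_feat).pow(2)).sum()             # weighted squared error
```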
“…Knowledge distillation transfers rich knowledge from teacher models to lightweight student models in the label or feature domain, and is mainly applied to model compression and acceleration [15,42]. Among the various approaches, feature-based knowledge distillation is extensively applied to image restoration tasks such as image super-resolution [14,16,27,38]. It minimizes the distance between feature representations so that the student learns richer information from the teacher model than it would from softened labels [49].…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
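As a minimal illustration of the distance-based feature matching mentioned above, the helper below sums an L1 distance over paired intermediate features from the student and the (detached) teacher; the function name and weighting are assumptions.

```python
# Minimal sketch of distance-based feature distillation over paired layers.
import torch.nn.functional as F

def feature_kd_loss(student_feats, teacher_feats, weight=1.0):
    # student_feats / teacher_feats: lists of (B, C, H, W) tensors from matched layers
    return weight * sum(F.l1_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))
```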
“…The practical applicability of complex SISR models is limited on resource-constrained devices, such as mobile or IoT devices; thus, efficient and lightweight SISR models are required. To satisfy this demand, lightweight SISR models with a better trade-off between efficiency and performance have been proposed [16], [17], [19]-[25]. Among these methods, knowledge distillation (KD) [26]-based approaches exhibit the following distinctive advantages: 1) KD promotes the inheritance of knowledge from large teacher networks and improves performance without modifying the existing network structure at industrial sites, and 2) KD can be combined with pruning and network design methods by including additional loss terms to achieve greater performance improvement [27], [28].…”
Section: Introduction (mentioning)
confidence: 99%