Knowledge distillation, which transfers knowledge from a large model (the teacher) to a small model (the student), is a promising approach to lightweight model design. Existing distillation methods based on intermediate features mainly transfer knowledge between corresponding stages of the teacher and student networks, which may cause the student to receive semantically mismatched knowledge and to miss contextual knowledge. To address this problem, cross-stage distillation with inverted bottleneck projectors is proposed. A cross-stage connection structure is designed that allows the student network to access the semantically best-matched stages of the teacher network and to further obtain rich contextual knowledge from multiple teacher stages. After the cross-stage connections between the teacher and student networks are established, inverted bottleneck projectors are employed to extract useful knowledge from multiple stages of the teacher. Each projector adopts a stacked three-layer convolutional structure, which makes the teacher's knowledge easier for the student network to absorb. The inverted bottleneck structure reduces information loss, preserving the integrity of the transferred knowledge. Furthermore, ReLU activations are incorporated into the projectors to suppress features with low response values, thereby filtering out redundant information. Extensive experiments on the CIFAR-100, ImageNet, Tiny-ImageNet, and STL-10 datasets demonstrate the effectiveness of the proposed approach.
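
To make the projector design concrete, the following is a minimal PyTorch sketch of an inverted bottleneck projector with a three-layer convolutional stack and ReLU activations, as described above. The expansion factor, kernel sizes, BatchNorm placement, the student-to-teacher mapping direction, and the MSE alignment loss in the usage example are illustrative assumptions, not details specified in this abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InvertedBottleneckProjector(nn.Module):
    """Sketch of a three-layer inverted bottleneck projector:
    1x1 expand -> 3x3 transform -> 1x1 reduce, with ReLU filtering."""

    def __init__(self, student_channels, teacher_channels, expansion=2):
        super().__init__()
        hidden_channels = student_channels * expansion  # expansion factor is an assumption
        self.project = nn.Sequential(
            # 1x1 conv expands channels (entry of the inverted bottleneck)
            nn.Conv2d(student_channels, hidden_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(inplace=True),  # ReLU suppresses low-response (redundant) features
            # 3x3 conv transforms features in the expanded space
            nn.Conv2d(hidden_channels, hidden_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(inplace=True),
            # 1x1 conv reduces to the target (teacher-stage) channel dimension
            nn.Conv2d(hidden_channels, teacher_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(teacher_channels),
        )

    def forward(self, student_feat):
        return self.project(student_feat)


# Hypothetical usage: align a 128-channel student feature map with a
# 256-channel teacher stage and compute a feature-distillation loss.
if __name__ == "__main__":
    projector = InvertedBottleneckProjector(student_channels=128, teacher_channels=256)
    student_feat = torch.randn(4, 128, 16, 16)
    teacher_feat = torch.randn(4, 256, 16, 16)
    loss = F.mse_loss(projector(student_feat), teacher_feat)
    print(loss.item())
```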