2023
DOI: 10.1109/access.2023.3238715
Binarized Neural Network With Parameterized Weight Clipping and Quantization Gap Minimization for Online Knowledge Distillation

Abstract: As applications of artificial intelligence grow rapidly, numerous network compression algorithms have been developed for devices with restricted computing resources, such as smartphones and edge and IoT devices. Knowledge distillation (KD) transfers soft labels derived from a teacher model to a less parameterized student model, achieving high accuracy with a reduced computational burden. Moreover, online KD provides parallel computing through collaborative learning between teacher and student networks, thus enhancing the training…
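To make the soft-label transfer described in the abstract concrete, below is a minimal sketch of a standard temperature-scaled distillation loss in PyTorch (Hinton-style). The temperature T, blend weight alpha, and the name kd_loss are illustrative assumptions, not details taken from this paper.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft term: KL divergence between temperature-softened teacher
    # and student distributions; the T*T factor restores the gradient
    # magnitude that temperature scaling shrinks.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

In the online-KD setting the abstract mentions, teacher and student are trained jointly, so teacher_logits would come from a peer network updated in the same step rather than from a frozen pretrained teacher.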

Cited by 2 publications (2 citation statements)
References 22 publications
“…Some researchers focused on extreme quantization, in which only binary or ternary weights and activations are involved [22], [23], [27]. These methods use bit-shift logic instead of high-precision multiplications to achieve a significant acceleration but often result in substantial performance degradation.…”
Section: Related Work, A. Network Quantization (citation type: mentioning; confidence: 99%)
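As a concrete illustration of the excerpt's point (a sketch under assumed {-1, +1} weights, not any cited paper's exact kernel), binarized weights turn a dot product into additions and subtractions, with no real-valued multiplies:

import torch

w = torch.randn(6)
x = torch.randn(6)

# 1-bit weights in {-1, +1}; mapping w >= 0 to +1 avoids sign(0) = 0.
wb = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

y_mac = (wb * x).sum()                        # naive multiply-accumulate
y_addsub = x[wb > 0].sum() - x[wb < 0].sum()  # multiply-free equivalent
print(torch.allclose(y_mac, y_addsub))        # True

With binary activations as well, the add/subtract form collapses further to XNOR and popcount operations, which is where the reported acceleration, and the accompanying accuracy loss, comes from.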
“…The first approach involves amplifying the representational capability of each layer by expanding the diversity of cases that the parameters can represent [4]–[11]. The second method focuses on refining the gradient mismatch in the backward path [1], [12], [13]. By employing straight-through estimation (STE) [14] and scaling factors, Rastegari et al. [2] demonstrated a notable expansion in the network representation and more accurate parameter updates.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
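Below is a minimal sketch of the straight-through estimation and scaling-factor idea the excerpt attributes to Rastegari et al. [2]. The clipped-identity backward pass and the per-tensor mean-absolute scale are common conventions assumed here, not details verified against this paper.

import torch

class BinarizeSTE(torch.autograd.Function):
    # Forward: hard sign binarization. Backward: straight-through
    # estimator, i.e. pass the gradient through unchanged wherever
    # |w| <= 1 and zero it outside that clip range.
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        return grad_out * (w.abs() <= 1.0).to(grad_out.dtype)

w = torch.randn(8, requires_grad=True)
alpha = w.abs().mean().detach()   # per-tensor scaling factor, kept constant
                                  # here for simplicity (XNOR-Net also
                                  # differentiates through it)
y = alpha * BinarizeSTE.apply(w)
y.sum().backward()                # gradients reach w via the STE

The scaling factor alpha restores the dynamic range lost by binarization, while the STE supplies a usable surrogate gradient for the non-differentiable sign function, which is exactly the gradient-mismatch problem the quoted passage describes.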