2020
DOI: 10.48550/arxiv.2010.05625
Preprint

Post-Training BatchNorm Recalibration

Gil Shomron,
Uri Weiser

Abstract: We revisit non-blocking simultaneous multithreading (NB-SMT), introduced previously by Shomron and Weiser [18]. NB-SMT trades accuracy for performance by occasionally "squeezing" more than one thread into a shared multiply-and-accumulate (MAC) unit. However, accommodating more than one thread in a shared MAC unit may add noise to the computations, thereby changing the internal statistics of the model. We show that substantial model performance can be recouped by post-training recalibration o…
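
The recalibration the abstract (and title) refers to can be sketched concisely. Below is a minimal PyTorch sketch of post-training BatchNorm recalibration, not the authors' released code: the weights stay frozen and only the BatchNorm running mean/variance buffers are recomputed over a small calibration set run through the (perturbed) model. The helper name recalibrate_batchnorm and the calibration loader are assumptions for illustration.

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def recalibrate_batchnorm(model: nn.Module, calib_loader, device="cpu"):
        # Reset accumulated statistics and switch every BatchNorm layer to
        # cumulative averaging so each calibration batch counts equally.
        for m in model.modules():
            if isinstance(m, nn.modules.batchnorm._BatchNorm):
                m.reset_running_stats()
                m.momentum = None  # None => cumulative moving average
        # Running stats are only updated in train mode; the weights are not
        # touched because gradients are disabled and no optimizer step runs.
        was_training = model.training
        model.train()
        for images, _ in calib_loader:
            model(images.to(device))
        model.train(was_training)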

Cited by 2 publications (2 citation statements) | References 18 publications

“…[42] recomputes population variance to compensate for the "variance shift" caused by the inference mode of dropout. [64,31,69] recompute / recalibrate population statistics to compensate for the distribution shift caused by test-time quantization. Recomputing population statistics on a different domain [43] is common in domain adaptation, which will be discussed further in Sec.…”
Section: Discussion and Related Work
confidence: 99%
“…The min-max statistics are gathered during a quick preprocessing stage on 2K randomly picked images from the training set. In addition, during preprocessing, we recalibrate the BatchNorm layers' running mean and running variance statistics [25,29,31,32]. In all models, the first convolution layer is left intact, since its input activations, which correspond to the image pixels, do not include many zero values, if any.…”
Section: Methods
confidence: 99%
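
The preprocessing stage quoted above combines two quick calibration passes: gathering per-layer min-max activation ranges and refreshing the BatchNorm running statistics. A minimal PyTorch sketch of the min-max pass is given below; the helper name collect_minmax, the choice of hooked layers (Conv2d/Linear), and the calibration loader (e.g. built from the roughly 2K sampled training images mentioned in the quote) are illustrative assumptions, not the cited paper's code.

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def collect_minmax(model: nn.Module, calib_loader, device="cpu"):
        # Per-layer (min, max) activation ranges, gathered via forward hooks.
        stats = {}

        def make_hook(name):
            def hook(_module, _inputs, output):
                lo, hi = output.min().item(), output.max().item()
                old = stats.get(name, (lo, hi))
                stats[name] = (min(old[0], lo), max(old[1], hi))
            return hook

        handles = [m.register_forward_hook(make_hook(n))
                   for n, m in model.named_modules()
                   if isinstance(m, (nn.Conv2d, nn.Linear))]
        model.eval()
        for images, _ in calib_loader:
            model(images.to(device))
        for h in handles:
            h.remove()
        return stats

Combined with the BatchNorm recalibration sketched after the abstract, these ranges would then parametrize the min-max quantizers used at inference time.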