2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.514
DisturbLabel: Regularizing CNN on the Loss Layer

Abstract: For a long time we have combated overfitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc. In this paper, we present DisturbLabel, an extremely simple algorithm that randomly replaces a fraction of the labels with incorrect values in each iteration. Although it may seem counterintuitive to intentionally generate incorrect training labels, we show that DisturbLabel prevents the network training from overfitting by implicitly averaging over exponen…
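The label-disturbing step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's exact code: the noise rate `alpha` and the uniform draw over all classes are assumptions for the sketch.

```python
import numpy as np

def disturb_labels(labels, num_classes, alpha=0.1, rng=None):
    """Return a copy of `labels` in which each label is, with probability
    `alpha`, replaced by a class index drawn uniformly at random.
    Intended to be called anew at every training iteration, so the noisy
    labels change from iteration to iteration."""
    rng = rng or np.random.default_rng()
    labels = np.asarray(labels).copy()
    # Independently mark each example for disturbance.
    mask = rng.random(labels.shape[0]) < alpha
    # Replace marked labels with uniformly random classes
    # (which may occasionally coincide with the true label).
    labels[mask] = rng.integers(0, num_classes, size=int(mask.sum()))
    return labels
```

Because the corrupted subset is resampled every iteration, no single example is permanently mislabeled; the network effectively averages over many noisy label configurations.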


Cited by 202 publications (159 citation statements)
References 22 publications
“…For the cleanliness test, we replaced the labels with a random incorrect one for 5%, 10%, 15%, and 32% of the examples. The labels are fixed, unlike the recent work on disturbing labels as a regularization method [53].…”
Section: Methods (mentioning, confidence: 99%)
“…In Figure 1. There have been numerous studies to solve either of the two issues individually. On the one hand, to reduce the risk of overfitting in deep CNNs, previous research suggests adding appropriate randomness into the training phase [39, 42, 45]. For example, Dropout [39] adds randomness in activation by randomly discarding the hidden layers' outputs.…”
Section: Introduction (mentioning, confidence: 99%)
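The Dropout mechanism mentioned in the excerpt above, randomly discarding hidden-layer outputs during training, can be illustrated with a generic inverted-dropout sketch. This is an assumption-laden illustration, not the exact code of [39]; the keep-probability handling and rescaling convention are the common modern formulation.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each hidden unit with
    probability `p` and rescale the survivors by 1/(1-p), so the
    expected activation is unchanged and no scaling is needed at test
    time. A generic sketch, not the implementation from [39]."""
    if not training or p == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    # Bernoulli keep-mask over the activation tensor.
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)
```

Note the conceptual contrast with DisturbLabel: Dropout injects randomness into hidden activations, whereas DisturbLabel injects it into the loss layer through the labels themselves.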
“…For these datasets, we train six networks: (a) a light ConvNet with the same architecture as in [42], (b) the network-in-network (NIN) [43], (c)…”
Section: CIFAR-10/100 and SVHN (mentioning, confidence: 99%)
“…It can be seen that our method outperforms DisturbLabel [42] and L-Softmax [15] under the same architectures. Again, EM-softmax [16] achieves a lower error rate (26.86%) than ours (25.91%) using model ensembling, while we only measure single-model performance.…”
(Mentioning, confidence: 95%)