2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00815
Augment Your Batch: Improving Generalization Through Instance Repetition

Abstract: Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling for a fixed budget of optimization steps. We analyze the effect of batch augmentation on gradient variance […]
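As a rough illustration of the idea described in the abstract, the sketch below builds batches in which each instance appears several times, each copy passed through an independently sampled augmentation. This is a minimal PyTorch-style sketch, not the authors' implementation; the names BatchAugmentedDataset and collate_batch_augmented and the choice of four copies per instance are assumptions for illustration.

```python
# Minimal sketch of batch augmentation (hypothetical helper names), assuming a
# torchvision-style stochastic `transform` that maps a raw image to a tensor.
import torch
from torch.utils.data import Dataset


class BatchAugmentedDataset(Dataset):
    """Returns M differently augmented copies of the same underlying sample."""

    def __init__(self, base_dataset, transform, num_copies=4):
        self.base = base_dataset      # yields (raw_image, label) pairs
        self.transform = transform    # stochastic augmentation pipeline
        self.num_copies = num_copies  # M: repetitions per instance

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, label = self.base[idx]
        # Apply the random transform M times to the *same* instance.
        copies = torch.stack([self.transform(image) for _ in range(self.num_copies)])
        labels = torch.full((self.num_copies,), label, dtype=torch.long)
        return copies, labels


def collate_batch_augmented(samples):
    """Flatten the per-sample copies so the effective batch size is B * M."""
    images = torch.cat([imgs for imgs, _ in samples], dim=0)
    labels = torch.cat([lbls for _, lbls in samples], dim=0)
    return images, labels
```

Passing collate_batch_augmented as the DataLoader collate_fn keeps all M copies of each instance in the same mini-batch, which is what distinguishes batch augmentation from simply training on more augmented samples.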

Cited by 190 publications (114 citation statements)
References 13 publications
“…Linear spatial reduction attention (LSRA) [33] is utilized in the first two stages to reduce the computation cost of self-attention for long sequence. [26] 0.1 Drop path [17] 0.1 0.1 0.15 0.3 Repeated augment [15] RandAugment [5] Mixup prob. [40] 0.8 Cutmix prob.…”
Section: Methods (mentioning)
confidence: 99%
“…To obtain better generalization and data-efficiency of the model, we perform data augmentation on both images and texts during the pre-training phase to construct more image-text pairs. We apply AutoAugment (Krizhevsky et al, 2012;Sato et al, 2015;Cubuk et al, 2019;Hoffer et al, 2020) for image augmentation, following the SOTA vision recognition methods (Touvron et al, 2021;Xie et al, 2020b). To ensure the augmented texts are semantically similar as the original one, for text augmentation, we rewrite the original text using back-translation (Xie et al, 2020a;Sennrich et al, 2016a).…”
Section: Image and Text Augmentation (mentioning)
confidence: 99%
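For the image-side augmentation mentioned in this excerpt, a torchvision-based pipeline could look like the sketch below. The policy, crop size, and overall composition are assumptions for illustration; the citing work's exact recipe is not given in the excerpt.

```python
from torchvision import transforms
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

# Assumed ImageNet-style pipeline; settings are illustrative, not the cited works'.
image_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),               # assumed input resolution
    transforms.RandomHorizontalFlip(),
    AutoAugment(policy=AutoAugmentPolicy.IMAGENET),  # learned augmentation policy
    transforms.ToTensor(),
])
```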
“…We follow the training recipe and augmentations from [20,22] when training from scratch for Kinetics datasets. We adopt synchronized AdamW [58] and train for 200 epochs with 2 repeated augmentation [40] on 128 GPUs. The mini-batch size is 4 clips per GPU.…”
Section: B4 Details: Kinetics Action Classification (mentioning)
confidence: 99%
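The "repeated augmentation" referred to in this excerpt amounts to sampling each clip or image more than once per batch and letting the stochastic transform augment each copy independently. Below is a simplified, single-process sketch of such a sampler; the name RepeatedAugmentSampler and the num_repeats=2 default are assumptions, and the distributed sampler used in practice additionally shards indices across GPUs.

```python
import torch
from torch.utils.data import Sampler


class RepeatedAugmentSampler(Sampler):
    """Yields each dataset index `num_repeats` times per epoch, in shuffled
    order, so a batch contains several copies of the same sample that the
    stochastic transform then augments differently."""

    def __init__(self, dataset_len, num_repeats=2, shuffle=True):
        self.dataset_len = dataset_len
        self.num_repeats = num_repeats
        self.shuffle = shuffle

    def __iter__(self):
        order = torch.randperm(self.dataset_len) if self.shuffle else torch.arange(self.dataset_len)
        # Adjacent repeats land in the same mini-batch when batching sequentially.
        indices = order.repeat_interleave(self.num_repeats)
        return iter(indices.tolist())

    def __len__(self):
        return self.dataset_len * self.num_repeats
```

The sampler would be passed to a DataLoader via its sampler argument; because the dataset's transform is random, each repeated index produces a differently augmented copy within the batch.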