2018
DOI: 10.48550/arxiv.1804.07612
Preprint

Revisiting Small Batch Training for Deep Neural Networks

Abstract: Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide improved generalization performance and allows a significantly smaller memory footprint, which might also be exploited to improve machine throughput. In this paper, we review common assumptions on learning rate scaling and training duration, as a basis for an experimental compa…
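
As context for the learning rate scaling assumptions the abstract refers to, here is a minimal sketch of the linear scaling heuristic commonly assumed when the mini-batch size changes; the function name and base values are illustrative placeholders, not taken from the paper.

# Sketch of the linear learning-rate scaling heuristic: scale the learning
# rate in proportion to the mini-batch size. Base values are placeholders.
def scaled_learning_rate(base_lr: float, base_batch: int, batch_size: int) -> float:
    return base_lr * batch_size / base_batch

# Example: a base learning rate of 0.1 tuned for a batch size of 128.
for m in (16, 32, 128, 512, 1024):
    print(m, scaled_learning_rate(0.1, 128, m))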

Cited by 149 publications (198 citation statements)
References 19 publications
“…Moreover, large-cohort training can introduce fundamental optimization and generalization issues. Our results are reminiscent of work on large-batch training in centralized settings, where larger batches can stagnate convergence improvements (Dean et al., 2012; You et al., 2017; Golmant et al., 2018; McCandlish et al., 2018; Yin et al., 2018), and even lead to generalization issues with deep neural networks (Shallue et al., 2019; Ma et al., 2018; Keskar et al., 2017; Hoffer et al., 2017; Masters and Luschi, 2018; Lin et al., 2019, 2020). While some of the challenges we identify with large-cohort training are parallel to issues that arise in large-batch centralized learning, others are unique to federated learning and have not been previously identified in the literature.…”
Section: Introduction
confidence: 58%
“…This property of diminishing returns has been explored both empirically (Dean et al., 2012; McCandlish et al., 2018; Golmant et al., 2018; Shallue et al., 2019) and theoretically (Ma et al., 2018; Yin et al., 2018). Beyond the issue of speedup saturation, numerous works have also observed a generalization gap when training deep neural networks with large batches (Keskar et al., 2017; Hoffer et al., 2017; You et al., 2017; Masters and Luschi, 2018; Lin et al., 2019, 2020). Our work differs from these areas by specifically exploring how the cohort size (the number of selected clients) affects federated optimization methods.…”
Section: Related Work
confidence: 99%
“…We now turn to the question of how the learning parameters of our networks are estimated. The learning procedure consists of seeking an optimal parameter vector θ that minimizes the energy J_1 (or J_k, k > 1, depending on the problem) using a mini-batch stochastic gradient descent algorithm with adaptive momentum [55,56,57] on a set of training data.…”
Section: Reaction Network Based On a 1D Multilayer Perceptron
confidence: 99%
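
The training loop this excerpt describes, mini-batch stochastic gradient descent with adaptive momentum (e.g. Adam), can be sketched as below; the model, synthetic data, and mean-squared-error loss are stand-ins for the cited work's network and energy J_1, not its actual code.

# Hedged sketch of mini-batch training with an adaptive-momentum optimizer.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in 1D MLP
data = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))        # synthetic training set
loader = DataLoader(data, batch_size=32, shuffle=True)                 # small mini-batches
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)              # adaptive momentum

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)  # placeholder for the energy J_1
        loss.backward()
        optimizer.step()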
“…Unlike [9], we use standard batch normalization (BN) rather than synchronized BN in our experiments, and we find that synchronized BN slightly degrades performance. A possible reason is that the large effective batch size obtained when synchronizing BN leads to a poor local optimum [44], especially for the FAS task in our experiments.…”
Section: B. Implementation Details
confidence: 99%
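
For reference, a hedged sketch of the distinction the excerpt draws: standard BatchNorm computes statistics per device, whereas synchronized BatchNorm aggregates statistics across devices and therefore normalizes over a much larger effective batch. The PyTorch model below is a placeholder, not the cited FAS network.

import torch
from torch import nn

# Standard BN: statistics computed per process/GPU (what the excerpt uses).
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())

# Synchronized BN: statistics reduced across processes, enlarging the
# effective normalization batch (what the excerpt found slightly harmful).
# Running it requires torch.distributed to be initialized before training.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)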