2019
DOI: 10.48550/arxiv.1909.06893
Preprint
Empirical study towards understanding line search approximations for training neural networks

Abstract: Choosing appropriate step sizes is critical for reducing the computational cost of training large-scale neural network models. Mini-batch sub-sampling (MBSS) is often employed for computational tractability. However, MBSS introduces a sampling error that can manifest as a bias or a variance in a line search, because MBSS can be performed either statically, where the mini-batch is updated only when the search direction changes, or dynamically, where the mini-batch is updated every time the function is evaluated.…
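For a concrete picture of the two sub-sampling modes described in the abstract, the sketch below evaluates losses along a fixed search direction under static and dynamic MBSS. It uses a toy 1-D least-squares problem; the function names, batch size, and step grid are illustrative assumptions rather than code from the paper.

```python
import numpy as np

# Toy 1-D least-squares problem: loss(w, batch) = mean((x*w - y)^2) over the batch.
# The problem, names, and batch size are illustrative assumptions, not the paper's code.
def toy_loss(w, batch):
    x, y = batch
    return np.mean((x * w - y) ** 2)

def sample_batch(x, y, batch_size, rng):
    idx = rng.choice(len(x), size=batch_size, replace=False)
    return x[idx], y[idx]

def line_search_losses(w, direction, steps, x, y, batch_size, loss_fn, mode, rng):
    """Evaluate the mini-batch loss along 'direction' at each trial step size.

    mode == "static":  draw one mini-batch per search direction and reuse it
                       for every evaluation (smooth but biased estimate).
    mode == "dynamic": draw a fresh mini-batch for every evaluation
                       (less biased on average, but noisy between evaluations).
    """
    batch = sample_batch(x, y, batch_size, rng)  # reused only in static mode
    losses = []
    for step in steps:
        if mode == "dynamic":
            batch = sample_batch(x, y, batch_size, rng)
        losses.append(loss_fn(w + step * direction, batch))
    return losses

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + 0.1 * rng.normal(size=1000)

steps = np.linspace(0.0, 2.0, 11)
static_losses = line_search_losses(0.0, 1.0, steps, x, y, 64, toy_loss, "static", rng)
dynamic_losses = line_search_losses(0.0, 1.0, steps, x, y, 64, toy_loss, "dynamic", rng)
```

Under static MBSS the sampled loss curve is smooth in the step size but biased toward the fixed mini-batch, whereas under dynamic MBSS successive evaluations are drawn from different mini-batches and therefore fluctuate from point to point.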

Cited by 1 publication (1 citation statement)
References 33 publications
“…GOLS-I has also been demonstrated to outperform probabilistic line searches (Mahsereci and Hennig, 2017), provided mini-batch sizes are not too small (< 50 for investigated problems) (Kafka and Wilke, 2019). The gradient-only optimization paradigm has recently also shown promise in the construction of approximation models to conduct line searches (Chae and Wilke, 2019). Some of the most important factors governing the nature of the computed gradients are: 1) The neural network architecture, 2) the activation functions (AFs) used within the architecture, 3) the loss function implemented, and 4) the mini-batch size used to evaluate the loss, to name a few.…”
Section: Introduction (mentioning)
confidence: 99%
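The quoted passage refers to gradient-only line searches such as GOLS-I, which locate a step size where the directional derivative along the descent direction changes sign, rather than minimising a noisy loss value. The bracketing-and-bisection scheme below is a minimal sketch of that sign-change idea under dynamic mini-batch sampling; the helper names, growth factor, tolerances, and toy problem are assumptions for illustration and not the published GOLS-I algorithm.

```python
import numpy as np

def directional_derivative(grad_fn, w, direction, step, batch):
    """Directional derivative g(w + step*d)^T d evaluated on one mini-batch."""
    return np.dot(grad_fn(w + step * direction, batch), direction)

def sign_change_line_search(grad_fn, w, direction, batch_sampler,
                            step=1.0, grow=2.0, max_iter=30, tol=1e-8):
    """Approximate a step size where the directional derivative changes sign.

    The upper bound is grown while the derivative stays negative (still
    descending), then the bracket is bisected once a non-negative derivative
    is found. Each evaluation draws a fresh mini-batch (dynamic MBSS).
    """
    lo, hi = 0.0, step
    # Grow the upper bound until the directional derivative becomes non-negative.
    for _ in range(max_iter):
        d_hi = directional_derivative(grad_fn, w, direction, hi, batch_sampler())
        if d_hi >= 0.0:
            break
        lo, hi = hi, hi * grow
    # Bisect between lo (negative derivative) and hi (non-negative derivative).
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        d_mid = directional_derivative(grad_fn, w, direction, mid, batch_sampler())
        if d_mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy usage on a 1-D least-squares problem (illustrative assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + 0.1 * rng.normal(size=1000)

def toy_grad(w, batch):
    xb, yb = batch
    return np.array([np.mean(2.0 * xb * (xb * w[0] - yb))])

def sample_batch(size=64):
    idx = rng.choice(len(x), size=size, replace=False)
    return x[idx], y[idx]

alpha = sign_change_line_search(toy_grad, np.array([0.0]), np.array([1.0]), sample_batch)
```

Because each derivative evaluation uses its own mini-batch, the recovered step size fluctuates around the sign-change point of the underlying full-batch problem, which is the behaviour gradient-only line searches are designed to tolerate.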