2020
DOI: 10.1007/s10898-020-00921-z
Resolving learning rates adaptively by locating stochastic non-negative associated gradient projection points using line searches

Cited by 8 publications (22 citation statements)
References 34 publications
“…The optimizer is also one of the most important parameters in transfer learning. In this paper, stochastic gradient descent (SGD) [37] was used as the optimization algorithm. It updates only once per epoch without redundancy and is fast.…”
Section: Methods
mentioning confidence: 99%
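To make the SGD reference in the excerpt above concrete, the following is a minimal sketch of the vanilla SGD update rule applied to a single mini-batch gradient; the array values and learning rate are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def sgd_step(params, grad, lr=0.01):
    """One vanilla SGD update: params <- params - lr * grad."""
    return params - lr * grad

# Hypothetical usage with a single mini-batch gradient:
w = np.zeros(5)                             # model parameters
g = np.array([0.2, -0.1, 0.05, 0.0, 0.3])   # mini-batch gradient of the loss
w = sgd_step(w, g, lr=0.1)
```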
“…Once the directional derivative signs at α_L and α_U are found to bracket an SNN-GPP, we reduce the interval by applying the Regula-Falsi method [12]. This is essentially a consecutive linear interpolation method, applied until α*_n satisfies Equation (15). We provide the pseudocode for the bracketing strategy in Algorithm 2.…”
Section: Bracketing Strategy
mentioning confidence: 99%
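As a rough illustration of the interval-reduction step described in the excerpt above, the sketch below applies the Regula-Falsi rule to a directional-derivative function until the bracket around the sign change is small. The name dir_deriv, the tolerance, and the iteration cap are assumptions for illustration; this does not reproduce the paper's Algorithm 2.

```python
def regula_falsi_bracket(dir_deriv, a_lo, a_hi, tol=1e-3, max_iter=20):
    """Shrink a bracket [a_lo, a_hi] that contains a directional-derivative
    sign change (negative at a_lo, non-negative at a_hi) by successive
    linear interpolation (Regula-Falsi)."""
    d_lo, d_hi = dir_deriv(a_lo), dir_deriv(a_hi)
    assert d_lo < 0.0 <= d_hi, "interval must bracket a sign change"
    a_new = a_hi
    for _ in range(max_iter):
        # Root of the straight line through (a_lo, d_lo) and (a_hi, d_hi).
        a_new = a_hi - d_hi * (a_hi - a_lo) / (d_hi - d_lo)
        d_new = dir_deriv(a_new)
        if a_hi - a_lo < tol:
            break
        if d_new < 0.0:              # sign change still lies to the right
            a_lo, d_lo = a_new, d_new
        else:                        # sign change lies to the left
            a_hi, d_hi = a_new, d_new
    return a_new
```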
“…For line searches, the sampling errors manifest mainly in the form of bias or variance along a descent direction, depending on whether mini-batches are sub-sampled statically or dynamically [8,15]. Static MBSS sub-samples a new mini-batch for every descent direction, while dynamic MBSS sub-samples a new mini-batch for every function evaluation.…”
Section: Introduction
mentioning confidence: 99%
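The distinction between static and dynamic MBSS drawn in the excerpt above can be sketched on a toy least-squares loss that stands in for a network loss; the problem data, batch size, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))      # toy data standing in for a training set
b = rng.normal(size=1000)

def sample_batch(batch_size=64):
    """Draw a fresh mini-batch of row indices."""
    return rng.choice(A.shape[0], size=batch_size, replace=False)

def batch_loss(x, idx):
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2)

def losses_static_mbss(x, d, alphas):
    # Static MBSS: one mini-batch per descent direction d, so the sampled
    # loss along d is smooth but biased towards that particular batch.
    idx = sample_batch()
    return [batch_loss(x + a * d, idx) for a in alphas]

def losses_dynamic_mbss(x, d, alphas):
    # Dynamic MBSS: a fresh mini-batch per function evaluation, which makes
    # the sampled loss along d point-wise discontinuous (variance, not bias).
    return [batch_loss(x + a * d, sample_batch()) for a in alphas]
```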
“…The recent introduction of Gradient-Only Line Searches (GOLS) (Kafka and Wilke, 2019a) has enabled learning rates to be determined automatically in the discontinuous loss functions of neural network training with dynamic mini-batch sub-sampling (MBSS). The discontinuous nature of the dynamic MBSS loss is a direct result of successively sampling different mini-batches from the training data at every function evaluation, introducing a sampling error (Kafka and Wilke, 2019a). To determine step sizes, GOLS locates Stochastic Non-Negative Associated Gradient Projection Points (SNN-GPPs), manifesting as sign changes from negative to positive in the directional derivative along a descent direction.…”
Section: Introduction
mentioning confidence: 99%
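A simplified sketch of what locating an SNN-GPP amounts to in practice: step along the descent direction until the (resampled) directional derivative changes sign from negative to non-negative. The growth factor, starting step, and function name are assumptions; this is not the published GOLS pseudocode.

```python
def bracket_snngpp(dir_deriv, alpha0=1e-3, grow=2.0, max_alpha=1e3):
    """Grow the step size along a descent direction until the stochastic
    directional derivative dir_deriv(alpha) = g(x + alpha*d) . d turns
    non-negative; the returned interval then contains an SNN-GPP."""
    a_prev, a = 0.0, alpha0
    while a <= max_alpha:
        if dir_deriv(a) >= 0.0:
            # Sign change from negative to non-negative lies in (a_prev, a];
            # a step size in this interval is accepted as the learning rate.
            return a_prev, a
        a_prev, a = a, a * grow      # keep increasing the step size
    return a_prev, max_alpha
```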
“…Previous work has shown that the Gradient-Only Line Search that is Inexact (GOLS-I) is capable of determining step sizes for training algorithms beyond stochastic gradient descent (SGD) (Robbins and Monro, 1951), such as Adagrad (Duchi et al., 2011), which incorporates approximate second order information (Kafka and Wilke, 2019a). GOLS-I has also been demonstrated to outperform probabilistic line searches (Mahsereci and Hennig, 2017), provided mini-batch sizes are not too small (< 50 for investigated problems) (Kafka and Wilke, 2019).…”
Section: Introduction
mentioning confidence: 99%