2020
DOI: 10.1007/s10898-020-00921-z
Resolving learning rates adaptively by locating stochastic non-negative associated gradient projection points using line searches

Cited by 8 publications (22 citation statements)
References 34 publications
“…The optimizer is also one of the most important parameters in transfer learning. In this paper, stochastic gradient descent (SGD) [37] was used as the optimization algorithm. It updates only once per epoch without redundancy and is fast.…”
Section: Methods
mentioning confidence: 99%
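To make the SGD reference in the excerpt above concrete, the following is a minimal sketch of the vanilla SGD update rule applied to a single mini-batch gradient; the array values and learning rate are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def sgd_step(params, grad, lr=0.01):
    """One vanilla SGD update: params <- params - lr * grad."""
    return params - lr * grad

# Hypothetical usage with a single mini-batch gradient:
w = np.zeros(5)                             # model parameters
g = np.array([0.2, -0.1, 0.05, 0.0, 0.3])   # mini-batch gradient of the loss
w = sgd_step(w, g, lr=0.1)
```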
“…Once the directional derivative signs at α_L and α_U are found to bracket an SNN-GPP, we reduce the interval by applying the Regula-Falsi method [12]. This is essentially a consecutive linear interpolation method, applied until α*_n satisfies Equation (15). We provide the pseudocode for the bracketing strategy in Algorithm 2.…”
Section: Bracketing Strategy
mentioning confidence: 99%
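As a rough illustration of the interval-reduction step described in the excerpt above, the sketch below applies the Regula-Falsi rule to a directional-derivative function until the bracket around the sign change is small. The name dir_deriv, the tolerance, and the iteration cap are assumptions for illustration; this does not reproduce the paper's Algorithm 2.

```python
def regula_falsi_bracket(dir_deriv, a_lo, a_hi, tol=1e-3, max_iter=20):
    """Shrink a bracket [a_lo, a_hi] that contains a directional-derivative
    sign change (negative at a_lo, non-negative at a_hi) by successive
    linear interpolation (Regula-Falsi)."""
    d_lo, d_hi = dir_deriv(a_lo), dir_deriv(a_hi)
    assert d_lo < 0.0 <= d_hi, "interval must bracket a sign change"
    a_new = a_hi
    for _ in range(max_iter):
        # Root of the straight line through (a_lo, d_lo) and (a_hi, d_hi).
        a_new = a_hi - d_hi * (a_hi - a_lo) / (d_hi - d_lo)
        d_new = dir_deriv(a_new)
        if a_hi - a_lo < tol:
            break
        if d_new < 0.0:              # sign change still lies to the right
            a_lo, d_lo = a_new, d_new
        else:                        # sign change lies to the left
            a_hi, d_hi = a_new, d_new
    return a_new
```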
“…For line searches, the sampling errors manifest mainly in the form of bias or variance along a descent direction, depending on whether mini-batches are sub-sampled statically or dynamically [8,15]. Static MBSS sub-samples a new mini-batch for every descent direction, while dynamic MBSS sub-samples a new mini-batch for every function evaluation.…”
Section: Introduction
mentioning confidence: 99%
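The distinction between static and dynamic MBSS drawn in the excerpt above can be sketched on a toy least-squares loss that stands in for a network loss; the problem data, batch size, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))      # toy data standing in for a training set
b = rng.normal(size=1000)

def sample_batch(batch_size=64):
    """Draw a fresh mini-batch of row indices."""
    return rng.choice(A.shape[0], size=batch_size, replace=False)

def batch_loss(x, idx):
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2)

def losses_static_mbss(x, d, alphas):
    # Static MBSS: one mini-batch per descent direction d, so the sampled
    # loss along d is smooth but biased towards that particular batch.
    idx = sample_batch()
    return [batch_loss(x + a * d, idx) for a in alphas]

def losses_dynamic_mbss(x, d, alphas):
    # Dynamic MBSS: a fresh mini-batch per function evaluation, which makes
    # the sampled loss along d point-wise discontinuous (variance, not bias).
    return [batch_loss(x + a * d, sample_batch()) for a in alphas]
```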
“…The recent introduction of Gradient-Only Line Searches (GOLS) (Kafka and Wilke, 2019a) has enabled learning rates to be determined automatically in the discontinuous loss functions of neural network training with dynamic mini-batch sub-sampling (MBSS). The discontinuous nature of the dynamic MBSS loss is a direct result of successively sampling different mini-batches from the training data at every function evaluation, introducing a sampling error (Kafka and Wilke, 2019a). To determine step sizes, GOLS locates Stochastic Non-Negative Associated Gradient Projection Points (SNN-GPPs), manifesting as sign changes from negative to positive in the directional derivative along a descent direction.…”
Section: Introduction
mentioning confidence: 99%
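A simplified sketch of what locating an SNN-GPP amounts to in practice: step along the descent direction until the (resampled) directional derivative changes sign from negative to non-negative. The growth factor, starting step, and function name are assumptions; this is not the published GOLS pseudocode.

```python
def bracket_snngpp(dir_deriv, alpha0=1e-3, grow=2.0, max_alpha=1e3):
    """Grow the step size along a descent direction until the stochastic
    directional derivative dir_deriv(alpha) = g(x + alpha*d) . d turns
    non-negative; the returned interval then contains an SNN-GPP."""
    a_prev, a = 0.0, alpha0
    while a <= max_alpha:
        if dir_deriv(a) >= 0.0:
            # Sign change from negative to non-negative lies in (a_prev, a];
            # a step size in this interval is accepted as the learning rate.
            return a_prev, a
        a_prev, a = a, a * grow      # keep increasing the step size
    return a_prev, max_alpha
```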
“…Previous work has shown that the Gradient-Only Line Search that is Inexact (GOLS-I) is capable of determining step sizes for training algorithms beyond stochastic gradient descent (SGD) (Robbins and Monro, 1951), such as Adagrad (Duchi et al., 2011), which incorporates approximate second order information (Kafka and Wilke, 2019a). GOLS-I has also been demonstrated to outperform probabilistic line searches (Mahsereci and Hennig, 2017), provided mini-batch sizes are not too small (< 50 for investigated problems) (Kafka and Wilke, 2019).…”
Section: Introduction
mentioning confidence: 99%