2017
DOI: 10.48550/arxiv.1703.09580
Preprint

Early Stopping without a Validation Set

Abstract: Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training set and a smaller validation set to obtain an ongoing estimate of the generalization performance. We propose a novel early stopping criterion that is based on fast-to-compute local statistics of the computed gradients and entirely removes the need for a held-out validation set.
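The abstract leaves the criterion itself to the full paper, but a gradient-noise test of the kind it hints at can be sketched in a few lines. The snippet below is an illustrative NumPy reconstruction, not the authors' exact rule: it assumes access to per-sample gradients, and the names `eb_style_stop` and `per_sample_grads` are invented for this sketch. The idea is to stop once the mini-batch gradient is no larger than its own sampling noise would predict.

```python
import numpy as np

def eb_style_stop(per_sample_grads, eps=1e-12):
    """Illustrative gradient-noise stopping test (names invented for this sketch).

    per_sample_grads: array of shape (B, D) holding the gradient of each of
    the B mini-batch examples w.r.t. all D parameters.
    """
    B, _ = per_sample_grads.shape
    g = per_sample_grads.mean(axis=0)                   # mini-batch gradient, shape (D,)
    var = per_sample_grads.var(axis=0, ddof=1) + eps    # per-element variance estimate
    # If the true gradient were zero, B * g_d**2 / var_d would have expectation
    # roughly 1 for each element d; averaging over d and comparing against 1
    # turns this into a cheap, validation-free stopping signal.
    snr = np.mean(B * g**2 / var)
    return snr < 1.0
```

The test is noisy from batch to batch, so in practice one would smooth the statistic with a running average and halt only once the condition holds for several consecutive steps.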

Cited by 12 publications (25 citation statements)
References 12 publications

“…The central idea behind early stopping (halting training or optimization early) [23] is that there exists a critical regime during the training of a learning model where the model ceases to generalize (perform better) on unseen data points while still improving performance on the given training data. Identifying this point of zero or negative return is also attractive from a computational perspective and is the goal of various early stopping rules or methods in machine learning [24], [25], [26]. A conventional and widely popular early stopping method in machine learning is the one based on validation data, which we refer to as the Validation-based method.…”
Section: Early Stopping Methods
Citation type: mentioning
confidence: 99%
“…A conventional and widely popular early stopping method in machine learning is the one based on validation data, which we refer to as the Validation-based method. Although very effective in practice, especially with large training datasets where holding out a small part of the training data has no effect on the learning process, there are drawbacks to Validation-based early stopping [26]. The validation performance may have a large stochastic error depending on the size of the validation set and may introduce biases leading to poor generalization estimates.…”
Section: Early Stopping Methods
Citation type: mentioning
confidence: 99%
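Both statements take this Validation-based method as the baseline, so a minimal sketch may help fix ideas. This is the generic patience scheme, not code from any of the cited papers; the class name and method signature are invented for illustration.

```python
import copy

class ValidationEarlyStopper:
    """Textbook validation-based early stopping with patience (illustrative)."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # evaluations to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as progress
        self.best_loss = float("inf")
        self.best_state = None        # snapshot of the best model seen so far
        self.bad_evals = 0

    def update(self, val_loss, model_state):
        """Call after each validation pass; returns True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model_state)
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

The drawback flagged in the statement is visible here: `val_loss` is itself a noisy estimate whose standard error shrinks only as the square root of the validation-set size, so a small held-out set can trigger stopping too early or too late.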