Pretext-based self-supervised learning aims to learn semantic representations via a handcrafted pretext task over unlabeled data and then use the learned representations for downstream prediction tasks. Lee et al. (2020) prove that pretext-based self-supervised learning can effectively reduce the sample complexity of downstream tasks under Conditional Independence (CI) between the components of the pretext task given the downstream label. However, the CI condition rarely holds in practice, and the downstream sample complexity becomes much worse when it fails. In this paper, we explore the idea of applying a learnable function to the input so that the CI condition holds. In particular, we first rigorously formulate the criteria that such a function needs to satisfy. We then design a loss function for learning it and prove that the minimizer of the proposed loss satisfies the above criteria. We also theoretically study the number of labeled samples required and give a model-free lower bound showing that limited downstream data hurts the performance of self-supervised learning. Furthermore, we take the model structure into account and give a model-dependent lower bound, which becomes larger as the model capacity increases. Finally, we conduct several numerical experiments to verify our theoretical results.
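To make the setting concrete, the sketch below illustrates the two-stage pipeline analyzed in this line of work on a linear toy model: a pretext regression of one view onto another over unlabeled data, followed by a downstream linear fit on the learned representation with few labels. This is a minimal sketch under our own assumptions (synthetic data in which a scalar latent label generates both views, so CI approximately holds); the dimensions, noise level, and variable names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_unlabeled, n_labeled, n_test, d1, d2 = 10_000, 20, 1_000, 100, 5
a1, a2 = rng.standard_normal(d1), rng.standard_normal(d2)

def sample(n):
    # A scalar latent label y generates both views, so X1 and X2 are
    # (approximately) conditionally independent given y.
    y = rng.standard_normal(n)
    X1 = np.outer(y, a1) + 0.5 * rng.standard_normal((n, d1))
    X2 = np.outer(y, a2) + 0.5 * rng.standard_normal((n, d2))
    return X1, X2, y

X1_u, X2_u, _ = sample(n_unlabeled)   # unlabeled data: only the two views are used
X1_l, _, y_l = sample(n_labeled)      # small labeled downstream set
X1_t, _, y_t = sample(n_test)

# Pretext task: predict X2 from X1; the learned map psi(x) = x @ W is the representation.
W, *_ = np.linalg.lstsq(X1_u, X2_u, rcond=None)

# Downstream task: linear regression of y on psi(X1) using only n_labeled points.
beta, *_ = np.linalg.lstsq(X1_l @ W, y_l, rcond=None)
ssl_mse = np.mean((X1_t @ W @ beta - y_t) ** 2)

# Baseline: regress y directly on the raw, high-dimensional features with the same labeled set.
beta_raw, *_ = np.linalg.lstsq(X1_l, y_l, rcond=None)
raw_mse = np.mean((X1_t @ beta_raw - y_t) ** 2)
print(f"test MSE with pretext representation: {ssl_mse:.3f}, with raw features: {raw_mse:.3f}")
```

In this toy construction the pretext representation compresses the input into a low-dimensional space aligned with the latent label, which is why far fewer labeled points suffice downstream than for regression on the raw features.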
Studies on benign overfitting provide insights into the success of overparameterized deep learning models. In this work, we examine the benign overfitting phenomenon in real-world settings. We find that for tasks such as training a ResNet model on the ImageNet dataset, the model does not fit the training data benignly. To understand why benign overfitting fails in the ImageNet experiment, we analyze previous benign overfitting models under a more restrictive setup in which the number of parameters is not significantly larger than the number of data points. Under this mild overparameterization setup, our analysis identifies a phase change: unlike in the heavy overparameterization setting, benign overfitting can now fail in the presence of label noise. Our study explains our empirical observations and naturally leads to a simple technique, known as self-training, that can boost the model's generalization performance. Furthermore, our work highlights the importance of understanding implicit bias in underfitting regimes as a future direction.
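The self-training step mentioned above can be illustrated with a small sketch: fit a first model on noisy labels, relabel the training set with its own predictions, and refit. This is a generic self-training loop under assumed data (synthetic linear classification with flipped labels), not the paper's experimental protocol; the dataset, noise rate, and model choice are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, noise_rate = 2_000, 50, 0.2

# Synthetic linearly separable data with a fraction of flipped (noisy) labels.
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y_clean = (X @ w_true > 0).astype(int)
flip = rng.random(n) < noise_rate
y_noisy = np.where(flip, 1 - y_clean, y_clean)

X_test = rng.standard_normal((5_000, d))
y_test = (X_test @ w_true > 0).astype(int)

# Stage 1: fit a teacher model on the noisy labels.
teacher = LogisticRegression(max_iter=1_000).fit(X, y_noisy)

# Stage 2 (self-training): relabel the training data with the teacher's own
# predictions and refit; the pseudo-labels can wash out part of the label noise.
pseudo = teacher.predict(X)
student = LogisticRegression(max_iter=1_000).fit(X, pseudo)

print("teacher test accuracy:", teacher.score(X_test, y_test))
print("student test accuracy:", student.score(X_test, y_test))
```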
Benign overfitting demonstrates that overparameterized models can perform well on test data while fitting noisy training data. However, existing analyses only consider the final min-norm solution in linear regression, ignoring the optimization algorithm and the corresponding training procedure. In this paper, we generalize the idea of benign overfitting from the min-norm solution to the whole training trajectory and derive a time-variant bound based on trajectory analysis. Starting from this time-variant bound, we further derive a time interval that suffices to guarantee a consistent generalization error for a given feature covariance. Unlike existing approaches, the newly proposed generalization bound is characterized by a time-variant effective dimension of the feature covariance. By introducing the time factor, we relax the strict assumptions on the feature covariance matrix required in previous analyses of benign overfitting for overparameterized linear regression with gradient descent. This paper extends the scope of benign overfitting, and experimental results indicate that the proposed bound accords better with empirical evidence.
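To show what moving from the min-norm solution to the whole training trajectory means operationally, the sketch below tracks test risk along gradient descent in an overparameterized linear regression and compares it with the min-norm interpolator. The feature covariance, noise level, and step size are our own illustrative assumptions, not the paper's setting or bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, noise = 100, 500, 0.5                       # overparameterized: d > n

# Anisotropic feature covariance with a fast-decaying spectrum.
scales = 1.0 / np.arange(1, d + 1)
w_true = rng.standard_normal(d) * scales
X = rng.standard_normal((n, d)) * scales
y = X @ w_true + noise * rng.standard_normal(n)
X_test = rng.standard_normal((2_000, d)) * scales
y_test = X_test @ w_true

# Min-norm interpolating solution: the end point studied in classical benign overfitting.
w_mn = X.T @ np.linalg.solve(X @ X.T, y)
print("min-norm test MSE:", np.mean((X_test @ w_mn - y_test) ** 2))

# Gradient descent from zero on the least-squares loss: evaluate the test risk
# along the trajectory rather than only at the interpolating limit.
w = np.zeros(d)
lr = n / (2 * np.linalg.norm(X, ord=2) ** 2)      # stable step size for this quadratic
for t in range(1, 2001):
    w -= lr * X.T @ (X @ w - y) / n
    if t in (10, 100, 500, 2000):
        print(f"step {t:5d}: test MSE = {np.mean((X_test @ w - y_test) ** 2):.4f}")
```

Printing the test risk at intermediate steps is exactly the quantity a time-variant bound speaks to: for some covariances the risk is already small well before the iterates reach the min-norm interpolator.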