We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix X with gradient descent on a factorization of X. We conjecture, and provide empirical and theoretical evidence, that with small enough step sizes and initialization close enough to the origin, gradient descent on a full-dimensional factorization converges to the minimum nuclear norm solution.
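To make the conjectured dynamics concrete, here is a minimal NumPy sketch (not the paper's experiments) of gradient descent on a full-dimensional factorization X = U Uᵀ for a random matrix-sensing instance; the Gaussian measurement operator, rank-2 ground truth, step size, and near-zero initialization scale are all illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 10, 60                                  # matrix size, number of measurements

    # Assumed setup: rank-2 PSD ground truth and random Gaussian measurements.
    B = rng.standard_normal((n, 2))
    X_star = B @ B.T
    A = rng.standard_normal((m, n, n))
    y = np.einsum('kij,ij->k', A, X_star)

    # Full-dimensional factorization X = U U^T, initialized close to the origin.
    U = 1e-4 * rng.standard_normal((n, n))
    lr = 1e-3
    for _ in range(50_000):
        r = np.einsum('kij,ij->k', A, U @ U.T) - y     # measurement residuals
        grad_X = np.einsum('k,kij->ij', r, A) / m      # gradient w.r.t. X
        U -= lr * (grad_X + grad_X.T) @ U              # chain rule through X = U U^T

    # With small steps and small initialization, U U^T tends toward the
    # minimum nuclear norm solution consistent with the measurements.
    print(np.linalg.norm(U @ U.T, 'nuc'))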
Understanding generalization is arguably one of the most important open questions in deep learning. Deep learning has been successfully applied to a large number of problems, ranging from pattern recognition to complex decision making, but recent researchers have raised many concerns about it, chief among them generalization. Despite numerous attempts, conventional statistical learning approaches have yet to provide a satisfactory explanation of why deep learning works. A recent line of work aims to address the problem by predicting generalization performance through complexity measures. In this competition, we invite the community to propose complexity measures that can accurately predict the generalization of models. A robust and general complexity measure would potentially lead to a better understanding of deep learning's underlying mechanisms and of the behavior of deep models on unseen data, or shed light on better generalization bounds. All of these outcomes will be important for making deep learning more robust and reliable. * Lead organizer: Yiding Jiang; Scott Yak and Pierre Foret helped implement a large portion of the infrastructure, and the order of the remaining organizers is randomized.
Probabilistic matrix factorization (PMF) and other popular approaches to collaborative filtering assume that the ratings given by users for products are genuine, and hence they give equal importance to all available ratings. However, this is not always true, for several reasons including the presence of opinion spam in product reviews. In this paper, we explore the possibility of performing collaborative filtering while attaching weights, or quality scores, to the ratings. The quality scores, which are determined from the corresponding review data, are used to "up-weight" or "down-weight" the importance given to each individual rating while performing collaborative filtering, thereby improving the accuracy of the predictions. First, the measure used to capture the quality of the ratings is described. Different approaches for estimating the quality score based on the available review information are then examined. Subsequently, a mathematical formulation that incorporates quality scores as weights for the ratings in the basic PMF framework is derived. Experimental evaluation on two product categories of a benchmark data set from Amazon.com demonstrates the efficacy of our approach.
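As a rough illustration of the weighting idea (the paper's exact probabilistic formulation may differ), the sketch below scales each rating's squared-error term in PMF-style SGD by a per-rating quality score w in [0, 1]; the function name, hyperparameters, and data layout are placeholders:

    import numpy as np

    def weighted_pmf(ratings, n_users, n_items, k=8,
                     lam=0.05, lr=0.01, epochs=30, seed=0):
        """SGD for PMF where each observed rating carries a quality weight.

        ratings: iterable of (user, item, rating, weight); the weight,
        estimated from the review text, up- or down-weights that rating's
        squared error, so low-quality (e.g. spammy) ratings matter less.
        """
        rng = np.random.default_rng(seed)
        U = 0.1 * rng.standard_normal((n_users, k))
        V = 0.1 * rng.standard_normal((n_items, k))
        for _ in range(epochs):
            for i, j, r, w in ratings:
                err = r - U[i] @ V[j]                    # prediction error
                U[i] += lr * (w * err * V[j] - lam * U[i])
                V[j] += lr * (w * err * U[i] - lam * V[j])
        return U, V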
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
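A minimal sketch of the phenomenon, using made-up separable data (the blob centers, step size, and iteration counts are arbitrary): the training loss keeps shrinking, the norm of w grows without bound, and the normalized direction w/||w|| drifts toward the hard-margin SVM direction only at roughly a 1/log(t) rate:

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical linearly separable 2-D data: two Gaussian blobs.
    Xp = rng.standard_normal((50, 2)) + np.array([3.0, 3.0])
    Xn = rng.standard_normal((50, 2)) - np.array([3.0, 3.0])
    X = np.vstack([Xp, Xn])
    y = np.concatenate([np.ones(50), -np.ones(50)])

    w = np.zeros(2)
    lr = 0.1
    for t in range(1, 100_001):
        margins = y * (X @ w)
        grad = -(y / (1.0 + np.exp(margins))) @ X / len(y)   # logistic loss gradient
        w -= lr * grad
        if t % 20_000 == 0:
            # ||w|| grows without bound, but w/||w|| slowly stabilizes toward
            # the max-margin direction (roughly (1,1)/sqrt(2) for this
            # symmetric setup), even as the loss becomes extremely small.
            print(t, np.linalg.norm(w), w / np.linalg.norm(w))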
We show that gradient descent on full-width linear convolutional networks of depth L converges to a linear predictor related to the ℓ_{2/L} bridge penalty in the frequency domain. This is in contrast to fully connected linear networks, where gradient descent converges to the ℓ_2 maximum margin solution regardless of depth.
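The frequency-domain view can be seen in the parameterization itself: composing L full-width circular convolutions multiplies the filters' DFT coefficients elementwise, so each Fourier coefficient of the end-to-end linear predictor is a product of L factors, which is the structure behind the ℓ_{2/L} bias. A small sketch checking that identity (ignoring the paper's exact correlation/conjugation conventions):

    import numpy as np

    rng = np.random.default_rng(2)
    L, d = 3, 8
    filters = [rng.standard_normal(d) for _ in range(L)]

    # Compose the layers by repeated circular convolution in the signal domain...
    beta = filters[0]
    for w in filters[1:]:
        beta = np.real(np.fft.ifft(np.fft.fft(beta) * np.fft.fft(w)))

    # ...and check it matches the elementwise product of DFTs in the
    # frequency domain, where each coefficient is a product of L factors.
    prod_hat = np.prod([np.fft.fft(w) for w in filters], axis=0)
    print(np.allclose(np.fft.fft(beta), prod_hat))   # True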