While stochastic gradient descent (SGD) is an efficient algorithm for data-driven problems, it is an incomplete optimization algorithm: it lacks stopping criteria, which has limited its adoption in settings where such criteria are necessary. Unlike stopping criteria for deterministic methods, stopping criteria for SGD require a detailed understanding of (A) strong convergence, (B) whether the criteria will be triggered, (C) how false negatives are controlled, and (D) how false positives are controlled. To address these issues, we first prove strong global convergence (i.e., convergence with probability one) of SGD on a popular and general class of convex and nonconvex functions, specified by what we call the Bottou-Curtis-Nocedal structure. Our proof of strong global convergence refines many techniques currently in the literature and employs new ones that are of independent interest. With strong convergence established, we then present several stopping criteria, rigorously explore whether they are triggered in finite time, and supply bounds on false-negative probabilities. Ultimately, we lay a foundation for rigorously developing stopping criteria for SGD methods on a broad class of functions, in the hope of making SGD a more complete optimization algorithm with greater adoption for data-driven problems.
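To make the role of a stopping criterion concrete, the sketch below runs SGD and halts when a running average of stochastic gradient norms falls below a threshold. This is a hypothetical illustrative rule, not one of the criteria analyzed in the paper; the function names and parameters are assumptions for the example.

```python
import numpy as np

def sgd_with_stop(grad_sample, x0, step=0.05, window=200, tol=0.1,
                  max_iter=100_000, seed=0):
    """SGD with an illustrative stopping criterion: halt when the mean
    of the last `window` stochastic gradient norms drops below `tol`.
    (A hypothetical rule for illustration, not the paper's criteria.)"""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    recent = []
    for k in range(max_iter):
        g = grad_sample(x, rng)            # unbiased stochastic gradient
        x = x - step * g
        recent.append(float(np.linalg.norm(g)))
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and np.mean(recent) < tol:
            return x, k, True              # criterion triggered (no false negative)
    return x, max_iter, False              # criterion never triggered

# Toy problem: minimize E[(x - z)^2 / 2] with z ~ N(1, 0.01),
# so the stochastic gradient is x - z and the minimizer is x* = 1.
grad = lambda x, rng: x - (1.0 + 0.1 * rng.standard_normal())
x_final, iters, stopped = sgd_with_stop(grad, x0=5.0)
```

Because the gradient noise does not vanish at the minimizer, `tol` must sit above the noise floor for the criterion to trigger in finite time, which is exactly the kind of false-negative question the abstract raises.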
Modern proximal and stochastic gradient descent (SGD) methods are believed to efficiently minimize large composite objective functions, but such methods face two algorithmic challenges: (1) a lack of fast or justified stopping conditions, and (2) sensitivity to the objective function's conditioning. In response to the first challenge, modern proximal and SGD methods guarantee convergence only after multiple epochs, but such a guarantee renders proximal and SGD methods infeasible when the number of component functions is very large or infinite. In response to the second challenge, second-order SGD methods have been developed, but they are marred by the complexity of their analysis. In this work, we address these challenges on the limited, but important, linear regression problem by introducing and analyzing a second-order proximal/SGD method based on Kalman filtering (kSGD). Through our analysis, we show that kSGD is asymptotically optimal, develop a fast algorithm for very large, infinite, or streaming data sources with a justified stopping condition, prove that kSGD is insensitive to the problem's conditioning, and develop a unique approach for analyzing the complex second-order dynamics. Our theoretical results are supported by numerical experiments on three regression problems (linear, nonparametric wavelet, and logistic) using three large publicly available datasets. Moreover, our analysis and experiments lay a foundation for embedding kSGD in multiple-epoch algorithms, extending kSGD to other problem classes, and developing parallel and low-memory kSGD implementations.
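The Kalman-filtering view of linear regression can be illustrated with a recursive least-squares update, where the Kalman gain adapts the step to the data's conditioning. This is a minimal sketch of the underlying idea under an assumed Gaussian noise model, not the paper's kSGD algorithm; all names and parameters here are assumptions.

```python
import numpy as np

def kalman_ls(data, dim, noise_var=1.0, prior_var=100.0):
    """Kalman-filter-style recursive update for linear regression
    (recursive least squares): one pass over (a, y) pairs with
    y ~ a @ theta* + noise. A sketch of the filtering idea behind
    second-order SGD, not the paper's kSGD implementation."""
    theta = np.zeros(dim)
    P = prior_var * np.eye(dim)          # posterior covariance estimate
    for a, y in data:
        Pa = P @ a
        gain = Pa / (a @ Pa + noise_var) # Kalman gain: per-direction step size
        theta = theta + gain * (y - a @ theta)
        P = P - np.outer(gain, Pa)       # shrink uncertainty along observed direction
    return theta, P

# Example: recover theta* from noisy streaming observations.
rng = np.random.default_rng(1)
theta_star = np.array([2.0, -1.0, 0.5])
A = rng.standard_normal((500, 3))
stream = [(a, a @ theta_star + 0.1 * rng.standard_normal()) for a in A]
theta_hat, P = kalman_ls(stream, dim=3, noise_var=0.01)
```

Because the gain matrix `P` tracks remaining uncertainty, a quantity such as `np.trace(P)` offers a natural (here, illustrative) handle for a stopping condition on streaming data.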
Randomized linear system solvers have become popular because they can reduce floating point complexity while still achieving desirable convergence rates. One particularly promising class of methods, random sketching solvers, has achieved the best known computational complexity bounds in theory, but is blunted by two practical considerations: there is no clear way of choosing the size of the sketching matrix a priori, and there is a nontrivial storage cost for the projected system. In this work, we make progress towards addressing these issues by implicitly generating the sketched system and solving it simultaneously through an iterative procedure. As a result, we replace the question of the size of the sketching matrix with that of determining appropriate stopping criteria; we avoid the cost of explicitly representing the sketched linear system; and our implicit representation also solves the system at the same time, which controls the per-iteration computational costs. Additionally, our approach allows us to draw a connection between random sketching methods and randomized iterative solvers (e.g., the randomized Kaczmarz method). As a consequence, we exploit this connection to (1) produce a stronger, more precise convergence theory for such randomized iterative solvers under arbitrary sampling schemes (i.i.d., adaptive, permutation, dependent, etc.), and (2) improve the rates of convergence of randomized iterative solvers at the expense of user-determined increases in per-iteration computational and storage costs. We demonstrate these concepts in numerical examples on forty-nine distinct linear systems.

Footnotes:
1. [Gower and Richtárik, 2015]: the noniterative scheme is simply repeated in order to get better convergence properties. We are not doing this, but rather turning the noniterative scheme into an iterative one.
2. We will be more precise about what we refer to as base methods. For now, such methods are exemplified by randomized Kaczmarz [Strohmer and Vershynin, 2009] and randomized Gauss-Seidel [Leventhal and Lewis, 2010].
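As a point of reference for the randomized iterative solvers mentioned above, here is a minimal sketch of the randomized Kaczmarz baseline with the row-norm sampling of Strohmer and Vershynin: each step projects the iterate onto the hyperplane defined by one sampled equation. This illustrates the base method only, not the sketch-and-solve procedure of the abstract; the function name and defaults are assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz for a consistent system A x = b: sample row i
    with probability proportional to ||A_i||^2, then project the iterate
    onto the hyperplane {z : A_i @ z = b_i}. Baseline sketch of the
    base method, not the paper's implicit sketching solver."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    probs = np.sum(A**2, axis=1)
    probs = probs / probs.sum()          # row-norm-squared sampling
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a   # orthogonal projection step
    return x

# Example: solve a consistent overdetermined system.
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))
x_star = rng.standard_normal(5)
x_hat = randomized_kaczmarz(A, A @ x_star)
```

Each iteration touches a single row of `A`, so the per-iteration cost is O(n); the abstract's point (2) concerns trading higher per-iteration cost for faster convergence than this baseline.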
Despite the promise of precision agriculture to increase productivity through site-specific management, farmers remain skeptical, and its utilization rate is lower than expected. A major cause is the lack of concrete approaches to higher profitability. When many variables are involved in both the controlled management and the monitored environment, optimal site-specific management for such high-dimensional cropping systems is considerably more complex than the traditional low-dimensional cases widely studied in the existing literature, calling for a paradigm shift in the optimization of site-specific management. We propose an algorithmic approach that enables farmers to efficiently learn their own site-specific management through on-farm experiments. We test its performance in two simulated scenarios: one of medium complexity with 150 management variables and one of high complexity with 864 management variables. Results show that, relative to uniform management, site-specific management learned from 5-year experiments generates $43/ha higher profits with 25 kg/ha less nitrogen fertilizer in the first scenario, and $40/ha higher profits with 55 kg/ha less nitrogen fertilizer in the second scenario. Thus, complex site-specific management can be learned efficiently and can be more profitable and environmentally sustainable than uniform management.