Time-varying stochastic optimization problems frequently arise in machine learning practice (e.g. gradual domain shift, object tracking, strategic classification). Often, the underlying process that drives the distribution shift is continuous in nature. We exploit this underlying continuity by developing predictor-corrector algorithms for time-varying stochastic optimization that anticipate changes in the underlying data-generating process. We provide error bounds for the iterates, both with exact and with noisy access to queries of the relevant derivatives of the loss function. Furthermore, we show (theoretically, and empirically in several examples) that our method outperforms non-predictor-corrector methods that do not anticipate changes in the data-generating process.
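As a rough illustration of the kind of update this abstract describes (not the authors' algorithm), the sketch below implements a generic predictor-corrector step for a time-varying loss f(x, t): the predictor extrapolates the moving minimizer via the implicit-function-theorem identity dx*/dt = -[∇_xx f]^{-1} ∇_tx f, and the corrector runs a few gradient steps on the loss at the new time. All function names and the toy drifting quadratic are illustrative assumptions.

```python
# Minimal sketch (assumptions: toy quadratic loss, generic predictor-corrector
# scheme; not the paper's exact method).
import numpy as np

def predictor_corrector(x, t, h, grad_x, hess_xx, grad_tx, corrector_steps=3, lr=0.1):
    # Predictor: first-order extrapolation of the minimizer trajectory,
    # using dx*/dt = -[hess_xx]^{-1} grad_tx evaluated at the current iterate.
    x_pred = x - h * np.linalg.solve(hess_xx(x, t), grad_tx(x, t))
    # Corrector: a few gradient descent steps on the loss at time t + h.
    for _ in range(corrector_steps):
        x_pred = x_pred - lr * grad_x(x_pred, t + h)
    return x_pred

# Toy example: f(x, t) = 0.5 * ||x - c(t)||^2 with a drifting target c(t).
c = lambda t: np.array([np.cos(t), np.sin(t)])
grad_x  = lambda x, t: x - c(t)
hess_xx = lambda x, t: np.eye(2)
grad_tx = lambda x, t: -np.array([-np.sin(t), np.cos(t)])  # d/dt of grad_x

x, h = c(0.0), 0.1
for k in range(50):
    x = predictor_corrector(x, k * h, h, grad_x, hess_xx, grad_tx)
print("tracking error:", np.linalg.norm(x - c(50 * h)))
```

On this toy problem the prediction step moves the iterate along the target's velocity before correcting, which is exactly the anticipation that a plain (corrector-only) gradient tracker lacks.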
This paper presents a number of new findings about the canonical change point estimation problem. The first part studies the estimation of a change point on the real line in a simple stump model using the robust Huber estimating function, which interpolates between the ℓ1 (absolute deviation) and ℓ2 (least squares) criteria. While the ℓ2 criterion has been studied extensively, its robust counterparts, and in particular the ℓ1 minimization problem, have not. We derive the limit distribution of the estimated change point under the Huber estimating function and compare it to that under the ℓ2 criterion. Theoretical and empirical studies indicate that it is more profitable to use the Huber estimating function (and in particular the ℓ1 criterion) under heavy-tailed errors, as it leads to smaller asymptotic confidence intervals at the usual levels than the ℓ2 criterion. We also compare the ℓ1 and ℓ2 approaches in a parallel setting, where one has m independent single change point problems and the goal is to control the maximal deviation of the estimated change points from the true values, and establish rigorously that the ℓ1 criterion attains a superior rate of convergence to the ℓ2 criterion, with this relative advantage driven by the heaviness of the tail of the error distribution. Finally, we derive minimax optimal rates for the change plane estimation problem in growing dimensions and demonstrate that Huber estimation attains the optimal rate, while the ℓ2 scheme produces a sub-optimal estimator under heavy-tailed errors. In the process of deriving our results, we establish a number of properties of the minimizers of compound binomial and compound Poisson processes which are of independent interest.
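To make the ℓ1-versus-ℓ2 comparison concrete, here is a small sketch (an illustration under stated assumptions, not the paper's estimator) of change point estimation in a stump model Y = a·1{X ≤ d} + b·1{X > d} + ε by grid search over candidate splits: the ℓ1 criterion fits each segment by its median, the ℓ2 criterion by its mean; the Huber criterion would sit in between with a robust location fit. The heavy-tailed t(1.5) errors and all names below are illustrative.

```python
# Minimal sketch (assumptions: one-dimensional stump model, grid search over
# splits; l1 uses segment medians, l2 uses segment means).
import numpy as np

def fit_change_point(x, y, crit="l1"):
    loc  = np.median if crit == "l1" else np.mean
    loss = (lambda r: np.abs(r).sum()) if crit == "l1" else (lambda r: (r ** 2).sum())
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_d, best_val = xs[0], np.inf
    for i in range(1, len(xs)):          # candidate split between xs[i-1] and xs[i]
        left, right = ys[:i], ys[i:]
        val = loss(left - loc(left)) + loss(right - loc(right))
        if val < best_val:
            best_d, best_val = xs[i - 1], val
    return best_d

rng = np.random.default_rng(0)
n, d0 = 500, 0.3
x = rng.uniform(-1, 1, n)
err = rng.standard_t(df=1.5, size=n)     # heavy-tailed noise, infinite variance
y = np.where(x <= d0, 1.0, -1.0) + err
print("l1 estimate:", fit_change_point(x, y, "l1"))
print("l2 estimate:", fit_change_point(x, y, "l2"))
```

With infinite-variance noise as above, the mean-based segment fits are badly perturbed by outliers, so the ℓ1 estimate of the split point is typically far more accurate, mirroring the abstract's claim that the ℓ1 advantage is driven by tail heaviness.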