Within the context of empirical risk minimization (see Raginsky, Rakhlin, and Telgarsky (2017)), we are concerned with a non-asymptotic analysis of sampling algorithms used in optimization. In particular, we obtain non-asymptotic error bounds for a popular class of algorithms called Stochastic Gradient Langevin Dynamics (SGLD). These results are derived in appropriate Wasserstein distances in the absence of log-concavity of the target distribution. More precisely, the stochastic gradient H(θ, x) is assumed to be locally Lipschitz continuous in both variables, and, furthermore, the dissipativity and convexity-at-infinity conditions are relaxed by removing their uniform dependence on x. This relaxation allows us to treat two key applications, minibatch logistic regression and variational inference, in full generality.
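To make the algorithm referred to above concrete, the following is a minimal sketch of an SGLD iteration, θ_{k+1} = θ_k − η H(θ_k, x_k) + √(2η/β) ξ_k with ξ_k ~ N(0, I). The quadratic loss behind `stoch_grad`, the step size, the inverse temperature β, and the batch size are illustrative assumptions, not the setting analysed in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(theta, x_batch):
    """Hypothetical stochastic gradient H(theta, x): gradient of the
    average loss 0.5 * ||theta - x||^2 over a minibatch (illustrative only)."""
    return theta - np.mean(x_batch, axis=0)

def sgld(data, theta0, step=1e-2, beta=1.0, n_iters=1000, batch_size=32):
    """SGLD update: theta <- theta - step * H(theta, x_batch)
    + sqrt(2 * step / beta) * xi, with xi ~ N(0, I) and a random minibatch."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iters):
        idx = rng.choice(len(data), size=batch_size, replace=False)
        grad = stoch_grad(theta, data[idx])
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad + np.sqrt(2.0 * step / beta) * noise
    return theta

# Usage: the iterates concentrate near the minimiser (here, the data mean).
data = rng.standard_normal((1000, 2)) + np.array([3.0, -1.0])
print(sgld(data, theta0=np.zeros(2)))
```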
Adaptive importance samplers are adaptive Monte Carlo algorithms for estimating expectations with respect to a target distribution; they adapt themselves to obtain better estimators over a sequence of iterations. Although it is straightforward to show that they have the same $$\mathcal{O}(1/\sqrt{N})$$ convergence rate as standard importance samplers, where N is the number of Monte Carlo samples, the behaviour of adaptive importance samplers over the number of iterations has been left relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive importance samplers termed optimised adaptive importance samplers (OAIS). These samplers rely on the iterative minimisation of the $$\chi^2$$-divergence between an exponential-family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers which minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which explicitly depend on both the number of iterations and the number of samples. The non-asymptotic bounds derived in this paper imply that, when the target belongs to the exponential family, the $$L_2$$ errors of the optimised samplers converge to the optimal rate of $$\mathcal{O}(1/\sqrt{N})$$, and the rate of convergence in the number of iterations is provided explicitly. When the target does not belong to the exponential family, the rate of convergence is the same but the asymptotic $$L_2$$ error increases by a factor $$\sqrt{\rho^\star} > 1$$, where $$\rho^\star - 1$$ is the minimum $$\chi^2$$-divergence between the target and an exponential-family proposal.
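As a rough illustration of the adaptation strategy described above, here is a minimal sketch of an optimised adaptive importance sampler with a Gaussian (exponential-family) proposal whose mean is adapted by stochastic gradient descent on an estimate of $$\rho(\mu) = \mathbb{E}_{q_\mu}[w(x)^2]$$, using $$\nabla_\mu \rho \approx -\tfrac{1}{N}\sum_i w_i^2 \nabla_\mu \log q_\mu(x_i)$$. The unnormalised target, fixed proposal scale, step size, and sample sizes are illustrative assumptions, not the configuration analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.5   # fixed proposal standard deviation (illustrative)
dim = 2

def log_target(x):
    """Hypothetical unnormalised log-target: standard Gaussian centred at (2, 2)."""
    return -0.5 * np.sum((x - 2.0) ** 2, axis=-1)

def log_proposal(x, mu):
    return -0.5 * np.sum((x - mu) ** 2, axis=-1) / sigma**2

def oais(n_iters=50, n_samples=500, step=0.05):
    """At each iteration: sample from q_mu, form importance weights w = pi/q_mu,
    and take a gradient step on rho(mu) = E_q[w^2] via
    grad rho ~= -(1/N) sum_i w_i^2 * grad_mu log q_mu(x_i)."""
    mu = np.zeros(dim)
    for _ in range(n_iters):
        x = mu + sigma * rng.standard_normal((n_samples, dim))
        logw = log_target(x) - log_proposal(x, mu)
        w = np.exp(logw - logw.max())          # self-normalised weights for stability;
        w /= w.sum()                           # this only rescales the gradient uniformly
        grad_log_q = (x - mu) / sigma**2       # grad_mu log q_mu(x)
        grad_rho = -(w[:, None] ** 2 * grad_log_q).sum(axis=0)
        mu = mu - step * grad_rho              # adapt the proposal mean
    # Final self-normalised IS estimate of E_pi[x] under the adapted proposal.
    x = mu + sigma * rng.standard_normal((n_samples, dim))
    logw = log_target(x) - log_proposal(x, mu)
    w = np.exp(logw - logw.max()); w /= w.sum()
    return mu, (w[:, None] * x).sum(axis=0)

mu_star, est = oais()
print("adapted proposal mean:", mu_star, "estimate of E_pi[x]:", est)
```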