Abstract. This paper describes a novel theoretical characterization of the performance of non-local means (NLM) for noise removal. NLM has proven effective in a variety of empirical studies, but little is understood fundamentally about how it performs relative to classical methods based on wavelets or how its parameters should be chosen. For cartoon images and images which may contain thin features and regular textures, the error decay rates of NLM are derived and compared with those of linear filtering, oracle estimators, Yaroslavsky's filter and wavelet thresholding estimators. The trade-off between global and local search for matching patches is examined, and the bias reduction associated with the local polynomial regression version of NLM is analyzed. The theoretical results are validated via simulations for 2D images corrupted by additive white Gaussian noise.

Key words. Non-local means (NLM), Yaroslavsky's filter, kernel smoothing, patch-based methods, local polynomial regression, oracle bounds, minimax bounds, cartoon model, textures.

1. Introduction. The classical problem of image noise removal has drawn significant attention during the past few decades from the image processing, computational harmonic analysis, nonlinear approximation, and statistics communities. In recent years there has been a resurgence of interest in kernel-based methods, including the ubiquitous non-local means (NLM) algorithm [6], due to their practical efficacy on broad collections of "natural" images.
While there is a wealth of theoretical analysis associated with nonlinear thresholding estimators based on wavelets and related sparse multiscale representations of images [13,14,35,51], or on diffusion models [52,47] and partial differential equations [39,1], performance guarantees for NLM are lacking. This paper aims to provide some results in this direction.

In this paper, we explore the theoretical underpinnings of adaptive kernel-based image estimation and derive bounds on the mean squared error as a function of the number of pixels observed and features of the underlying image. The denoising methods we consider are based on estimating each pixel value with a weighted sum of the surrounding pixels. Depending on how the weights in this average are selected, this corresponds to classical linear filters [38,60]; see [20,43,36] for more insights on a unifying framework for averaging filters.

As none of these methods has been explicitly designed to deal with textured regions, many authors, inspired by work on texture synthesis [16] and inpainting [8], have proposed to introduce patches (small sub-images) to take advantage of natural image redundancy, es-