This paper presents a novel variational approach for imposing statistical constraints on the output of both image generation methods (typically for texture synthesis) and image restoration methods (for instance, denoising and super-resolution). The empirical distributions of linear or non-linear descriptors are constrained to be close to some input distributions by minimizing a Wasserstein loss, i.e. the optimal transport distance between the distributions. We advocate the use of a Wasserstein distance because it is robust when applied to discrete distributions and does not require kernel estimators. We showcase different estimators to tackle various image processing applications. These estimators include linear wavelet-based filtering to account for simple textures, non-linear sparse coding coefficients for more complicated patterns, and the image gradient to restore sharper content. For texture synthesis, the input distributions are the empirical distributions computed from an exemplar image. For image denoising and super-resolution, the estimation process is more difficult; we propose to use parametric models and show results using Generalized Gaussian Distributions.
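
To illustrate why a Wasserstein loss can be evaluated directly on discrete distributions, consider the one-dimensional case with equal sample counts: the optimal transport plan simply pairs the sorted samples, so no kernel density estimate is involved. The Python sketch below is only an illustration under that assumption and is not the implementation used in this paper. The function names `wasserstein2_1d` and `horizontal_gradient` are hypothetical, and the descriptor is a plain finite-difference gradient chosen for brevity.

```python
import numpy as np


def wasserstein2_1d(x, y):
    """Squared 2-Wasserstein distance between two 1-D empirical distributions.

    Assumption of this sketch: both inputs hold the same number of samples,
    in which case the optimal transport plan matches sorted values.
    """
    x = np.sort(np.ravel(np.asarray(x, dtype=float)))
    y = np.sort(np.ravel(np.asarray(y, dtype=float)))
    if x.size != y.size:
        raise ValueError("this sketch assumes equal sample counts")
    return float(np.mean((x - y) ** 2))


def horizontal_gradient(image):
    """Finite-difference horizontal gradient, used here as a simple descriptor."""
    return np.diff(np.asarray(image, dtype=float), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    exemplar = rng.normal(size=(32, 32))   # stand-in for an exemplar image
    candidate = rng.normal(size=(32, 32))  # stand-in for a synthesized image
    loss = wasserstein2_1d(horizontal_gradient(candidate),
                           horizontal_gradient(exemplar))
    print("Wasserstein loss on gradient distributions:", loss)
```

The loss is zero whenever the two sets of descriptor values coincide up to a permutation, which is the sense in which it compares distributions rather than pixel-wise content.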