Abstract. In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the Förstner metric for symmetric positive definite matrices, as well as the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are deployed in an offline-online manner, where a more costly but data-independent offline calculation is followed by fast online evaluations. As a result, these approximations are particularly useful when repeated posterior mean evaluations are required for multiple data sets. We demonstrate our theoretical results with several numerical examples, including high-dimensional X-ray tomography and an inverse heat conduction problem. In both of these examples, the intrinsic low-dimensional structure of the inference problem can be exploited while producing results that are essentially indistinguishable from solutions computed in the full space.

Key words. inverse problems, Bayesian inference, low-rank approximation, covariance approximation, Förstner-Moonen metric, posterior mean approximation, Bayes risk, optimality

1. Introduction. In the Bayesian approach to inverse problems, the parameters of interest are treated as random variables, endowed with a prior probability distribution that encodes information available before any data are observed. Observations are modeled by their joint probability distribution conditioned on the parameters of interest, which defines the likelihood function and incorporates the forward model and a stochastic description of measurement or model errors. The prior and likelihood then combine to yield a probability distribution for the parameters conditioned on the observations, i.e., the posterior distribution. While this formulation is quite general, essential features of inverse problems bring additional structure to the Bayesian update. The prior distribution often encodes some kind of smoothness or correlation among the inversion parameters; observations are typically finite, few in number, and corrupted by noise; and the observations are indirect, related to the inversion parameters by the action of a forward operator that destroys some information. A key consequence of these features is that the data may be informative, relative to the prior, only on a low-dimensional subspace of the parameter space.
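To make this structure concrete, the following is a minimal NumPy/SciPy sketch, for the linear-Gaussian case only, of the kind of low-rank covariance update described in the abstract: it computes the leading generalized eigenpairs of the pencil formed by the Hessian of the negative log-likelihood and the prior precision, and subtracts a rank-r term from the prior covariance. The function name and arguments (G, Gamma_obs, Gamma_pr, r) are illustrative choices, not notation from the paper, and a practical implementation would use matrix-free or randomized eigensolvers rather than dense factorizations. Because the eigendecomposition does not involve the data, it can be performed once offline and reused across data sets.

```python
import numpy as np
from scipy.linalg import eigh

def lowrank_posterior_cov(G, Gamma_obs, Gamma_pr, r):
    """Approximate the Gaussian posterior covariance as a rank-r negative
    update of the prior covariance, using the leading generalized
    eigenpairs of (H, Gamma_pr^{-1}) with H = G^T Gamma_obs^{-1} G.
    Illustrative dense sketch for a linear forward model G; names are
    hypothetical, not the paper's notation."""
    # Hessian of the negative log-likelihood for a linear-Gaussian model.
    H = G.T @ np.linalg.solve(Gamma_obs, G)
    Gamma_pr_inv = np.linalg.inv(Gamma_pr)

    # Generalized eigenproblem H w = d^2 Gamma_pr^{-1} w.
    # scipy.linalg.eigh returns ascending eigenvalues and eigenvectors
    # normalized so that W^T Gamma_pr^{-1} W = I.
    d2, W = eigh(H, Gamma_pr_inv)
    d2, W = d2[::-1], W[:, ::-1]      # reorder to descending eigenvalues

    # Keep the r leading eigendirections: the data-informed subspace.
    d2_r, W_r = d2[:r], W[:, :r]

    # Gamma_pos ~= Gamma_pr - sum_i d_i^2 / (1 + d_i^2) w_i w_i^T,
    # which is exact when r equals the full parameter dimension.
    update = (W_r * (d2_r / (1.0 + d2_r))) @ W_r.T
    return Gamma_pr - update
```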