Kernel analog forecasting (KAF), alternatively known as kernel principal component regression, is a kernel method used for nonparametric statistical forecasting of dynamically generated time series data. This paper synthesizes descriptions of kernel methods and Koopman operator theory in order to provide a single consistent account of KAF. The framework presented here shows that, under measure-preserving and ergodic dynamics, KAF consistently approximates the conditional expectation of observables that are acted upon by the Koopman operator of the dynamical system, conditioned on the observed data at forecast initialization. More precisely, KAF yields optimal predictions, in the sense of minimal root mean square error with respect to the invariant measure, in the asymptotic limit of large data. Moreover, the presented framework facilitates the analysis of generalization error and the quantification of uncertainty. Extensions of KAF to the construction of conditional variance and conditional probability functions are also shown. Various aspects of KAF are illustrated with applications to simple examples, namely a periodic flow on the circle and the chaotic Lorenz 63 system.

In a more general machine learning context, abstract statistical structures are the focus. Common nonparametric machine learning techniques include multilayer perceptrons [12], Bayesian neural networks [13], classification and regression trees (CART), and a variety of kernel methods [14]. Although each of these methods offers distinct advantages for specific problems, kernel methods are particularly well suited to problems for which there is a natural, a priori notion of similarity between data points. Since analog methods rely on quantifying the relevance of any historical analog to present-day conditions, the formal understanding of such methods improves when they are cast within the larger framework of kernel methods.

Kernel methods constitute a class of algorithms that perform classical calculations in a rich functional feature space in order to extract and predict nonlinear patterns. This central idea, commonly referred to as "the kernel trick", was first proposed in 1964 [15], was popularized with the invention of nonlinear support vector machines (SVMs) in 1992 [16], and has since spread to a variety of machine learning applications.
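The characterization of KAF as kernel principal component regression lends itself to a compact illustration. The sketch below is a minimal Python example, assuming a Gaussian kernel k(x, x′) = exp(−‖x − x′‖² / ε), so that similarity between data points is evaluated implicitly in feature space via k(x, x′) = ⟨φ(x), φ(x′)⟩. The names gaussian_gram, kaf_forecast, eps, and n_modes are illustrative choices rather than constructs from the paper, and a practical implementation would also need to tune the kernel bandwidth and guard against vanishing eigenvalues.

```python
import numpy as np

def gaussian_gram(X, Y, eps):
    """Pairwise Gaussian kernel values K[i, j] = exp(-||X[i] - Y[j]||^2 / eps)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / eps)

def kaf_forecast(X_train, y_train, X_test, eps=1.0, n_modes=20):
    """Forecast a scalar observable by kernel principal component regression.

    X_train: (N, d) sampled states along a training trajectory.
    y_train: (N,)   values of the target observable at the desired lead time,
             i.e., samples of the Koopman operator applied to the observable
             along the same trajectory.
    X_test:  (M, d) states at forecast initialization.
    """
    # Kernel PCA step: eigendecomposition of the training Gram matrix.
    K = gaussian_gram(X_train, X_train, eps)
    lam, V = np.linalg.eigh(K)          # eigh returns ascending eigenvalues
    lam = lam[::-1][:n_modes]           # keep the leading n_modes eigenpairs
    V = V[:, ::-1][:, :n_modes]
    # Regression step: project the target observable onto the leading modes.
    coeffs = V.T @ y_train
    # Nystrom out-of-sample extension of the eigenvectors to the test states.
    Phi = gaussian_gram(X_test, X_train, eps) @ V / lam
    return Phi @ coeffs
```

Here y_train holds the target observable sampled one lead time ahead along the training trajectory, so the regression approximates the action of the Koopman operator followed by a conditional expectation, consistent with the account given above; truncation to the leading n_modes eigenpairs plays the role of regularization.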