Interpolation of spatial data is a very general mathematical problem with many applications. In geostatistics, the data are assumed to be a realization of a stochastic process, which leads to an interpolation procedure known as kriging. This method is mathematically equivalent to kernel interpolation, a method used in numerical analysis for the same problem but derived under completely different modelling assumptions. In this article we present the two approaches and discuss their modelling assumptions, their notions of optimality and the different concepts they use to quantify interpolation accuracy. The two approaches are much more closely related than has been appreciated so far; even results on convergence rates of kernel interpolants can be translated to the geostatistical framework. We sketch the different answers the two fields give to the problem of kernel misspecification, present some methods for kernel selection, and discuss the scope of these methods with a data example from the computer experiments literature.
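To make the claimed equivalence concrete, here is a minimal sketch assuming a zero-mean process (simple kriging) on one-dimensional data and an exponential covariance kernel chosen purely for illustration. The function names `exp_kernel`, `simple_kriging` and `kernel_interpolant`, the length scale and the test function are hypothetical and are not taken from the article. The kriging predictor is computed from the kriging weights K⁻¹k(x), the kernel interpolant from the coefficients K⁻¹f; both reduce to k(x)ᵀK⁻¹f. The kriging variance K(x,x) − k(x)ᵀK⁻¹k(x) coincides with the squared power function used in numerical analysis to bound the interpolation error.

```python
import numpy as np


def exp_kernel(x, y, length_scale=0.5):
    # Exponential (Matern-1/2) covariance; an illustrative positive definite choice.
    return np.exp(-np.abs(x[:, None] - y[None, :]) / length_scale)


def simple_kriging(x_obs, z_obs, x_new, cov=exp_kernel):
    # Stochastic view: z_obs is a realization of a zero-mean process with covariance `cov`.
    K = cov(x_obs, x_obs)
    k = cov(x_obs, x_new)                  # column j holds cov(x_obs, x_new[j])
    lam = np.linalg.solve(K, k)            # kriging weights, one column per new point
    pred = lam.T @ z_obs                   # simple kriging predictor
    # Kriging variance K(x, x) - k(x)^T K^{-1} k(x), i.e. the squared power function.
    var = np.diag(cov(x_new, x_new)) - np.sum(k * lam, axis=0)
    return pred, var


def kernel_interpolant(x_obs, f_obs, x_new, kernel=exp_kernel):
    # Deterministic view: s(x) = sum_j c_j K(x, x_j) subject to s(x_i) = f(x_i).
    c = np.linalg.solve(kernel(x_obs, x_obs), f_obs)
    return kernel(x_new, x_obs) @ c


# Hypothetical example data: six observations of a smooth test function.
x_obs = np.linspace(0.0, 1.0, 6)
f_obs = np.sin(2.0 * np.pi * x_obs)
x_new = np.linspace(0.0, 1.0, 50)

pred, var = simple_kriging(x_obs, f_obs, x_new)
interp = kernel_interpolant(x_obs, f_obs, x_new)
print(np.allclose(pred, interp))
```

The two functions differ only in the order in which the linear system is solved; by symmetry of K they return the same values, while the variance term is the extra quantity the stochastic model attaches to each prediction.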