Abstract. Traditional sparse representation algorithms usually operate in a single Euclidean space. This paper leverages a self-explanatory reformulation of sparse representation, i.e., linking the learned dictionary atoms with the original feature spaces explicitly, to extend simultaneous dictionary learning and sparse coding into reproducing kernel Hilbert spaces (RKHS). The resulting single-view self-explanatory sparse representation (SSSR) is applicable to an arbitrary kernel space and has the nice property that the derivatives with respect to parameters of the coding are independent of the chosen kernel. With SSSR, multiple-view self-explanatory sparse representation (MSSR) is proposed to capture and combine various salient regions and structures from different kernel spaces. This is equivalent to learning a nonlinear structured dictionary, whose complexity is reduced by learning a set of smaller dictionary blocks via SSSR. SSSR and MSSR are then incorporated into a spatial pyramid matching framework and developed for image classification. Extensive experimental results on four benchmark datasets, including UIUC-Sports, Scene 15, Caltech-101, and Caltech-256, demonstrate the effectiveness of our proposed algorithm.