The incorporation of matrix relations, which encompass multidimensional similarities between local neighborhoods of data points in the underlying manifold of the data, improves the utilization of kernel-based data analysis methodologies. However, the utilization of multidimensional similarities results in a larger kernel, and hence the computational complexity of the corresponding spectral decomposition increases dramatically. In this paper, we propose an efficient approximation of the spectral decomposition of such a multidimensional-similarity-based kernel. Furthermore, we propose a dictionary construction that approximates the resulting oversized kernel and its associated embedding. The performance of the proposed dictionary construction is demonstrated on an example of a super-kernel that combines the Diffusion Maps methodology with linear-projection operators between tangent spaces of the manifold.
I. INTRODUCTION

Recent methods for the analysis of massive high-dimensional data utilize a manifold structure on which the data points are assumed to lie. This manifold is immersed (or submersed) in an ambient space that is defined by the observable parameters. Kernel methods such as k-PCA and Diffusion Maps (DM) [4] have provided good results in analyzing such massive high-dimensional data. The defined kernel can be thought of as an adjacency matrix of a graph whose vertices are the data points in the dataset. The analysis of the eigenvalues and the corresponding eigenvectors of this matrix reveals many properties of and connections in the graph. These methods are based on the spectral decomposition of a kernel that is designed to incorporate a scalar similarity measure between data points. The resulting embedding of the data points into a Euclidean space preserves the qualities represented by the designed kernel. This approach extends the core of the classical Multi-Dimensional Scaling (MDS) method [6], [9] by considering nonlinear relations instead of only the linear ones in its original Gram matrix.

Recently, DM was extended in several different ways to handle the orientation of local tangent spaces [10]-[13]. The relation between two patches is described by a matrix instead of a scalar value. The resulting kernel captures enriched similarities between local structures in the underlying manifold. These enriched similarities can be used to analyze local areas around data points instead of analyzing their specific locations. For example, this analysis can be beneficial in image processing (analyzing regions instead of individual pixels) and when the data points are perturbed, so that their surrounding area is more important than their specific position. Since the constructions of these similarities are based on local tangent spaces, they provide methods to manipulate tangential vector fields (e.g., perform out-of-sample extensions). These manipulations are beneficial when the analyzed data consists of directional information in addition to positional information on the manifold. For example, the goal in [2] is ...
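To make the constructions discussed above concrete, the following is a minimal sketch of a scalar-kernel Diffusion-Maps-style embedding and of a block ("super") kernel assembled from linear projections between local tangent spaces. It assumes a Gaussian affinity and local-PCA tangent bases; the function names and parameters (diffusion_maps_embedding, block_super_kernel, eps, n_neighbors) are illustrative and do not reproduce the exact construction proposed in this paper or in [10]-[13].

```python
import numpy as np

def diffusion_maps_embedding(X, eps, t=1, k=2):
    """Scalar-kernel Diffusion Maps embedding with a Gaussian affinity (illustrative sketch)."""
    # Pairwise squared Euclidean distances and Gaussian affinity kernel.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / eps)
    # Row-normalize into a Markov (diffusion) matrix.
    P = W / W.sum(axis=1, keepdims=True)
    # Spectral decomposition; P is similar to a symmetric matrix, so its eigenvalues are real.
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Embed each point by the leading non-trivial eigenvectors,
    # scaled by the t-th power of the corresponding eigenvalues.
    return (vals[1:k + 1] ** t) * vecs[:, 1:k + 1]

def block_super_kernel(X, eps, d=2, n_neighbors=10):
    """Block kernel whose (i, j) block couples the scalar affinity
    with a linear projection between local tangent spaces (illustrative sketch)."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / eps)
    # Local PCA around each point approximates its tangent space.
    bases = []
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]
        _, _, Vt = np.linalg.svd(X[nbrs] - X[i], full_matrices=False)
        bases.append(Vt[:d].T)                 # ambient_dim x d basis U_i
    # Assemble the (n*d) x (n*d) super-kernel block by block.
    K = np.zeros((n * d, n * d))
    for i in range(n):
        for j in range(n):
            O_ij = bases[i].T @ bases[j]       # d x d projection U_i^T U_j
            K[i * d:(i + 1) * d, j * d:(j + 1) * d] = W[i, j] * O_ij
    return K
```

Even in this simplified form, the super-kernel has n*d rows and columns rather than n, which illustrates the size blow-up that motivates the dictionary-based approximation of its spectral decomposition proposed in this paper.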