In this paper, we propose a kernel principal component analysis model for multivariate time series forecasting, where the training and prediction schemes are derived from the multi-view formulation of Restricted Kernel Machines. The training problem is simply an eigenvalue decomposition of the sum of two kernel matrices corresponding to the views of the input and output data. When a linear kernel is used for the output view, the forecasting equation is shown to take the form of kernel ridge regression. When that kernel is non-linear, a pre-image problem has to be solved to forecast a point in the input space. We evaluate the model on several standard time series datasets, perform ablation studies, benchmark it against closely related models, and discuss the results.
Introduction

Kernel methods have seen great success in many applications with very high-dimensional data but few samples, and are therefore among the most popular non-parametric models. In critical machine learning applications, kernel methods are preferred due to their strong theoretical foundation in learning theory [1,2,3,4]. Kernel methods map the data into a high-dimensional (possibly infinite-dimensional) feature space by using the kernel trick. This trick allows for natural, non-linear extensions of traditional linear methods in terms of a dual representation using a suitable kernel function, and has led to numerous popular methods such as kernel principal component analysis [5], kernel Fisher discriminant analysis [6] and the least-squares support vector machine [1].

However, when it comes to learning large-scale problems, kernel methods fall behind deep learning techniques due to their time and memory complexity. This also holds in the time series analysis and forecasting domain, which has recently been dominated by specialized deep neural network models [7,8,9,10].

Attempts have been made to combine kernel and deep learning methods [11,12], especially for specific cases such as deep Gaussian processes [13] and multi-layer support vector machines [14]. Recently, a new unifying framework, named Restricted Kernel Machines (RKM), was proposed [15] that attempts to bridge kernel methods with deep learning. The Lagrangian function of the Least-Squares Support Vector Machine (LS-SVM) is similar to the energy function of Restricted Boltzmann Machines (RBMs), thereby drawing a link between kernel methods and RBMs; hence the name Restricted Kernel Machines.
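The kernel trick mentioned above can be illustrated with kernel PCA [5]: the non-linear principal components are obtained from an eigendecomposition of the centered Gram matrix, without ever forming the feature map explicitly. A minimal sketch in NumPy (the RBF bandwidth, data, and number of components are illustrative assumptions, not values from this paper):

```python
import numpy as np

def kpca(X, n_components=2, sigma=1.0):
    """Kernel PCA with a Gaussian RBF kernel (standard formulation)."""
    n = X.shape[0]
    # Gram matrix K_ij = k(x_i, x_j) via the kernel trick.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Center the kernel matrix in feature space.
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    eigvals, eigvecs = np.linalg.eigh(Kc)
    # eigh returns ascending eigenvalues; take the leading components.
    idx = np.argsort(eigvals)[::-1][:n_components]
    # Normalize so the feature-space eigenvectors have unit norm.
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return Kc @ alphas  # projections of the training points

Z = kpca(np.random.default_rng(1).standard_normal((30, 5)))
print(Z.shape)  # (30, 2)
```

The dual representation means only the n-by-n Gram matrix is needed, which is exactly the source of the scalability limitation discussed next.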
Contribution: In this work, we propose a novel kernel autoregressive time series forecasting model based on the RKM framework, where the training problem is the eigendecomposition of the sum of two kernel matrices. Additionally, we use the same objective function to derive a novel prediction scheme that recursively forecasts several steps ahead into the future.
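The two ingredients of the contribution can be sketched as follows. This is a hypothetical simplification, not the paper's exact derivation: the lag length, number of components, kernel bandwidth, and the kernel-ridge-style combination of training outputs (valid when the output kernel is linear, per the abstract) are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gram matrix of the Gaussian RBF kernel between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy multivariate series: lagged windows as inputs, the next step as output.
rng = np.random.default_rng(0)
series = rng.standard_normal((60, 3))
lag = 4
X = np.stack([series[i:i + lag].ravel() for i in range(len(series) - lag)])
Y = series[lag:]

# "Training": eigendecomposition of the sum of the two view kernels.
Kx = rbf_kernel(X, X)   # input view
Ky = Y @ Y.T            # linear kernel on the output view
_, H = np.linalg.eigh(Kx + Ky)
H = H[:, -10:]          # keep the 10 leading components (eigh sorts ascending)

def predict_one(window):
    # With a linear output kernel, the forecast is a kernel-ridge-like
    # combination of training outputs (assumed sketch, up to scaling).
    k = rbf_kernel(X, window.ravel()[None, :])
    return (Y.T @ (H @ (H.T @ k))).ravel()

# Recursive multi-step forecasting: feed each forecast back into the window.
window = series[-lag:].copy()
forecasts = []
for _ in range(5):
    y_hat = predict_one(window)
    forecasts.append(y_hat)
    window = np.vstack([window[1:], y_hat])  # slide the lag window forward
forecasts = np.asarray(forecasts)
print(forecasts.shape)  # (5, 3)
```

Note that the recursion reuses the one-step predictor unchanged at every horizon, so forecast errors compound with the horizon, which motivates the ablation studies mentioned in the abstract.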