We study the sample complexity of random Fourier features for online kernel learning, that is, the number of random Fourier features required to achieve good generalization performance. We show that when the loss function is strongly convex and smooth, online kernel learning with random Fourier features can achieve an O(log T/T) bound on the excess risk with only O(1/λ²) random Fourier features, where T is the number of training examples and λ is the modulus of strong convexity. This is a significant improvement over the existing result for batch kernel learning, which requires O(T) random Fourier features to achieve a generalization bound of O(1/√T). Our empirical study verifies that online kernel learning with a limited number of random Fourier features can achieve generalization performance similar to that of online learning using the full kernel matrix. We also present an enhanced online learning algorithm with random Fourier features that improves classification performance via multiple passes over the training examples and partial averaging.
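As background, the following is a minimal sketch of the random Fourier feature map of Rahimi and Recht for the RBF kernel, which underlies the approach summarized above; it is not the paper's online algorithm, and the parameter names `D` (number of features) and `gamma` (kernel bandwidth) are illustrative choices:

```python
import numpy as np

def rff_features(X, D, gamma, rng):
    """Map rows of X to D random Fourier features approximating the
    RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    The kernel's spectral density gives omega ~ N(0, 2*gamma*I),
    with random phases b ~ Uniform[0, 2*pi]."""
    d = X.shape[1]
    omega = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

# Compare the feature-space inner products with the exact kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, D=2000, gamma=0.5, rng=np.random.default_rng(1))
K_approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists)
print(np.max(np.abs(K_approx - K_exact)))  # approximation error shrinks as D grows
```

With such a feature map, the kernel learner reduces to a linear model in the `D`-dimensional feature space, which is what makes the sample-complexity question (how large `D` must be) central.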
ACM Reference Format: Ming Lin, Shifeng Weng, and Changshui Zhang. 2014. On the sample complexity of random Fourier features for online learning: How many random Fourier features do we need? ACM Trans.