Learning large-scale data sets with high dimensionality is a main concern in research areas including machine learning, visual recognition, information retrieval, to name a few. In many practical uses such as images, video, audio, and text processing, we have to face with high-dimension and large-sample data problems. The trace-ratio problem is a key problem for feature extraction and dimensionality reduction to circumvent the high dimensional space. However, it has been long believed that this problem has no closed-form solution, and one has to solve it by using some inner-outer iterative algorithms that are very time consuming. Therefore, efficient algorithms for high-dimension and large-sample trace-ratio problems are still lacking, especially for dense data problems. In this work, we present a closed-form solution for the trace-ratio problem, and propose two algorithms to solve it. Based on the formula and the randomized singular value decomposition, we first propose a randomized algorithm for solving high-dimension and large-sample dense traceratio problems. For high-dimension and large-sample sparse trace-ratio problems, we then propose an algorithm based on the closed-form solution and solving some consistent under-determined linear systems. Theoretical results are established to show the rationality and efficiency of the proposed methods. Numerical experiments are performed on some real-world data sets, which illustrate the superiority of the proposed algorithms over many state-of-the-art algorithms for high-dimension and large-sample dimensionality reduction problems.
KeywordsDimensionality reduction • Trace-ratio problem • High-dimension and large-sample data • Large-scale discriminant analysis • Randomized singular value decomposition (RSVD) • Inner-outer iterative algorithm Editors: Tim Verdonck, Bart Baesens, María Óskarsdóttir and Seppe vanden Broucke.