Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/468

Single-Pass PCA of Large High-Dimensional Data

Abstract: Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or the data generated in a streaming fashion. Exper…
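To make the one-pass idea concrete, here is a minimal numpy sketch of a generic two-sided sketching scheme in the spirit of Halko et al. (2011) and Tropp et al. (2017). It illustrates the single-pass principle only; the function name, the block-streaming interface, and the oversampling defaults are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def single_pass_pca(row_blocks, m, n, k, p=10, seed=0):
    """Approximate rank-k SVD of an m-by-n matrix A from ONE pass over its rows.

    row_blocks yields (offset, block) pairs with block.shape == (b, n),
    e.g. read sequentially from disk or received as a stream.
    """
    rng = np.random.default_rng(seed)
    l = k + p                              # range-sketch size (oversampled)
    s = 2 * l + 1                          # slightly larger co-range sketch
    Omega = rng.standard_normal((n, l))    # right test matrix
    Psi = rng.standard_normal((s, m))      # left test matrix
    Y = np.zeros((m, l))                   # will accumulate A @ Omega
    W = np.zeros((s, n))                   # will accumulate Psi @ A
    for off, block in row_blocks:          # the single pass over A
        b = block.shape[0]
        Y[off:off + b] = block @ Omega
        W += Psi[:, off:off + b] @ block
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(A)
    # Recover X with A ~= Q @ X from the co-range sketch: (Psi @ Q) X = W.
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
    Uh, S, Vt = np.linalg.svd(X, full_matrices=False)
    return (Q @ Uh)[:, :k], S[:k], Vt[:k]  # approximate top-k singular triplets
```

For PCA proper the columns should be mean-centered first; the column means can be accumulated during the same pass and folded into the sketches afterwards, so the one-pass property is preserved.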

Cited by 30 publications (31 citation statements, published 2018–2024). References 16 publications (52 reference statements).

Citation statements, ordered by relevance:
“…The randomized subspace iteration algorithm, which is a hybrid of Krylov and randomized methodologies, was developed based on randomized SVD [133,134]. In pass-efficient or one-pass randomized SVD, some tricks to reduce the number of passes have been considered [135,136]. TeraPCA, a software tool for population genetics studies, uses the Mailman algorithm to accelerate the expectation-maximization algorithms for PCA [137,138].…”
Section: Future Perspective
confidence: 99%
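The randomized subspace iteration mentioned in this statement has a compact standard form (per Halko et al.); the numpy sketch below shows the usual q-iteration template for a matrix held as a dense array. It is illustrative, not the cited papers' verbatim pseudocode.

```python
import numpy as np

def randomized_subspace_iteration(A, k, p=10, q=2, seed=0):
    """Rank-k truncated SVD via randomized subspace (power) iteration.

    Each of the q iterations costs two extra passes over A but sharpens
    the captured subspace when the singular values decay slowly.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)   # re-orthonormalize every half-step
        Q, _ = np.linalg.qr(A @ Q)     # to keep the basis well conditioned
    B = Q.T @ A                        # small (k+p)-by-n projected matrix
    Uh, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], S[:k], Vt[:k]
```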
“…A simple adjustment to the extended 1-view method makes it considerably more accurate, and it can outperform the previously mentioned 1-view method when the goal is to minimize the memory required by the randomized matrix sketches. Furthermore, building on the 1-view and power-iteration methods proposed in [58,59], we proposed a highly pass-efficient randomized block Krylov algorithm suited to approximating large matrices stored out-of-core in either row-major or column-major format.…”
Section: Discussion
confidence: 99%
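For context on the block Krylov approach this statement refers to, the generic randomized block Krylov template (in the style of Musco and Musco, 2015) is sketched below in numpy; the citing paper's pass-efficient out-of-core variant reorganizes these passes, which this sketch does not attempt to reproduce.

```python
import numpy as np

def randomized_block_krylov(A, k, q=3, seed=0):
    """Rank-k SVD from the Krylov space [G, (A A^T)G, ..., (A A^T)^q G],
    where G = A @ Omega for a Gaussian test matrix Omega."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V = A @ rng.standard_normal((n, k))
    blocks = [V]
    for _ in range(q):
        V = A @ (A.T @ V)       # next Krylov block: two passes over A
        blocks.append(V)        # (per-block QR would add numerical safety)
    Q, _ = np.linalg.qr(np.hstack(blocks))   # basis of size m-by-(q+1)k
    B = Q.T @ A
    Uh, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], S[:k], Vt[:k]
```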
“…The above 1-view procedure is advantageous for large matrices that are stored out-of-core in row-major or column-major format. Of course, when the matrix is stored in column-major format and accessed column-wise, the procedure is applied to the matrix's transpose [58]. However, this approach applies neither to general streaming matrices nor to the Jacobian matrices motivating this study.…”
Section: Looping Over All the Rows In
confidence: 99%
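The transpose trick described in this statement amounts to a thin wrapper around any row-streaming sketch. Assuming the hypothetical single_pass_pca from the earlier sketch, a column-major matrix could be handled as follows:

```python
# Hypothetical wrapper: stream the column blocks of A as row blocks of A.T,
# run the one-pass sketch on A.T, then swap the factors back.
def single_pass_pca_colmajor(col_blocks, m, n, k, **kw):
    blocks_T = ((off, blk.T) for off, blk in col_blocks)   # blk: (m, b) columns
    Up, S, Vpt = single_pass_pca(blocks_T, n, m, k, **kw)  # SVD of A.T (n-by-m)
    return Vpt.T, S, Up.T    # A = U S V^T with U = Vp and V^T = Up^T
```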
“…It is even possible to find a matrix approximation with a single pass over the data [14]. This enables the efficient computation of a low-rank approximation of dense matrices that cannot be stored completely in fast memory [30].…”
confidence: 99%
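As a concrete illustration of the out-of-core case in [30], a dense matrix on disk can be fed to the hypothetical single_pass_pca sketch above through numpy.memmap, so each entry is read exactly once; the file name and sizes here are placeholders.

```python
import numpy as np

m, n, k = 100_000, 2_000, 20
# Placeholder file of float64 values stored in row-major order.
A = np.memmap("big_matrix.f64", dtype=np.float64, mode="r", shape=(m, n))
blocks = ((i, np.asarray(A[i:i + 4096])) for i in range(0, m, 4096))
U, S, Vt = single_pass_pca(blocks, m, n, k)   # one sequential read of the file
```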