2008
DOI: 10.1007/978-3-540-92182-0_38
Deterministic Sparse Column Based Matrix Reconstruction via Greedy Approximation of SVD

Abstract: Given a matrix A ∈ R^{m×n} of rank r, and an integer k < r, the top k singular vectors provide the best rank-k approximation to A. When the columns of A have specific meaning, it is desirable to find (provably) "good" approximations to A_k which use only a small number of columns in A. Proposed solutions to this problem have thus far focused on randomized algorithms. Our main result is a simple greedy deterministic algorithm with guarantees on the performance and the number of columns chosen. Specifica…

Cited by 6 publications (16 citation statements). References 25 publications.
“…For this and other reasons, a common task in genetics and other areas of data analysis is the following: given an input data matrix A and a parameter k, find the best subset of exactly k actual DNA SNPs or actual genes, i.e., actual columns or rows from A, to use to cluster individuals, reconstruct biochemical pathways, reconstruct signal, perform classification or inference, etc. Unfortunately, common formalizations of this algorithmic problem, including looking for the k actual columns that capture the largest amount of information or variance in the data or that are maximally uncorrelated, lead to intractable optimization problems [22,23]. For example, consider the so-called Column Subset Selection Problem [24]: given as input an arbitrary m × n matrix A and a rank parameter k, choose the set of exactly k columns of A s.t.…”
Section: Motivating Scientific Applications
confidence: 99%
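The Column Subset Selection objective sketched above — choose exactly k actual columns C of A so that projecting A onto their span leaves a small residual — can be illustrated with a small greedy sketch. The function name `greedy_css` and its pick-the-largest-residual-norm selection rule are illustrative assumptions here, not the exact algorithm of the paper under discussion:

```python
import numpy as np

def greedy_css(A, k):
    """Greedily pick k column indices of A: at each step take the column
    whose residual (after projecting out the already-chosen columns) has
    the largest norm. Illustrative sketch only; the paper's deterministic
    algorithm and its selection criterion may differ."""
    chosen = []
    R = A.astype(float).copy()  # residual of A against the chosen columns
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        if chosen:
            norms[np.array(chosen)] = -1.0  # never re-pick a chosen column
        j = int(np.argmax(norms))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(q, q @ R)          # project residual off column j
    return chosen

# small demo: pick 3 of 6 columns and measure the projection residual
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
cols = greedy_css(A, 3)
C = A[:, cols]
err = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A)
```

The residual `err` plays the role of the CSSP objective ‖A − CC⁺A‖_F; the intractability remark above concerns finding the columns that minimize it exactly.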
“…That being said, running the risk of such a failure might be acceptable if one can efficiently couple to a diagnostic to check for such a failure, and if one can then correct for it by choosing more samples if necessary. The best numerical implementations of randomized matrix algorithms for low-rank matrix approximation do just this, and the strongest results in terms of minimizing p take advantage of Condition (22) in a somewhat different way than was originally used in the analysis of the CSSP [14]. For example, rather than choosing O(k log k) dimensions and then filtering them through exactly k dimensions, as the relative-error random sampling and relative-error random projection algorithms do, one can choose some number ℓ of dimensions and project onto a k′-dimensional subspace, where k < k′ ≤ ℓ, while exploiting Condition (22) to bound the error, as appropriate for the computational environment at hand [14].…”
Section: An Improved Random Projection Algorithm
confidence: 99%
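The choose-ℓ-dimensions-then-project-to-k′ strategy described in that passage can be sketched generically with a random-projection low-rank approximation. This is a minimal sketch of the general technique (sample ℓ = k + oversample directions, orthonormalize the sampled range, truncate to the k leading directions of that subspace), under the assumption k′ = k; the function and parameter names are illustrative, not the cited implementation:

```python
import numpy as np

def randomized_low_rank(A, k, oversample=5, seed=None):
    """Rank-k approximation via random projection: sample ell = k + oversample
    random directions, form an orthonormal basis for the sampled range, then
    keep the k leading directions of that subspace. Generic sketch; the
    oversampling amount and truncation rule are assumptions."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    ell = k + oversample
    Omega = rng.standard_normal((n, ell))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis, m x ell
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ U[:, :k]) * s[:k] @ Vt[:k]  # rank-k approximation of A

# sanity check: a matrix of exact rank 3 is recovered almost exactly
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))
Ak = randomized_low_rank(A, 3, seed=1)
rel_err = np.linalg.norm(A - Ak) / np.linalg.norm(A)
```

Oversampling (ℓ > k) is exactly the "choosing more samples" safeguard mentioned above: the extra directions make it very unlikely that the random projection misses part of the dominant subspace.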
“…In this implementation, the MATLAB qr function is first used to calculate the QR decomposition with column pivoting, and then the columns are swapped using the criterion specified by Gu and Eisenstat [31]. ApproxSVD is the sparse approximation of the Singular Value Decomposition (SVD) [9,10]. The algorithm was implemented in MATLAB.…”
Section: Evaluation Of Centralized Greedy CSS
confidence: 99%
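The column-pivoted QR step that this implementation starts from can be sketched in NumPy. This is a teaching sketch of plain Businger–Golub pivoting (at each step, move the remaining column with the largest residual norm into the pivot position), analogous to what MATLAB's qr with pivoting computes; the Gu–Eisenstat strong RRQR swap criterion applied afterwards in the cited implementation is more sophisticated and is not reproduced here:

```python
import numpy as np

def pivoted_qr(A):
    """QR with column pivoting (Businger-Golub rule) via Gram-Schmidt.
    Returns Q, R, piv with Q @ R == A[:, piv]. Sketch for illustration;
    not the cited MATLAB/RRQR implementation."""
    work = A.astype(float).copy()
    m, n = work.shape
    r = min(m, n)
    Q = np.zeros((m, r))
    R = np.zeros((r, n))
    piv = np.arange(n)
    for j in range(r):
        # pivot: bring the remaining column with largest residual norm to j
        p = j + int(np.argmax(np.linalg.norm(work[:, j:], axis=0)))
        work[:, [j, p]] = work[:, [p, j]]
        R[:, [j, p]] = R[:, [p, j]]
        piv[[j, p]] = piv[[p, j]]
        rjj = np.linalg.norm(work[:, j])
        if rjj == 0.0:                      # remaining columns are all zero
            break
        R[j, j] = rjj
        Q[:, j] = work[:, j] / rjj
        coeffs = Q[:, j] @ work[:, j + 1:]  # orthogonalize remaining columns
        R[j, j + 1:] = coeffs
        work[:, j + 1:] -= np.outer(Q[:, j], coeffs)
    return Q, R, piv

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))
Q, R, piv = pivoted_qr(A)
```

The pivot order `piv` is what makes this decomposition useful for column selection: the leading k pivots identify columns that greedily capture the most residual norm, and |R[j, j]| is non-increasing in j.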