PCA is one of the most widely used dimension reduction techniques. A related, easier problem is "subspace learning" or "subspace estimation". Given relatively clean data, both are easily solved via the singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning or robust PCA (RPCA). For long data sequences, if one tries to use a single lower-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of robust subspace learning and tracking. In particular, solutions for three problems are discussed in detail: RPCA via sparse + low-rank matrix decomposition (S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data vector indices and hence are well modeled as sparse corruptions.
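The clean-data case mentioned above, where both PCA and subspace estimation reduce to an SVD, can be sketched as follows. This is a minimal illustration under our own assumptions; the function name and parameters are ours, not from any cited implementation.

```python
import numpy as np

def estimate_subspace(Y, r):
    """Return an n x r orthonormal basis for the best rank-r subspace of Y (n x d)."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r]  # top-r left singular vectors span the PCA subspace

# toy example: clean data lying exactly in an r-dimensional subspace
rng = np.random.default_rng(0)
n, r, d = 50, 3, 200
P = np.linalg.qr(rng.standard_normal((n, r)))[0]   # true basis (orthonormal)
Y = P @ rng.standard_normal((r, d))                # clean low-rank data matrix
P_hat = estimate_subspace(Y, r)

# subspace recovery error: ||P - P_hat P_hat' P|| is ~0 when data are clean
err = np.linalg.norm(P - P_hat @ (P_hat.T @ P))
```

With outliers added to Y, this plain SVD estimate degrades, which is what motivates the robust formulations discussed in the rest of the article.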
In this work, we study the robust subspace tracking (RST) problem and obtain one of the first two provable guarantees for it. The goal of RST is to track sequentially arriving data vectors that lie in a slowly changing low-dimensional subspace, while being robust to corruption by additive sparse outliers. It can also be interpreted as a dynamic (time-varying) extension of robust PCA (RPCA), with the minor difference that RST also requires a short tracking delay. We develop a recursive projected compressive sensing algorithm that we call Nearly Optimal RST via ReProCS (ReProCS-NORST) because its tracking delay is nearly optimal. We prove that NORST solves both the RST and the dynamic RPCA problems under weakened standard RPCA assumptions, two simple extra assumptions (slow subspace change and most outlier magnitudes lower bounded), and a few minor assumptions. Our guarantee shows that NORST enjoys a near-optimal tracking delay of O(r log n log(1/ε)). Its required delay between subspace change times is the same, and its memory complexity is n times this value. Thus both of these are also nearly optimal. Here n is the ambient space dimension, r is the subspaces' dimension, and ε is the tracking accuracy. NORST also has the best outlier tolerance compared with all previous RPCA or RST methods, both theoretically and empirically (including for real videos), without requiring any model on how the outlier support is generated. This is possible because of the extra assumptions it uses.
* A shorter version of this manuscript [1] will be presented at ICML, 2018. Another small part, Corollary 5.18, will appear in [2]. arXiv:1712.06061v4 [cs.IT] 6 Jul 2018
Definition 1.1. An n × r basis matrix P is µ-incoherent if max_{i=1,2,...,n} ‖P^(i)‖₂² ≤ µr/n, where P^(i) denotes the i-th row of P. Here µ is called the coherence parameter. It quantifies the non-denseness of P. A simple way to ensure that X is not low-rank is by imposing upper bounds on max-outlier-frac-row and max-outlier-frac-col [8,9].
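Definition 1.1 is directly computable: the coherence parameter µ of a basis matrix is the smallest value for which the stated row-norm bound holds. The following sketch (function name ours) computes it and contrasts a dense random basis with a maximally coherent, sparse one.

```python
import numpy as np

def coherence(P):
    """Smallest mu with max_i ||P^(i)||_2^2 <= mu * r / n  (cf. Definition 1.1)."""
    n, r = P.shape
    row_norms_sq = np.sum(P**2, axis=1)    # squared l2 norm of each row P^(i)
    return row_norms_sq.max() * n / r

rng = np.random.default_rng(1)
n, r = 100, 5

# a random orthonormal basis is incoherent (small mu) with high probability
P = np.linalg.qr(rng.standard_normal((n, r)))[0]
mu_dense = coherence(P)

# a basis aligned with coordinate axes is maximally coherent: mu = n / r
E = np.zeros((n, r))
E[:r, :] = np.eye(r)
mu_sparse = coherence(E)
```

Note that µ ≥ 1 always, since the squared row norms of an n × r orthonormal basis sum to r, so the maximum is at least r/n.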
One way to ensure identifiability of the changing subspaces is to assume that they are piecewise constant, i.e., that the subspace equals span(P_j) for all t in [t_j, t_{j+1}), and to lower bound the delay between change times, t_{j+1} − t_j. Let t_0 = 1 and t_{J+1} = d. With this model, r_L = rJ in general (except if subspace directions are repeated). The union of the column spans of all the P_j's is equal to the span of the left singular vectors of L. Thus, assuming that the P_j's are µ-incoherent implies incoherence of the left singular vectors of L. We also assume that the subspace coefficients a_t are mutually independent over time, have identical and diagonal covariance matrices denoted by Λ, and are element-wise bounded. Element-wise boundedness of the a_t's, along with the statistical assumptions, is similar to incoherence of the right singular vectors of L (right incoherence); see Remark 3.5. Because tracking requires an online algorithm that processes data vectors one at a time or in mini-batches, we need these statistical assumptions on the a_t's. For the same reason, we also need to re-define max-outlier-frac-row as the maximum fraction of nonzeros in any row of any α-consecutive-column sub-matrix of X. ...
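The piecewise-constant subspace model above can be made concrete with a toy data generator. All names, sizes, and the choice of a scaled uniform distribution for the bounded coefficients a_t are illustrative assumptions of ours, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 40, 2
t_change = [0, 100, 200]   # subspace change times t_j (0-indexed here; t_0 = 1 in the text)
d = 300                    # total number of data vectors, so t_{J+1} = d

# one mu-incoherent-with-high-probability basis P_j per segment
bases = [np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in t_change]
Lam = np.diag([4.0, 1.0])  # identical diagonal covariance Lambda of the a_t's

L = np.zeros((n, d))
for t in range(d):
    j = sum(t >= tc for tc in t_change) - 1              # index of the active segment
    # element-wise bounded a_t: uniform on [-sqrt(3*lam), sqrt(3*lam)] has variance lam
    a_t = np.sqrt(3 * np.diag(Lam)) * rng.uniform(-1, 1, size=r)
    L[:, t] = bases[j] @ a_t

# rank(L) is r_L = r * J = 2 * 3 = 6, not r, matching the remark in the text
```

This also makes the r_L = rJ observation tangible: each segment contributes r new directions unless directions repeat across segments.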
Dynamic robust PCA refers to the dynamic (time-varying) extension of robust PCA (RPCA). It assumes that the true (uncorrupted) data lies in a low-dimensional subspace that can change with time, albeit slowly. The goal is to track this changing subspace over time in the presence of sparse outliers. We develop and study a novel algorithm that we call simple-ReProCS, based on the recently introduced Recursive Projected Compressive Sensing (ReProCS) framework. Our work provides the first guarantee for dynamic RPCA that holds under weakened versions of standard RPCA assumptions, slow subspace change, and a lower bound assumption on most outlier magnitudes. Our result is significant because (i) it removes the strong assumptions needed by the two previous complete guarantees for ReProCS-based algorithms; (ii) it shows that it is possible to achieve significantly improved outlier tolerance, compared with all existing RPCA or dynamic RPCA solutions, by exploiting the above two simple extra assumptions; and (iii) it proves that simple-ReProCS is online (after initialization), fast, and has near-optimal memory complexity.
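The core ReProCS idea, projecting each data vector orthogonal to the current subspace estimate and then solving a sparse recovery problem, can be sketched as follows. This is a simplified illustration: it assumes the subspace estimate P_hat is exact and replaces the l1-minimization / compressive-sensing step of the actual algorithm with simple thresholding followed by least squares. All names and the threshold omega are ours.

```python
import numpy as np

def reprocs_step(y, P_hat, omega):
    """Given y = l + x with l in range(P_hat) and x sparse, recover x and l."""
    n = y.shape[0]
    Psi = np.eye(n) - P_hat @ P_hat.T        # projector orthogonal to the subspace
    b = Psi @ y                              # = Psi @ x, since Psi @ l = 0 here
    T = np.where(np.abs(b) > omega)[0]       # estimated outlier support (thresholding)
    x_hat = np.zeros(n)
    if T.size:
        # least squares restricted to the support: solve Psi[:, T] z ~= b
        x_hat[T] = np.linalg.lstsq(Psi[:, T], b, rcond=None)[0]
    l_hat = y - x_hat                        # estimate of the true data vector
    return x_hat, l_hat

# toy check: subspace = first 3 coordinates, large-magnitude outliers elsewhere
n, r = 20, 3
P_hat = np.eye(n)[:, :r]
l = P_hat @ np.array([1.0, -2.0, 0.5])
x = np.zeros(n)
x[10], x[15] = 10.0, -8.0
x_hat, l_hat = reprocs_step(l + x, P_hat, omega=1.0)
```

The lower bound on most outlier magnitudes mentioned in the abstract is exactly what makes a support-detection step like the thresholding above reliable.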
We study the "Low Rank Phase Retrieval (LRPR)" problem, defined as follows: recover an n × q matrix X* of rank r from a different and independent set of m phaseless (magnitude-only) linear projections of each of its columns. To be precise, we need to recover X* from y_k := |A_k′ x*_k|, k = 1, 2, ..., q, when the measurement matrices A_k are mutually independent. Here y_k is a length-m vector and ′ denotes transpose. The question is: when can we solve LRPR with m ≪ n? Our work introduces the first provably correct solution, Alternating Minimization for Low-Rank Phase Retrieval (AltMinLowRaP). We demonstrate its advantage over existing work via extensive simulation, and some partly real data, experiments. Our guarantee for AltMinLowRaP shows that it can solve LRPR to ε-accuracy if mq ≥ C nr⁴ log(1/ε), the matrices A_k contain i.i.d. standard Gaussian entries, the condition number of X* is bounded by a numerical constant, and its right singular vectors satisfy the incoherence (denseness) assumption from the matrix completion literature. Its time complexity is only C mqnr log²(1/ε). In the regime of small r, our sample complexity is much better than what standard PR methods need, and it is only about r³ times worse than its order-optimal value of (n + q)r. Moreover, if we replace m by its lower bound for each approach, the same can be said for the time complexity comparison with standard PR. We also briefly study the dynamic extension of LRPR. The LRPR problem occurs in phaseless dynamic imaging, e.g., Fourier ptychographic imaging of live biological specimens, where acquiring measurements is expensive. We should point out that LRPR is a very different problem than its A_k = A version, or its A_k = A and with-phase (linear) version, both of which have been extensively studied in the literature. If we assume κ is a constant then, up to constant factors, (4) also implies (3).
Thus, up to constant factors, requiring right incoherence is the same as requiring that the maximum energy ‖x*_k‖₂² of any signal x*_k be within a constant factor of the average energy.
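The LRPR measurement model and the right-incoherence proxy discussed above can be illustrated directly. The sketch below (sizes and names are ours) forms the phaseless measurements y_k = |A_k′ x*_k| for each column of a rank-r matrix and computes the max-to-average column-energy ratio.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, r, m = 30, 10, 2, 80

# a rank-r matrix X* with columns x*_k
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
B = rng.standard_normal((r, q))
X = U @ B

# independent Gaussian measurement matrices A_k; y_k = |A_k' x*_k| (length m)
Y = np.empty((m, q))
for k in range(q):
    A_k = rng.standard_normal((n, m))
    Y[:, k] = np.abs(A_k.T @ X[:, k])    # magnitude-only (phaseless) projections

# right-incoherence proxy: max column energy vs. average column energy
energies = (X**2).sum(axis=0)
ratio = energies.max() / energies.mean()  # O(1) for an incoherent X*
```

Note that only the m × q magnitude matrix Y and the A_k's are available to the recovery algorithm; the phases (signs) of the projections are lost.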
Principal Components Analysis (PCA) is one of the most widely used dimension reduction techniques. Robust PCA (RPCA) refers to the problem of PCA when the data may be corrupted by outliers. Recent work by Candès, Wright, Li, and Ma defined RPCA as the problem of decomposing a given data matrix into the sum of a low-rank matrix (true data) and a sparse matrix (outliers). The column space of the low-rank matrix then gives the PCA solution. This simple definition has led to a large amount of interesting new work on provably correct, fast, and practical solutions to RPCA. More recently, the dynamic (time-varying) version of the RPCA problem has been studied and a series of provably correct, fast, and memory-efficient tracking solutions have been proposed. Dynamic RPCA (or robust subspace tracking) is the problem of tracking data lying in a (slowly) changing subspace, while being robust to sparse outliers. This article provides an exhaustive review of the last decade of literature on RPCA and its dynamic counterpart (robust subspace tracking), along with describing their theoretical guarantees, discussing the pros and cons of various approaches, and providing empirical comparisons of performance and speed. A brief overview of the (low-rank) matrix completion literature is also provided (the focus is on works not discussed in other recent reviews). This refers to the problem of completing a low-rank matrix when only a subset of its entries are observed. It can be interpreted as a simpler special case of RPCA in which the indices of the outlier-corrupted entries are known.
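The S+LR decomposition that defines RPCA above can be made concrete with a small synthetic example. The sketch below (all sizes and names ours) builds an observed matrix M = L + S and computes the per-row and per-column outlier fractions that identifiability arguments for this decomposition typically bound.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, r = 50, 60, 3

# low-rank "true data" matrix L and sparse "outlier" matrix S
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
S = np.zeros((n, d))
mask = rng.random((n, d)) < 0.02            # ~2% of entries are outliers
S[mask] = 10 * rng.standard_normal(mask.sum())

M = L + S                                    # the observed, corrupted matrix

# outlier fractions per row and per column (bounded in identifiability results)
max_frac_row = (S != 0).mean(axis=1).max()
max_frac_col = (S != 0).mean(axis=0).max()
```

If an oracle revealed S, the PCA solution would just be the span of the top-r left singular vectors of M − S = L; the whole difficulty of RPCA is doing this without knowing where the corruptions are.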