In this work, we study the robust subspace tracking (RST) problem and obtain one of the first two provable guarantees for it. The goal of RST is to track sequentially arriving data vectors that lie in a slowly changing low-dimensional subspace, while being robust to corruption by additive sparse outliers. RST can also be interpreted as a dynamic (time-varying) extension of robust PCA (RPCA), with the minor difference that RST also requires a short tracking delay. We develop a recursive projected compressive sensing algorithm that we call Nearly Optimal RST via ReProCS (ReProCS-NORST) because its tracking delay is nearly optimal. We prove that NORST solves both the RST and the dynamic RPCA problems under weakened versions of the standard RPCA assumptions, two simple extra assumptions (slow subspace change and a lower bound on most outlier magnitudes), and a few minor assumptions. Our guarantee shows that NORST enjoys a nearly optimal tracking delay of $O(r \log n \log(1/\varepsilon))$, where $n$ is the ambient space dimension, $r$ is the subspaces' dimension, and $\varepsilon$ is the tracking accuracy. Its required delay between subspace change times is the same, and its memory complexity is $n$ times this value; thus both of these are also nearly optimal. NORST also has the best outlier tolerance compared with all previous RPCA or RST methods, both theoretically and empirically (including on real videos), without requiring any model on how the outlier support is generated. This is possible because of the extra assumptions it uses.

* A shorter version of this manuscript [1] will be presented at ICML, 2018. Another small part, Corollary 5.18, will appear in [2].

arXiv:1712.06061v4 [cs.IT] 6 Jul 2018

Definition 1.1. An $n \times r$ basis matrix $P$ is $\mu$-incoherent if $\max_{i=1,2,\dots,n} \|P^{(i)}\|_2^2 \le \mu r / n$, where $P^{(i)}$ denotes the $i$-th row of $P$. Here $\mu$ is called the coherence parameter; it quantifies the non-denseness of $P$.

A simple way to ensure that $X$ is not low-rank is to impose upper bounds on max-outlier-frac-row and max-outlier-frac-col [8,9].
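As an illustration of Definition 1.1, the Python sketch below computes the smallest $\mu$ for which a given basis matrix is $\mu$-incoherent, namely $(n/r) \max_i \|P^{(i)}\|_2^2$. The function names `coherence_parameter` and `is_mu_incoherent` are hypothetical, and the input is assumed to already have orthonormal columns (this is not checked). For such a $P$ the row norms squared sum to $r$ and each is at most 1, so $\mu$ always lies between 1 (perfectly dense rows) and $n/r$ (for example, columns that are standard basis vectors).

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # n rows, each of length r


def _max_row_norm_sq(P: Matrix) -> float:
    """Largest squared Euclidean norm over the rows P^(i) of P."""
    return max(sum(x * x for x in row) for row in P)


def coherence_parameter(P: Matrix) -> float:
    """Smallest mu such that the n x r basis matrix P is mu-incoherent.

    Assumes (without checking) that P has orthonormal columns.
    Definition 1.1 needs max_i ||P^(i)||_2^2 <= mu * r / n, so the
    smallest valid mu is (n / r) * max_i ||P^(i)||_2^2.
    """
    n, r = len(P), len(P[0])
    return n * _max_row_norm_sq(P) / r


def is_mu_incoherent(P: Matrix, mu: float, tol: float = 1e-12) -> bool:
    """Check Definition 1.1 directly, with a small tolerance for rounding."""
    n, r = len(P), len(P[0])
    return _max_row_norm_sq(P) <= mu * r / n + tol


# Dense basis (Hadamard-like columns): every row has the same norm, so mu = 1.
dense = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
# Sparse basis (standard basis vectors): mass concentrated in two rows, mu = n/r.
sparse = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
print(coherence_parameter(dense), coherence_parameter(sparse))
```

Running the example prints `1.0 2.0`: the dense basis attains the smallest possible coherence, while the sparse one attains the largest, $n/r = 2$.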
One way to ensure identifiability of the changing subspaces is to assume that they are piecewise constant,

$$P_{(t)} = P_j \quad \text{for all } t \in [t_j, t_{j+1}), \; j = 0, 1, \dots, J,$$

and to lower bound $t_{j+1} - t_j$. Let $t_0 = 1$ and $t_{J+1} = d$. With this model, $r_L = rJ$ in general (except if subspace directions are repeated). The union of the column spans of all the $P_j$'s is equal to the span of the left singular vectors of $L$. Thus, assuming that the $P_j$'s are $\mu$-incoherent implies incoherence of the left singular vectors of $L$. We also assume that the subspace coefficients $a_t$ are mutually independent over time, have identical and diagonal covariance matrices denoted by $\Lambda$, and are element-wise bounded. Element-wise boundedness of the $a_t$'s, along with the statistical assumptions, is similar to incoherence of the right singular vectors of $L$ (right incoherence); see Remark 3.5. Because tracking requires an online algorithm that processes data vectors one at a time or in mini-batches, we need these statistical assumptions on the $a_t$'s. For the same reason, we also need to redefine max-outlier-frac-row as the maximum fraction of nonzeros in any row of any $\alpha$-consecutive-column sub-matrix of $X$. ...
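The redefined max-outlier-frac-row can be computed with one sliding window of width $\alpha$ per row, as in the Python sketch below. The function names are hypothetical, and $X$ is assumed to be given as $n$ rows of length $d$ with the outlier support being its nonzero entries. For comparison, `max_outlier_frac_col` assumes the usual column-wise definition (the maximum fraction of nonzeros in any column of $X$), which this excerpt mentions but does not spell out.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # n rows, each of length d


def max_outlier_frac_row(X: Matrix, alpha: int) -> float:
    """Maximum fraction of nonzeros in any row of any alpha-consecutive-column
    sub-matrix of X, computed with a sliding window in O(n * d) time."""
    d = len(X[0])
    if not 1 <= alpha <= d:
        raise ValueError("alpha must satisfy 1 <= alpha <= d")
    best = 0
    for row in X:
        support = [1 if x != 0 else 0 for x in row]
        count = sum(support[:alpha])  # nonzeros in columns [0, alpha)
        row_best = count
        for t in range(alpha, d):
            # Slide the window one column right: add column t, drop t - alpha.
            count += support[t] - support[t - alpha]
            row_best = max(row_best, count)
        best = max(best, row_best)
    return best / alpha


def max_outlier_frac_col(X: Matrix) -> float:
    """Maximum fraction of nonzeros in any column of X (assumed definition)."""
    n, d = len(X), len(X[0])
    return max(sum(1 for i in range(n) if X[i][t] != 0) for t in range(d)) / n


# Row 0 has a burst of two consecutive outliers; row 1 has two spread-out ones.
X = [[1, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 1]]
for alpha in (6, 3, 2):
    print(alpha, max_outlier_frac_row(X, alpha))
print(max_outlier_frac_col(X))
```

With $\alpha = d = 6$ the sketch returns $1/3$, which is just the row-wise fraction over the whole matrix, but with $\alpha = 2$ the burst in row 0 gives a fraction of 1.0. In this example the windowed quantity therefore penalizes temporally bursty outlier supports that the whole-matrix fraction would not flag.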