We study the multi-channel sparse blind deconvolution (MCS-BD) problem, whose task is to simultaneously recover a kernel $a$ and multiple sparse inputs $\{x_i\}_{i=1}^{p}$ from their circulant convolutions $y_i = a \circledast x_i$ ($i = 1, \dots, p$). We formulate the task as a nonconvex optimization problem over the sphere. Under mild statistical assumptions on the data, we prove that the vanilla Riemannian gradient descent (RGD) method with random initializations provably recovers both the kernel $a$ and the signals $\{x_i\}_{i=1}^{p}$ up to a signed shift ambiguity. In comparison with state-of-the-art results, our work shows significant improvements in terms of sample complexity and computational efficiency. Our theoretical results are corroborated by numerical experiments, which demonstrate the superior performance of the proposed approach over previous methods on both synthetic and real datasets.

1. Here, (i) $\mathrm{BG}(\theta)$ and $\mathrm{BR}(\theta)$ denote the Bernoulli-Gaussian and Bernoulli-Rademacher distributions, respectively; (ii) $\theta \in [0, 1]$ is the Bernoulli parameter controlling the sparsity level of $x_i$; (iii) $\varepsilon$ denotes the recovery precision of the global solution $a_\star$, i.e., $\min_{\ell} \| a - s_\ell[a_\star] \| \le \varepsilon$; (iv) $\widetilde{O}$ and $\widetilde{\Omega}$ hide $\log(n)$, $\theta$, and other factors.
2. Recently, a similar loss has been considered for short-and-sparse deconvolution [ZKW18] and complete dictionary learning [ZYL+19].
3. As the tail of the $\mathrm{BG}(\theta)$ distribution is heavier than that of $\mathrm{BR}(\theta)$, their sample complexity would be even worse if the $\mathrm{BG}(\theta)$ model were considered.
4. We say $x$ obeys a Bernoulli-Rademacher distribution when $x = b \odot r$, where $\odot$ denotes the point-wise product, $b$ follows an i.i.d. Bernoulli distribution, and $r$ follows an i.i.d. Rademacher distribution.

Contributions of this paper. In this work, we introduce an efficient optimization method for solving MCS-BD. We consider a natural nonconvex formulation based on a smooth relaxation of the $\ell_1$-loss. Under mild assumptions on the data, we prove the following result: with random initializations, vanilla RGD efficiently finds an approximate solution, which can then be refined by a subgradient method that converges exactly to the target solution at a linear rate. We summarize our main result in Table 1. In comparison with [LB18], our approach demonstrates substantial improvements for solving MCS-BD in terms of both sample and time complexity. Moreover, our experimental results imply that our analysis is still far from tight: the phase transitions suggest that $p \ge \Omega(\operatorname{poly}\log(n))$ samples might be sufficient for exact recovery, which is favorable for applications (real data in the form of images can have millions of pixels, resulting in a huge dimension $n$). Our analysis is inspired by recent results on orthogonal dictionary learning [GBW18, BJS18], but much of our theoretical analysis is tailored to MCS-BD, with a few extra new ingredients. Our work is the first to provably show that vanilla gradient descent type methods with random initialization solve MCS-BD efficiently. Moreover, our ideas could potentially lead to new algorithmic guarantees for other nonconvex problems such as ...
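To make the data model and the RGD procedure above concrete, the following self-contained sketch generates synthetic MCS-BD data with Bernoulli-Rademacher inputs and runs vanilla Riemannian gradient descent, from a random initialization, on a Huber-smoothed $\ell_1$ objective over the sphere. It is illustrative only: the dimensions, sparsity level, smoothing parameter, step size, and iteration count are assumed values (not taken from the paper), the Huber surrogate stands in for a generic smooth relaxation of the $\ell_1$-loss, and the paper's preconditioning and final subgradient refinement are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def circ_conv(a, b):
    """Circulant (circular) convolution via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def circ_corr(a, b):
    """Circular cross-correlation: c[s] = sum_t a[t] * b[(t + s) mod n]."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

# Synthetic MCS-BD data: y_i = a (*) x_i with Bernoulli-Rademacher inputs x_i.
# n, p, theta are assumed illustrative values, not the paper's settings.
n, p, theta = 128, 64, 0.1
a = rng.standard_normal(n); a /= np.linalg.norm(a)                       # kernel on the sphere
X = (rng.random((n, p)) <= theta) * rng.choice([-1.0, 1.0], size=(n, p))  # BR(theta) sparse inputs
Y = np.stack([circ_conv(a, X[:, i]) for i in range(p)], axis=1)           # observations

# Huber smoothing of |z| (one possible smooth relaxation of the l1 loss).
mu = 1e-2
huber      = lambda z: np.where(np.abs(z) <= mu, z**2 / (2 * mu), np.abs(z) - mu / 2)
huber_grad = lambda z: np.clip(z / mu, -1.0, 1.0)

def loss(q):
    """Smoothed l1 loss of the deconvolved channels q (*) y_i, averaged over channels."""
    return sum(huber(circ_conv(q, Y[:, i])).sum() for i in range(p)) / p

def riemannian_gradient(q):
    """Euclidean gradient of the loss, projected onto the tangent space of the sphere at q."""
    g = np.zeros(n)
    for i in range(p):
        z = circ_conv(q, Y[:, i])
        g += circ_corr(Y[:, i], huber_grad(z)) / p
    return g - (q @ g) * q

# Vanilla RGD with a random initialization; step size and iteration count are untuned guesses.
q = rng.standard_normal(n); q /= np.linalg.norm(q)
step = 0.01
for it in range(500):
    q = q - step * riemannian_gradient(q)
    q /= np.linalg.norm(q)                # retraction back onto the sphere
    if it % 100 == 0:
        print(f"iter {it:4d}  loss {loss(q):.4f}")

# If recovery succeeds, q (*) a concentrates on a single signed spike, so q acts as an
# approximate inverse filter and q (*) y_i recovers x_i up to a signed shift and scaling.
corr = circ_conv(q, a)
print("spikiness of q (*) a:", np.max(np.abs(corr)) / np.linalg.norm(corr))
```

In this sketch the spikiness of $q \circledast a$ (a value close to 1 means the correlation is a single signed spike) serves as a simple proxy for recovery up to the signed shift ambiguity discussed above; the sparse channels are then read off as $\hat{x}_i = q \circledast y_i$.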