Nonconvex Low-Rank Tensor Completion from Noisy Data

Cai, Changxiao; Li, Gen; Poor, H. Vincent; Chen, Yuxin

doi:10.1287/opre.2021.2106

Cited by 47 publications

(72 citation statements)

References 60 publications

“…Proposition 1. [24, Propositions 4.6.1, 4.6.2] For a symmetric matrix $M \in \mathbb{R}^{N \times N}$, a vector $x \in \mathbb{R}^{N}$ with $\|x\|_2 = 1$ is an eigenvector of $M$ if and only if it is a critical point of the population risk (5)$^{2}$. Moreover, denote $\{\lambda_n\}_{n=1}^{N}$ with $\lambda_1 > \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_N$ as the eigenvalues of $M$ and $\{v_n\}_{n=1}^{N}$ as the associated eigenvectors.…”

Section: Results (mentioning)

confidence: 99%
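
The equivalence stated in the quoted Proposition 1 can be checked numerically on a small example. The sketch below assumes the population risk has the form f(x) = -0.5 * x^T M x restricted to the unit sphere; the quoted excerpt does not reproduce equation (5), so this form, the 3x3 matrix, and all function names are illustrative assumptions. Under that assumption, the gradient projected onto the tangent space of the sphere should vanish exactly at unit eigenvectors and nowhere else.

```python
import math

def matvec(M, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def riemannian_grad_norm(M, x):
    """Norm of the sphere-projected gradient of the ASSUMED risk f(x) = -0.5 * x^T M x.

    For unit x, the projected gradient is -(M x - (x^T M x) x).
    """
    Mx = matvec(M, x)
    rayleigh = dot(x, Mx)  # x^T M x, valid because ||x||_2 = 1
    g = [-(mx_i - rayleigh * x_i) for mx_i, x_i in zip(Mx, x)]
    return math.sqrt(dot(g, g))

# Hypothetical symmetric matrix with eigenvalues 5, 3, 1.
M = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]

s = 1.0 / math.sqrt(2.0)
eigvecs = [[0.0, 0.0, 1.0],   # eigenvalue 5
           [s, s, 0.0],       # eigenvalue 3
           [s, -s, 0.0]]      # eigenvalue 1
non_eig = [1.0, 0.0, 0.0]     # unit vector that is not an eigenvector

for v in eigvecs:
    print("eigenvector gradient norm:", riemannian_grad_norm(M, v))
print("non-eigenvector gradient norm:", riemannian_grad_norm(M, non_eig))
```

The projected gradient is zero (up to rounding) at each eigenvector, including the non-leading ones, which is why the statement speaks of critical points rather than minimizers.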

“…For any small constant $\delta \in (0, 1]$, it follows from [17, Lemma 13] that $\|Y - \mathbb{E}Y\| \le \delta \|x\|_2^2$ with probability at least $1 - c_1 N^{-c_2}$ provided that $m \ge C \delta^{-2} N \log(N)$ for some constant $C$. Therefore, by setting $\epsilon = C \delta \|x\|_2^2 \le \eta = 0.11 d_{\min}$, we can conclude that the two conditions in (7) and (8) hold with probability at least $1 - c_1 N^{-c_2}$ as long as the number of measurements satisfies $m \ge C d_{\min}^{-2} \|x\|_2^4 N \log(N)$. Moreover, it follows from Corollary III.1 that the distance between the empirical local minimum and population local minimum is on the order of $d_{\min}^{-1} \delta \|x\|_2^2$.…”

Section: Phase Retrieval (mentioning)

confidence: 99%
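
The concentration of Y around its expectation in the phase retrieval statement can be illustrated with a small simulation. The sketch below assumes the common real-valued Gaussian model, which the excerpt does not spell out: sensing vectors a_i with i.i.d. standard normal entries, measurements y_i = (a_i^T x)^2, and Y = (1/m) * sum_i y_i a_i a_i^T, for which E[Y] = ||x||_2^2 I + 2 x x^T. The signal `x_true`, the sample size, and all function names are hypothetical. When Y is close to E[Y], the leading eigenvector of Y, computed here by power iteration, should align with x up to sign.

```python
import math
import random

def spectral_estimate(x_true, m, seed=0, iters=200):
    """Build Y = (1/m) * sum_i y_i a_i a_i^T and return (Y, leading eigenvector)."""
    rng = random.Random(seed)
    n = len(x_true)
    Y = [[0.0] * n for _ in range(n)]
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]          # assumed Gaussian sensing vector
        y = sum(ai * xi for ai, xi in zip(a, x_true)) ** 2   # phaseless measurement
        for j in range(n):
            for k in range(n):
                Y[j][k] += y * a[j] * a[k] / m
    # Power iteration; Y is positive semidefinite, so this targets the top eigenvector.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(Y[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return Y, v

def relative_deviation(Y, x_true):
    """Frobenius norm of Y - E[Y] divided by ||x||_2^2 (an upper bound on the spectral-norm ratio)."""
    n = len(x_true)
    sq = sum(xi * xi for xi in x_true)
    dev = 0.0
    for j in range(n):
        for k in range(n):
            expected = 2.0 * x_true[j] * x_true[k] + (sq if j == k else 0.0)
            dev += (Y[j][k] - expected) ** 2
    return math.sqrt(dev) / sq

def alignment(v, x_true):
    """|<v, x>| / ||x||_2 for a unit vector v; 1.0 means perfect recovery up to sign."""
    inner = sum(vi * xi for vi, xi in zip(v, x_true))
    return abs(inner) / math.sqrt(sum(xi * xi for xi in x_true))

x_true = [1.0, -2.0, 0.5, 1.5]   # hypothetical signal
Y, v = spectral_estimate(x_true, m=5000, seed=0)
align = alignment(v, x_true)
print("relative deviation:", relative_deviation(Y, x_true))
print("alignment with x:", align)
```

Increasing `m` shrinks the relative deviation, playing the role of the δ in the quoted bound, and pushes the alignment toward 1.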

“…¹⁰ Note that we assume $\langle x, v_1 \rangle \ge 0$ here. In the case when $\langle x, v_1 \rangle < 0$, one can bound $\|x + v_1\|_2^2$ instead.…”

Section: Appendix A, Proof of Theorem III.1 (mentioning)

confidence: 99%

“…Spectral methods are of fundamental importance in signal processing and machine learning due to their simplicity and effectiveness. They have been widely used to extract useful information from noisy and partially observed data in a variety of applications, including dimensionality reduction [1], tensor estimation [2], ranking from pairwise comparisons [3], low-rank matrix estimation [4], and community detection [5], to name a few. As is well known, the theoretical performance of the gradient descent algorithm and its variants is heavily dependent on a proper initialization.…”

Section: Introduction (mentioning)

confidence: 99%

Landscape Correspondence of Empirical and Population Risks in the Eigendecomposition Problem

Li, Tang, Wakin. 2021. Preprint.

Spectral methods include a family of algorithms related to the eigenvectors of certain data-generated matrices. In this work, we are interested in studying the geometric landscape of the eigendecomposition problem in various spectral methods. In particular, we first extend known results regarding the landscape at critical points to larger regions near the critical points in a special case of finding the leading eigenvector of a symmetric matrix. For a more general eigendecomposition problem, inspired by recent findings on the connection between the landscapes of empirical risk and population risk, we then build a novel connection between the landscape of an eigendecomposition problem that uses random measurements and the one that uses the true data matrix. We also apply our theory to a variety of low-rank matrix optimization problems and conduct a series of simulations to illustrate our theoretical findings.

“…For the former, several papers use SI after training (Nakkiran et al, 2015; Yaguchi et al, 2019; Yang et al, 2020), while Ioannou et al (2016) argue for initializing factors as though they were single layers, which we find inferior to SI in some cases. Outside deep learning, spectral methods have also been shown to yield better initializations for certain matrix and tensor problems (Keshavan et al, 2010; Cai et al, 2019). For regularization, Gray et al (2019) suggest compression-rate scaling (CRS), which scales weight-decay using the reduction in parameter count; this is justified via the usual Bayesian understanding of $\ell_2$-regularization (Murphy, 2012).…”

Section: Related Work (mentioning)

confidence: 99%

Initialization and Regularization of Factorized Neural Layers

Khodak, Tenenholtz, Mackey et al. 2021. Preprint.

Factorized layers (operations parameterized by products of two or more matrices) occur in a variety of deep learning contexts, including compressed model training, certain types of knowledge distillation, and multi-head self-attention architectures. We study how to initialize and regularize deep nets containing such layers, examining two simple, understudied schemes, spectral initialization and Frobenius decay, for improving their performance. The guiding insight is to design optimization routines for these networks that are as close as possible to that of their well-tuned, non-decomposed counterparts; we back this intuition with an analysis of how the initialization and regularization schemes impact training with gradient descent, drawing on modern attempts to understand the interplay of weight-decay and batch-normalization. Empirically, we highlight the benefits of spectral initialization and Frobenius decay across a variety of settings. In model compression, we show that they enable low-rank methods to significantly outperform both unstructured sparsity and tensor methods on the task of training low-memory residual networks; analogs of the schemes also improve the performance of tensor decomposition techniques. For knowledge distillation, Frobenius decay enables a simple, overcomplete baseline that yields a compact model from over-parameterized training without requiring retraining with or pruning a teacher network. Finally, we show how both schemes applied to multi-head attention lead to improved performance on both translation and unsupervised pre-training.
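
One way to see why a penalty on the product of the factors behaves differently from ordinary weight decay is to compare the two on a toy factorization. The sketch below assumes that Frobenius decay penalizes the squared Frobenius norm of the product U V^T, while standard weight decay penalizes the factors U and V separately; the abstract names the scheme without defining it, so this reading, the matrices, and the function names are illustrative assumptions. Rescaling the factors as (cU, V/c) leaves the represented layer unchanged, and only the product-based penalty respects that.

```python
def matmul_t(U, V):
    """Return U V^T for U (m x r) and V (n x r), given as lists of rows."""
    return [[sum(u * v for u, v in zip(u_row, v_row)) for v_row in V] for u_row in U]

def sq_fro(A):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(a * a for row in A for a in row)

def weight_decay_penalty(U, V):
    """Standard weight decay: penalize each factor separately."""
    return sq_fro(U) + sq_fro(V)

def frobenius_decay_penalty(U, V):
    """ASSUMED form of Frobenius decay: penalize the product U V^T."""
    return sq_fro(matmul_t(U, V))

def rescale(U, V, c):
    """Map (U, V) to (cU, V/c); the product U V^T is unchanged."""
    return [[c * u for u in row] for row in U], [[v / c for v in row] for row in V]

# Hypothetical rank-2 factorization of a 3 x 2 layer.
U = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
V = [[2.0, 1.0], [0.0, -1.0]]
U2, V2 = rescale(U, V, 4.0)

print("weight decay:   ", weight_decay_penalty(U, V), "->", weight_decay_penalty(U2, V2))
print("Frobenius decay:", frobenius_decay_penalty(U, V), "->", frobenius_decay_penalty(U2, V2))
```

The product-based penalty depends only on the layer that the factors represent, which matches the abstract's stated goal of keeping optimization close to that of the non-decomposed counterpart.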

Bayesian robust tensor completion via CP decomposition

Wang, Yang et al. 2022. Pattern Recognition Letters.
