We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the true rank is unknown and over-specified, which we refer to as over-parameterized matrix sensing. If the ground truth signal $X^* \in \mathbb{R}^{d \times d}$ is of rank $r$, but we try to recover it using $FF^\top$ where $F \in \mathbb{R}^{d \times k}$ and $k > r$, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global minima. By decomposing the factorized matrix $F$ into separate column spaces to capture the effect of the extra ranks, we show that $\|F_t F_t^\top - X^*\|_F^2$ converges to a statistical error of $\widetilde{\mathcal{O}}(kd\sigma^2/n)$ after $\widetilde{\mathcal{O}}\big(\tfrac{\sigma_r}{\sigma}\sqrt{\tfrac{n}{d}}\big)$ iterations, where $F_t$ is the output of FGD after $t$ iterations, $\sigma^2$ is the variance of the observation noise, $\sigma_r$ is the $r$-th largest eigenvalue of $X^*$, and $n$ is the number of samples. Our results therefore offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem.
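As an illustration of the setting described above, the following is a minimal NumPy sketch of FGD on synthetic over-parameterized matrix sensing data. The Gaussian sensing model, initialization scale, step size, and iteration count are illustrative assumptions for this sketch, not the exact setup analyzed in the paper.

```python
# Minimal sketch: FGD for over-parameterized matrix sensing (illustrative values).
# Model assumed here: y_i = <A_i, X*> + noise, with Gaussian sensing matrices A_i.
import numpy as np

rng = np.random.default_rng(0)
d, r, k, n = 20, 2, 5, 2000            # ambient dim, true rank r, over-specified rank k > r, samples
sigma = 0.1                             # observation noise level

# Ground truth X* = U U^T of rank r
U = rng.standard_normal((d, r))
X_star = U @ U.T

# Sensing operator and noisy observations
A = rng.standard_normal((n, d, d))
y = np.einsum('nij,ij->n', A, X_star) + sigma * rng.standard_normal(n)

# FGD on the factorized loss f(F) = (1/2n) * sum_i (<A_i, F F^T> - y_i)^2
F = 0.1 * rng.standard_normal((d, k))   # small random initialization (assumed scale)
eta = 0.05 / np.linalg.norm(X_star, 2)  # step size scaled by ||X*||_2 (a common heuristic)
for t in range(500):
    residual = np.einsum('nij,ij->n', A, F @ F.T) - y
    grad_X = np.einsum('n,nij->ij', residual, A) / n    # gradient w.r.t. X = F F^T
    F = F - eta * (grad_X + grad_X.T) @ F               # chain rule through X = F F^T

print("||F F^T - X*||_F^2 =", np.linalg.norm(F @ F.T - X_star, 'fro') ** 2)
```

The update uses the gradient of the unfactorized loss pushed through the factorization $X = FF^\top$, which is the standard FGD step; the reconstruction error $\|F F^\top - X^*\|_F^2$ is the quantity whose statistical and computational behavior the abstract characterizes.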