2018
DOI: 10.1007/s10107-018-1285-1
Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods

Abstract: A fundamental class of matrix optimization problems that arise in many areas of science and engineering is that of quadratic optimization with orthogonality constraints. Such problems can be solved using line-search methods on the Stiefel manifold, which are known to converge globally under mild conditions. To determine the convergence rate of these methods, we give an explicit estimate of the exponent in a Łojasiewicz inequality for the (non-convex) set of critical points of the aforementioned class of problems. …
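To make the setting of the abstract concrete, the following is a minimal sketch of a retraction-based gradient line search for a quadratic objective on the Stiefel manifold. It is not the paper's exact algorithm: the QR retraction, the Armijo parameters, and the function names (qr_retraction, riemannian_grad, stiefel_linesearch) are illustrative choices.

```python
# Sketch: retraction-based gradient line search for min tr(X^T A X)
# subject to X^T X = I (quadratic optimization on the Stiefel manifold).
import numpy as np

def qr_retraction(Y):
    """Map a full-rank n x p matrix back onto the Stiefel manifold via QR."""
    Q, R = np.linalg.qr(Y)
    # Fix column signs so the retraction is well defined (positive diag of R).
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)

def riemannian_grad(A, X):
    """Project the Euclidean gradient 2AX onto the tangent space at X."""
    G = 2 * A @ X
    return G - X @ ((X.T @ G + G.T @ X) / 2)

def stiefel_linesearch(A, X, max_iter=500, beta=0.5, sigma=1e-4, tol=1e-8):
    f = lambda X: np.trace(X.T @ A @ X)
    for _ in range(max_iter):
        xi = riemannian_grad(A, X)
        if np.linalg.norm(xi) < tol:
            break
        t = 1.0
        # Backtracking (Armijo) line search along the retracted direction -xi.
        while f(qr_retraction(X - t * xi)) > f(X) - sigma * t * np.linalg.norm(xi)**2:
            t *= beta
        X = qr_retraction(X - t * xi)
    return X

# Usage: approximate the 5 smallest eigenvectors of a random symmetric matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50)); A = (M + M.T) / 2
X = stiefel_linesearch(A, qr_retraction(rng.standard_normal((50, 5))))
```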

Cited by 70 publications (76 citation statements) · References 48 publications (52 reference statements)
“…The first algorithm even admits a global convergence rate of O(1/ε), of the same order as the gradient descent algorithm, which is faster than the subgradient method. In addition, we demonstrate that the first algorithm also admits a local linear convergence rate, through a delicate analysis of the Kurdyka-Łojasiewicz (KL) [6,11,34,20] property for problem (M). We illustrate in our numerical experiments the efficiency of the proposed algorithms when compared with the state-of-the-art methods for GTRS in the literature.…”
Section: Introduction
confidence: 99%
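For reference, the Łojasiewicz (gradient) inequality invoked in the quote above and in the abstract is usually stated in the following standard form; the symbols f, x̄, θ, and C are generic placeholders rather than notation taken from either paper.

```latex
% Standard Lojasiewicz gradient inequality at a critical point \bar{x} of f:
% there exist C, \epsilon > 0 and an exponent \theta \in [1/2, 1) such that
\[
  |f(x) - f(\bar{x})|^{\theta} \le C \,\|\nabla f(x)\|
  \quad \text{for all } x \text{ with } \|x - \bar{x}\| \le \epsilon .
\]
% Linear convergence of gradient-type methods corresponds to \theta = 1/2.
```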
“…The basic idea of the line-search method for the optimization problem is to search for the optimal solution in the tangent space of the Stiefel manifold. We observed that the line-search method based on the polar-decomposition retraction updates the representation of a vertex through a linear combination of other representations across iterations [25]. In our problem, that means:…”
Section: Approximated Algorithm in Graph Streams
confidence: 88%
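The polar-decomposition retraction mentioned in that excerpt has the closed form R_X(ξ) = (X + ξ)(I + ξᵀξ)^(-1/2); since the result is (X + ξ) times a small p×p matrix, each updated column is indeed a linear combination of the shifted columns, which is what the quoted observation exploits. A minimal sketch, with polar_retraction as an illustrative name:

```python
# Sketch of the polar-decomposition-based retraction on the Stiefel manifold:
# R_X(xi) = (X + xi)(I + xi^T xi)^{-1/2}, for a tangent vector xi at X.
import numpy as np

def polar_retraction(X, xi):
    """Retract X + xi back onto the Stiefel manifold via its polar factor."""
    Y = X + xi
    # Compute (I + xi^T xi)^{-1/2} from the eigendecomposition of a p x p matrix.
    w, V = np.linalg.eigh(xi.T @ xi + np.eye(X.shape[1]))
    return Y @ (V @ np.diag(w ** -0.5) @ V.T)
```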
“…The problem in this form has been widely studied and shown to admit no closed-form solution. The state-of-the-art approach is to learn the solution through a Riemannian gradient method [24] or a line-search method on the Stiefel manifold [25], whose convergence analysis has attracted extensive research attention very recently. However, these methods are not suitable for the streaming setting, because waiting for convergence introduces time uncertainty and gradient-based methods have unsatisfactory time complexity.…”
Section: Dynamic Graph Representation Learning
confidence: 99%
“…Alongside deterministic algorithms, the stochastic gradient descent method (SGD) and the stochastic variance-reduced gradient method (SVRG) have also been extended to optimization over Riemannian manifolds; see e.g. [28,30,38,61,62]. Compared to all these approaches, our proposed methods allow a nonsmooth objective, a constraint x_i ∈ X_i, as well as coupling affine constraints.…”
Section: Related Literature
confidence: 99%
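To illustrate the Riemannian SVRG idea referenced above, here is a minimal sketch of one variance-reduced epoch for a finite-sum quadratic objective f(X) = (1/n) Σ_i tr(Xᵀ A_i X) on the Stiefel manifold. This is not the cited paper's exact method: the snapshot gradient is moved to the current iterate simply by re-projecting onto its tangent space (proper vector transport is one of several alternatives in the Riemannian SVRG literature), and the step size, epoch length, and function names are illustrative assumptions.

```python
# Sketch: one SVRG epoch with a QR retraction on the Stiefel manifold.
import numpy as np

def proj_tangent(X, G):
    """Project a Euclidean direction G onto the tangent space at X."""
    return G - X @ ((X.T @ G + G.T @ X) / 2)

def qr_retraction(Y):
    Q, R = np.linalg.qr(Y)
    return Q * np.where(np.diag(R) < 0, -1.0, 1.0)

def svrg_epoch(A_list, X, step=1e-2, inner_iters=100, rng=np.random.default_rng()):
    n = len(A_list)
    X_snap = X.copy()
    # Full gradient at the snapshot point, reused by every inner step.
    full_grad = sum(2 * A @ X_snap for A in A_list) / n
    for _ in range(inner_iters):
        i = rng.integers(n)
        # Variance-reduced Euclidean gradient estimate at the current iterate.
        v = 2 * A_list[i] @ X - 2 * A_list[i] @ X_snap + full_grad
        X = qr_retraction(X - step * proj_tangent(X, v))
    return X
```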
“…Combining (82), (83), (84) and (38), we obtain
$\mathbb{E}[\Psi_S(x_1^{k+1}, \cdots, x_{N-1}^{k+1}, x_N^{k+1}, \lambda^{k+1}, x_N^k)] - \mathbb{E}[\Psi_S(x_1^k, \cdots, x_{N-1}^k, x_N^k, \lambda^k, x_N^{k-1})]$ (85)…”
Section: A.4 Proof of Lemma 3.12
confidence: 99%