2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
DOI: 10.1109/allerton.2012.6483272

Distributed strongly convex optimization

Abstract: A lot of effort has been invested into characterizing the convergence rates of gradient-based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n computers converges to the optimum of a strongly convex, L-Lipschitz continuous, separable objective at a rate O(log(√n…
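As a rough illustration of the kind of method the abstract describes (this is not the authors' algorithm; the quadratic objective, unit-ball constraint, complete-graph mixing matrix, and all names below are made-up assumptions), a distributed projected gradient scheme with a 1/(μk) step size and consensus averaging can be sketched in Python as:

    import numpy as np

    # Sketch: n nodes minimize the separable, strongly convex objective
    # sum_i f_i(x), with f_i(x) = (mu/2)*||x - a_i||^2, over the unit ball.
    # Each round: average with neighbors, then take a projected gradient step.
    rng = np.random.default_rng(0)
    n, d, T = 8, 5, 2000
    mu = 1.0
    A = rng.normal(size=(n, d))        # local data a_i held by node i (assumed)
    W = np.full((n, n), 1.0 / n)       # doubly stochastic mixing matrix (complete graph)

    def project_unit_ball(x):
        nrm = np.linalg.norm(x)
        return x if nrm <= 1.0 else x / nrm

    X = np.zeros((n, d))               # one local iterate per node
    for k in range(1, T + 1):
        grads = mu * (X - A)           # gradient of f_i at the current local iterate
        X = W @ X                      # consensus (averaging) step
        step = 1.0 / (mu * k)          # diminishing step size suited to strong convexity
        X = np.array([project_unit_ball(x - step * g) for x, g in zip(X, grads)])

    x_star = project_unit_ball(A.mean(axis=0))   # constrained minimizer of the sum
    print("max node error:", np.abs(X - x_star).max())

A real network would use a sparse, topology-dependent W rather than uniform averaging; the 1/(μk) schedule is the usual choice for strongly convex objectives.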

Cited by 57 publications (60 citation statements). References 13 publications.
“…the strong convexity (Hazan & Kale, 2011) and strong smoothness are dual properties, strongly convex programming algorithms have many benign properties both on the speed of optimization and the quality of generalization; see, for example, (Hazan & Kale, 2011; Rakhlin et al., 2012; Tsianos & Rabbat, 2012; Kakade & Tewari, 2009). …”
Section: Theorem
Mentioning confidence: 99%
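For reference, the strong convexity / strong smoothness duality this excerpt invokes is a standard convex-analysis fact (not specific to the cited papers): for a closed proper convex f with conjugate f^*(u) = \sup_x \{\langle u, x\rangle - f(x)\},

    f(y) \ge f(x) + \langle g,\, y - x\rangle + \tfrac{\mu}{2}\|y - x\|^2 \quad \forall\, g \in \partial f(x)
    \;\Longleftrightarrow\;
    \|\nabla f^*(u) - \nabla f^*(v)\| \le \tfrac{1}{\mu}\|u - v\| \quad \forall\, u, v,

i.e. f is μ-strongly convex exactly when f^* is (1/μ)-smooth; this exchange between strong convexity and smoothness is what the cited analyses exploit.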
“…Distributed subgradient descent fits asynchronous networks, but suffers from slow convergence. The descent rate of the objective value is typically O(log(k)/k), where k is the number of iterations [12]. The ADMM generally needs synchronous steps taken by all the agents, but has much faster empirical convergence.…”
Section: Related Work
Mentioning confidence: 99%
“…Proof: Subtracting the three equations in (8) from the corresponding equations in (6) yields

    \nabla f(x^{k+1}) - \nabla f(x^*) = c M_+ (z^k - z^{k+1}) - M_- (\beta^{k+1} - \beta^*),    (12)
    \tfrac{c}{2} M_-^T (x^{k+1} - x^*) = \beta^{k+1} - \beta^k,    (13)
    \tfrac{1}{2} M_+^T (x^{k+1} - x^*) = z^{k+1} - z^*,    (14)

respectively.…”
Section: (11)
Mentioning confidence: 99%
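The excerpt is working with optimality conditions of a decentralized consensus ADMM (the matrices M_+ and M_-, the multiplier β, and the auxiliary variable z belong to that setup). Purely as context, and assuming the simpler global-averaging form of consensus ADMM rather than the exact edge-based scheme whose equations (6), (8), and (12)-(14) are quoted, a minimal sketch with a quadratic objective (data and penalty parameter c made up) is:

    import numpy as np

    # Consensus ADMM sketch for min_x sum_i f_i(x), with f_i(x) = 0.5*||x - a_i||^2.
    # Node i keeps a local copy x_i and a scaled dual u_i; z is the consensus variable.
    rng = np.random.default_rng(1)
    n, d, c = 6, 4, 1.0
    A = rng.normal(size=(n, d))

    X = np.zeros((n, d))
    U = np.zeros((n, d))
    z = np.zeros(d)

    for k in range(200):
        # x-update: argmin_x f_i(x) + (c/2)||x - z + u_i||^2 (closed form for quadratic f_i)
        X = (A + c * (z - U)) / (1.0 + c)
        # z-update: average of x_i + u_i (enforces consensus)
        z = (X + U).mean(axis=0)
        # dual update: accumulate the consensus residual
        U = U + X - z

    print("max deviation from the optimum:", np.abs(X - A.mean(axis=0)).max())

Relations like the quoted (12)-(14) typically come from subtracting the fixed-point (KKT) system of such an iteration from the per-iteration optimality conditions, which appears to be the step the excerpt is carrying out.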