2018
DOI: 10.1007/s10994-018-5713-5
A distributed Frank–Wolfe framework for learning low-rank matrices with the trace norm

Abstract: We consider the problem of learning a high-dimensional but low-rank matrix from a large-scale dataset distributed over several machines, where low-rankness is enforced by a convex trace norm constraint. We propose DFW-Trace, a distributed Frank–Wolfe algorithm which leverages the low-rank structure of its updates to achieve efficiency in time, memory and communication usage. The step at the heart of DFW-Trace is solved approximately using a distributed version of the power method. We provide a theoretical anal…
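To make the core step concrete, here is a minimal single-machine sketch (an illustration under stated assumptions, not the paper's code: the sequential loop, the callable grad, and all names are assumptions). Over the trace-norm ball of radius delta, the Frank–Wolfe linear subproblem is solved by a rank-one matrix built from the top singular vector pair of the gradient, which DFW-Trace approximates with a distributed power method.

```python
import numpy as np

def top_singular_pair(G, n_iters=50, seed=0):
    """Approximate the leading singular vectors (u, v) of G by power
    iteration -- the subproblem DFW-Trace distributes across machines."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = G @ v
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    return u, v

def fw_trace(grad, W0, delta, n_steps=100):
    """Frank-Wolfe over {W : ||W||_tr <= delta}. The linear minimization
    oracle over the trace-norm ball returns -delta * u v^T, a rank-one
    vertex, so the iterate after t steps has rank at most t + 1."""
    W = W0.copy()
    for t in range(n_steps):
        u, v = top_singular_pair(grad(W))
        S = -delta * np.outer(u, v)       # rank-one Frank-Wolfe vertex
        gamma = 2.0 / (t + 2.0)           # classic step-size schedule
        W = (1.0 - gamma) * W + gamma * S
    return W
```

The rank-one structure of each update is what keeps time, memory, and communication usage low in the distributed setting the abstract describes.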

Cited by 12 publications (9 citation statements) | References 44 publications
“…As a projection-free algorithm, the Frank–Wolfe method [Frank and Wolfe, 1956] has been studied for both convex optimization [Jaggi, 2013, Lacoste-Julien and Jaggi, 2015, Garber and Hazan, 2015, Hazan and Luo, 2016, Mokhtari et al., 2018b] and non-convex optimization problems [Lacoste-Julien, 2016, Reddi et al., 2016, Mokhtari et al., 2018c, Shen et al., 2019b]. In large-scale settings, distributed FW methods were proposed to solve specific problems, including optimization under a block-separable constraint set [Wang et al., 2016] and learning low-rank matrices [Zheng et al., 2018]. Communication-efficient distributed FW variants were proposed for specific sparse learning problems in Bellet et al. [2015] and Lafond et al. [2016], and for general constrained optimization problems in .…”
Section: Related Work
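For readers unfamiliar with the term, "projection-free" means each Frank–Wolfe iteration calls a linear minimization oracle over the feasible set rather than an often-expensive projection. A generic sketch, with grad and lmo as placeholder callables (assumptions for illustration):

```python
def frank_wolfe(grad, lmo, x0, n_steps=100):
    """Generic Frank-Wolfe loop. lmo(g) returns the minimizer of <g, s>
    over the feasible set C; no projection onto C is ever computed,
    because each iterate is a convex combination of points in C."""
    x = x0
    for t in range(n_steps):
        s = lmo(grad(x))                  # linear minimization oracle
        gamma = 2.0 / (t + 2.0)           # O(1/t) step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```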
“…To reduce the computational complexity, many efficient solvers have been developed by replacing the full SVD with a partial SVD. However, those approaches either require the objective function to be smooth (Wang, Kolar, and Srebro 2016) or are designed for constrained optimization (Zheng, Bellet, and Gallinari 2018). Furthermore, the approaches mentioned above ignore the problem that different tasks generally have different noise levels.…”
Section: Introduction
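The full-versus-partial SVD trade-off the excerpt mentions can be sketched as follows (illustrative assumptions: scipy.sparse.linalg.svds as the partial-SVD solver and a random dense matrix; a full SVD of an m × n matrix costs on the order of mn·min(m, n), while iterative methods recover the top k singular triplets in roughly O(mnk)).

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))

# Full SVD: computes all min(m, n) = 500 singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Partial SVD: only the k = 5 largest triplets, via an iterative
# (Lanczos-type) method; svds returns them in ascending order.
Uk, sk, Vtk = svds(A, k=5)
print(np.allclose(sorted(sk), sorted(s[:5])))  # leading values match
```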
“…By transferring information between tasks, it is hoped that samples will be better utilized, leading to improved generalization performance. MTL has been successfully applied in practical scenarios, e.g., speech recognition (Seltzer and Droppo 2013), image classification (Lapin, Schiele, and Hein 2014), and disease gene prediction (Zhou et al. 2013). Recent years have also witnessed extensive studies on streaming data, known as online multi-task learning (OMTL) (Dekel, Long, and Singer 2006; Saha et al. 2011; Yang, Zhao, and Gao 2017), which captures the dynamically changing and uncertain nature of the environment, in contrast to the offline setting in which the objective functions are fixed (Liu, Pan, and Ho 2017; Smith et al. 2017).…”
Section: Introduction