2018
DOI: 10.1007/s10994-018-5713-5
A distributed Frank–Wolfe framework for learning low-rank matrices with the trace norm

Abstract: We consider the problem of learning a high-dimensional but low-rank matrix from a large-scale dataset distributed over several machines, where low-rankness is enforced by a convex trace norm constraint. We propose DFW-Trace, a distributed Frank–Wolfe algorithm which leverages the low-rank structure of its updates to achieve efficiency in time, memory and communication usage. The step at the heart of DFW-Trace is solved approximately using a distributed version of the power method. We provide a theoretical anal…
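To make the core step concrete, here is a minimal single-machine sketch (an illustration under stated assumptions, not the paper's code: the sequential loop, the callable grad, and all names are assumptions). Over the trace-norm ball of radius delta, the Frank–Wolfe linear subproblem is solved by a rank-one matrix built from the top singular vector pair of the gradient, which DFW-Trace approximates with a distributed power method.

```python
import numpy as np

def top_singular_pair(G, n_iters=50, seed=0):
    """Approximate the leading singular vectors (u, v) of G by power
    iteration -- the subproblem DFW-Trace distributes across machines."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = G @ v
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    return u, v

def fw_trace(grad, W0, delta, n_steps=100):
    """Frank-Wolfe over {W : ||W||_tr <= delta}. The linear minimization
    oracle over the trace-norm ball returns -delta * u v^T, a rank-one
    vertex, so the iterate after t steps has rank at most t + 1."""
    W = W0.copy()
    for t in range(n_steps):
        u, v = top_singular_pair(grad(W))
        S = -delta * np.outer(u, v)       # rank-one Frank-Wolfe vertex
        gamma = 2.0 / (t + 2.0)           # classic step-size schedule
        W = (1.0 - gamma) * W + gamma * S
    return W
```

The rank-one structure of each update is what keeps time, memory, and communication usage low in the distributed setting the abstract describes.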

Cited by 12 publications (9 citation statements) | References 44 publications
“…As a projection-free algorithm, the Frank–Wolfe method [Frank and Wolfe, 1956] has been studied for both convex optimization [Jaggi, 2013, Lacoste-Julien and Jaggi, 2015, Garber and Hazan, 2015, Hazan and Luo, 2016, Mokhtari et al., 2018b] and non-convex optimization problems [Lacoste-Julien, 2016, Reddi et al., 2016, Mokhtari et al., 2018c, Shen et al., 2019b]. In large-scale settings, distributed FW methods were proposed to solve specific problems, including optimization under a block-separable constraint set [Wang et al., 2016] and learning low-rank matrices [Zheng et al., 2018]. Communication-efficient distributed FW variants were proposed for specific sparse learning problems in Bellet et al. [2015] and Lafond et al. [2016], and for general constrained optimization problems in .…”
Section: Related Work
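For readers unfamiliar with the term, "projection-free" means each Frank–Wolfe iteration calls a linear minimization oracle over the feasible set rather than an often-expensive projection. A generic sketch, with grad and lmo as placeholder callables (assumptions for illustration):

```python
def frank_wolfe(grad, lmo, x0, n_steps=100):
    """Generic Frank-Wolfe loop. lmo(g) returns the minimizer of <g, s>
    over the feasible set C; no projection onto C is ever computed,
    because each iterate is a convex combination of points in C."""
    x = x0
    for t in range(n_steps):
        s = lmo(grad(x))                  # linear minimization oracle
        gamma = 2.0 / (t + 2.0)           # O(1/t) step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```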
“…To reduce the computational complexity, many efficient solvers have been developed by replacing the full SVD with a partial SVD. However, those approaches either require the objective function to be smooth (Wang, Kolar, and Srebro 2016) or are designed for constrained optimization (Zheng, Bellet, and Gallinari 2018). Furthermore, the approaches mentioned above ignore the problem that different tasks generally have different noise levels.…”
Section: Introduction
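The full-versus-partial SVD trade-off the excerpt mentions can be sketched as follows (illustrative assumptions: scipy.sparse.linalg.svds as the partial-SVD solver and a random dense matrix; a full SVD of an m × n matrix costs on the order of mn·min(m, n), while iterative methods recover the top k singular triplets in roughly O(mnk)).

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 500))

# Full SVD: computes all min(m, n) = 500 singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Partial SVD: only the k = 5 largest triplets, via an iterative
# (Lanczos-type) method; svds returns them in ascending order.
Uk, sk, Vtk = svds(A, k=5)
print(np.allclose(sorted(sk), sorted(s[:5])))  # leading values match
```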
“…By transferring information between tasks, it is hoped that samples will be better utilized, leading to improved generalization performance. MTL has been successfully applied in practical scenarios, e.g., speech recognition (Seltzer and Droppo 2013), image classification (Lapin, Schiele, and Hein 2014), and disease gene prediction (Zhou et al. 2013). Recent years have also witnessed extensive studies on streaming data, known as online multi-task learning (OMTL) (Dekel, Long, and Singer 2006; Saha et al. 2011; Yang, Zhao, and Gao 2017), which captures the dynamically changing and uncertain nature of the environment, in contrast to the offline setting in which the objective functions are fixed (Liu, Pan, and Ho 2017; Smith et al. 2017).…”
Section: Introduction