2017
DOI: 10.1109/tsp.2017.2755597

Randomized Block Frank–Wolfe for Convergent Large-Scale Learning

Abstract: Owing to their low-complexity iterations, Frank–Wolfe (FW) solvers are well suited for various large-scale learning tasks. When block-separable constraints are present, randomized block FW (RB-FW) has been shown to further reduce complexity by updating only a fraction of coordinate blocks per iteration. To circumvent the limitations of existing methods, the present work develops step sizes for RB-FW that enable a flexible selection of the number of blocks to update per iteration while ensuring convergence…
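The idea sketched in the abstract can be made concrete with a minimal randomized block FW loop over a block-separable constraint set. This is an illustrative sketch only: the toy quadratic objective, the per-block simplex constraint, the linear-minimization oracle `lmo`, and the classic 2/(k+2) step size are all assumptions here, not the step-size rules the paper proposes.

```python
import numpy as np

def rb_frank_wolfe(grad, x0, lmo, n_blocks, blocks_per_iter, n_iters=500, seed=0):
    """Randomized block Frank-Wolfe sketch (illustrative, not the paper's method).

    grad(x)   -- full gradient of the smooth objective, as a list of per-block arrays
    x0        -- list of per-block variables, each feasible for its own constraint set
    lmo(g, b) -- linear minimization oracle for block b: argmin_{s in C_b} <g, s>
    """
    rng = np.random.default_rng(seed)
    x = [xb.copy() for xb in x0]
    for k in range(n_iters):
        gamma = 2.0 / (k + 2.0)          # classic FW step size (an assumption here)
        g = grad(x)
        # update only a random fraction of the coordinate blocks
        chosen = rng.choice(n_blocks, size=blocks_per_iter, replace=False)
        for b in chosen:
            s = lmo(g[b], b)             # FW vertex for block b
            x[b] = (1 - gamma) * x[b] + gamma * s
    return x

# toy problem: two simplex-constrained blocks, separable quadratic objective
c = [np.array([0.2, 0.8]), np.array([0.5, 0.5])]
grad = lambda x: [2 * (x[b] - c[b]) for b in range(2)]

def lmo(g, b):
    # over the probability simplex, the FW vertex is the coordinate of min gradient
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x0 = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
x_hat = rb_frank_wolfe(grad, x0, lmo, n_blocks=2, blocks_per_iter=1)
```

Because each update is a convex combination of feasible points, every iterate stays feasible, which is the property that keeps per-iteration cost low when only some blocks are touched.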

Cited by 18 publications (10 citation statements)
References 19 publications
“…Remark 5. When the kernel function required to form K_xx, K_yy, and K_xy is not given, one may use the multi-kernel learning method to automatically choose the right kernel function(s); see, for example, [4], [38], [39]. Specifically, one can presume K_xx := Σ_{i=1}^P δ_i K_xx^(i), K_yy := Σ_{i=1}^P δ_i K_yy^(i), and K_xy := Σ_{i=1}^P δ_i K_xy^(i) in (18), where K_xx^(i) ∈ R^{m×m}, K_yy^(i) ∈ R^{n×n}, and K_xy^(i) ∈ R^{m×n} are formed using the kernel function κ_i(·); {κ_i(·)}_{i=1}^P is a preselected dictionary of known kernels, while {δ_i}_{i=1}^P are treated as unknowns to be learned along with A in (18).…”
Section: Kernel DPCA
confidence: 99%
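The weighted kernel combination described in the remark above can be sketched as follows. The weights δ are fixed here for illustration, whereas in the cited remark they are unknowns learned jointly with A; the kernel dictionary (a linear and an RBF kernel) is likewise an assumption.

```python
import numpy as np

def linear_kernel(A, B):
    """kappa_1: plain inner-product kernel."""
    return A @ B.T

def rbf_kernel(A, B, sigma=1.0):
    """kappa_2: Gaussian (RBF) kernel."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def combined_kernels(X, Y, kernels, delta):
    """Form K_xx, K_yy, K_xy as delta-weighted sums over a kernel dictionary.

    kernels -- preselected dictionary of kernel functions kappa_i
    delta   -- nonnegative weights (fixed here; learned in the cited remark)
    """
    Kxx = sum(d * k(X, X) for d, k in zip(delta, kernels))
    Kyy = sum(d * k(Y, Y) for d, k in zip(delta, kernels))
    Kxy = sum(d * k(X, Y) for d, k in zip(delta, kernels))
    return Kxx, Kyy, Kxy

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # m = 5 samples
Y = rng.standard_normal((4, 3))   # n = 4 samples
Kxx, Kyy, Kxy = combined_kernels(X, Y, [linear_kernel, rbf_kernel], [0.7, 0.3])
```

A nonnegative combination of valid (PSD) kernels is itself a valid kernel, which is what makes learning the δ_i alongside the downstream model well posed.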
“…, θ^(η)(M)]^T can be found by jointly minimizing (48) with the Frank–Wolfe algorithm [65]. When the kernel matrices belong to the Laplacian family (16), efficient algorithms that exploit the common eigenspace of the kernels in the dictionary have been developed in [61].…”
Section: Online Multi-Kernel Learning
confidence: 99%
“…Owing to its ease of implementation and its ability to yield structured solutions, FW is of interest in various applications. Besides those mentioned earlier, other examples encompass structural SVM [10], video co-localization [11], particle filtering [12], traffic assignment [13], optimal transport [14], electric vehicle charging [15], [16], and submodular optimization [17].…”
Section: Introduction
confidence: 99%