“…The main principle is to characterize statistical and computational trade-offs, that is, to sacrifice a degree of statistical accuracy in exchange for computational benefits. Representative methods include the Nyström method (Yin et al. 2021, 2020a; Rudi, Carratino, and Rosasco 2017; Li, Kwok, and Lu 2010), which constructs an approximate kernel matrix from a few anchor points; random features (Liu, Liu, and Wang 2021; Li, Liu, and Wang 2019; Rudi, Camoriano, and Rosasco 2016; Rahimi and Recht 2007); iterative optimization (Lin and Cevher 2020; Lin, Lei, and Zhou 2019; Shalev-Shwartz et al. 2011); distributed learning (Liu, Liu, and Wang 2021; Lin, Wang, and Zhou 2020; Wang 2019; Guo, Lin, and Shi 2019; Chang, Lin, and Zhou 2017; Lin, Guo, and Zhou 2017; Zhang, Duchi, and Wainwright 2015, 2013), which divides the training data into subsets processed on local processors, with the necessary communications carried out between them; and randomized sketching (Lin and Cevher 2020; Lian, Liu, and Fan 2021; Liu, Shang, and Cheng 2019; Yang, Pilanci, and Wainwright 2017), which projects the kernel matrix onto a smaller one via a sketch matrix. These studies show that randomized sketching and distributed learning are particularly effective for kernel methods.…”
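To make the two matrix-approximation ideas above concrete, the following is a minimal NumPy sketch (not taken from the cited works) of the Nyström approximation built from randomly chosen anchor points and of a simple Gaussian randomized sketch of the kernel matrix; the RBF kernel, the uniform anchor sampling, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Y.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def nystrom_approximation(X, m, gamma=1.0, seed=0):
    # Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T,
    # where the m anchor (landmark) points are sampled uniformly at random.
    rng = np.random.default_rng(seed)
    anchors = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[anchors], gamma)            # n x m
    W = rbf_kernel(X[anchors], X[anchors], gamma)   # m x m
    return C @ np.linalg.pinv(W) @ C.T

def gaussian_sketch(K, s, seed=0):
    # Randomized sketching: project the n x n kernel matrix onto an
    # s-dimensional subspace with a Gaussian sketch matrix S (s x n),
    # returning the small s x s matrix S K S^T.
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((s, K.shape[0])) / np.sqrt(s)
    return S @ K @ S.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    K = rbf_kernel(X, X)                            # exact 500 x 500 kernel matrix
    K_nys = nystrom_approximation(X, m=50)
    rel_err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
    print(f"Nystrom relative Frobenius error with 50 anchors: {rel_err:.3f}")
    K_small = gaussian_sketch(K, s=50)
    print(f"sketched kernel matrix shape: {K_small.shape}")
```

The sketch illustrates the shared principle: both methods replace the full n × n kernel matrix with quantities of size m or s much smaller than n, trading some approximation accuracy for reduced storage and computation.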