“…The main principle is to characterize statistical and computational trade-offs, that is, to sacrifice a degree of statistical accuracy in exchange for computational benefits. Representative methods include the Nyström method (Yin et al. 2021, 2020a; Rudi, Carratino, and Rosasco 2017; Li, Kwok, and Lu 2010), which constructs an approximate kernel matrix from a few anchor points; random features (Liu, Liu, and Wang 2021; Li, Liu, and Wang 2019; Rudi, Camoriano, and Rosasco 2016; Rahimi and Recht 2007); iterative optimization (Lin and Cevher 2020; Lin, Lei, and Zhou 2019; Shalev-Shwartz et al. 2011); distributed learning (Liu, Liu, and Wang 2021; Lin, Wang, and Zhou 2020; Wang 2019; Guo, Lin, and Shi 2019; Chang, Lin, and Zhou 2017; Lin, Guo, and Zhou 2017; Zhang, Duchi, and Wainwright 2015, 2013), which divides the training data into subsets processed on local processors, with the necessary communications carried out between them; and randomized sketching (Lin and Cevher 2020; Lian, Liu, and Fan 2021; Liu, Shang, and Cheng 2019; Yang, Pilanci, and Wainwright 2017), which projects the kernel matrix onto a smaller one via a sketch matrix. These studies show that randomized sketching and distributed learning are particularly effective for kernel methods.…”
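To make the two matrix-approximation ideas above concrete, the following is a minimal NumPy sketch (not taken from the cited works) of the Nyström approximation built from randomly chosen anchor points and of a simple Gaussian randomized sketch of the kernel matrix; the RBF kernel, the uniform anchor sampling, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Y.
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def nystrom_approximation(X, m, gamma=1.0, seed=0):
    # Rank-m Nystrom approximation K ~= C @ pinv(W) @ C.T,
    # where the m anchor (landmark) points are sampled uniformly at random.
    rng = np.random.default_rng(seed)
    anchors = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[anchors], gamma)            # n x m
    W = rbf_kernel(X[anchors], X[anchors], gamma)   # m x m
    return C @ np.linalg.pinv(W) @ C.T

def gaussian_sketch(K, s, seed=0):
    # Randomized sketching: project the n x n kernel matrix onto an
    # s-dimensional subspace with a Gaussian sketch matrix S (s x n),
    # returning the small s x s matrix S K S^T.
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((s, K.shape[0])) / np.sqrt(s)
    return S @ K @ S.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    K = rbf_kernel(X, X)                            # exact 500 x 500 kernel matrix
    K_nys = nystrom_approximation(X, m=50)
    rel_err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
    print(f"Nystrom relative Frobenius error with 50 anchors: {rel_err:.3f}")
    K_small = gaussian_sketch(K, s=50)
    print(f"sketched kernel matrix shape: {K_small.shape}")
```

The sketch illustrates the shared principle: both methods replace the full n × n kernel matrix with quantities of size m or s much smaller than n, trading some approximation accuracy for reduced storage and computation.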