“…To this end, one usually leverages ideas from scalable GPs, which have recently been compared and reviewed in [32,33]. The majority of scalable LMCs rely on the framework of sparse approximation [34,35,36], which introduces M inducing variables as the sufficient statistics of the N latent function values of a task, with M ≪ N, thereby greatly reducing the cubic model complexity [37,26,38,39,40]. Other complexity-reduction strategies have also been investigated, for example distributed learning [41], natural-gradient-assisted stochastic variational inference [42], and the exploitation of Kronecker structure in the kernel matrix [43,44].…”
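To make the inducing-point idea concrete, the following is a minimal sketch (not the model of any of the cited works) of a Nyström/subset-of-regressors predictive mean in NumPy. The squared-exponential kernel, the inducing inputs `Z`, and the toy data are illustrative assumptions; the point is only that every linear solve involves an M×M system, so the cost drops from O(N³) to O(NM²).

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel; any valid covariance function could be substituted.
    sqdist = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def sor_predictive_mean(X, y, Z, Xstar, noise=1e-2):
    """Subset-of-regressors / Nystrom-style predictive mean with M inducing
    inputs Z: only M x M systems are solved, giving O(N M^2) cost instead of
    the O(N^3) of exact GP regression."""
    Kmm = rbf_kernel(Z, Z)        # (M, M)
    Kmn = rbf_kernel(Z, X)        # (M, N)
    Kms = rbf_kernel(Z, Xstar)    # (M, N*)
    # Predictive mean: Kms^T (noise * Kmm + Kmn Kmn^T)^{-1} Kmn y
    A = noise * Kmm + Kmn @ Kmn.T
    a = np.linalg.solve(A, Kmn @ y)
    return Kms.T @ a

# Toy usage: N = 2000 training points summarized by M = 30 inducing points.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
Z = np.linspace(-3.0, 3.0, 30)[:, None]
Xstar = np.linspace(-3.0, 3.0, 5)[:, None]
print(sor_predictive_mean(X, y, Z, Xstar))
```

The variational treatments cited above ([37,26,38,39,40]) additionally optimize the inducing inputs and a variational distribution over the inducing variables, but they share this same computational structure.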