“…Recently, several prevailing machine learning applications have been naturally formulated as bilevel programming problems (Maclaurin et al, 2015; Pedregosa, 2016; Finn et al, 2017; Franceschi et al, 2017, 2018; Ji et al, 2020), which has drawn considerable attention to bilevel programming in the machine learning community. On the theoretical side, many existing works derive both asymptotic (Franceschi et al, 2018; Shaban et al, 2019; Liu et al, 2021) and non-asymptotic (Ghadimi & Wang, 2018; Hong et al, 2020; Chen et al, 2021a; Guo & Yang, 2021) convergence analyses for deterministic or stochastic bilevel optimization. For example, Ghadimi & Wang (2018), Hong et al (2020), and Arbel & Mairal (2022) proved the convergence of SGD-type bilevel methods via the approximate implicit differentiation (AID) approach.…”