This paper analyzes a two-timescale stochastic algorithm for a class of bilevel optimization problems, with applications including policy optimization in reinforcement learning and hyperparameter optimization. We consider the case where the inner problem is unconstrained and strongly convex, and the outer problem is strongly convex, convex, or weakly convex. We propose a nonlinear two-timescale stochastic approximation (TTSA) algorithm for tackling this bilevel problem. In the algorithm, a stochastic (semi)gradient update with a larger step size (faster timescale) is used for the inner problem, while a stochastic mirror descent update with a smaller step size (slower timescale) is used for the outer problem. When the outer problem is strongly convex (resp. weakly convex), the TTSA algorithm finds an O(K^{-1/2})-optimal (resp. O(K^{-2/5})-stationary) solution, where K is the number of iterations. To the best of our knowledge, these are the first convergence rate results for nonlinear TTSA algorithms on this class of bilevel optimization problems. Lastly, for the policy optimization application, we show that a two-timescale actor-critic proximal policy optimization algorithm can be viewed as a special case of our framework. The actor-critic algorithm converges at a rate of O(K^{-1/4}) in terms of the gap in objective value to a globally optimal policy.
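The two-timescale structure described above can be sketched on a toy problem. The following is a minimal illustration only, not the paper's algorithm or experiments: the scalar objectives, the feasible set [-1, 1], the noise level, and the step-size schedules are all assumptions chosen for readability, and the mirror descent step is specialized to a Euclidean projection.

```python
import random


def ttsa_toy(num_iters=20000, seed=0, noise_std=0.1):
    """Two-timescale stochastic approximation on a hypothetical scalar bilevel problem.

    Assumed toy problem (not from the paper):
      inner:  y*(x) = argmin_y 0.5 * (y - x)**2          (strongly convex in y)
      outer:  min over x in [-1, 1] of 0.5 * (y*(x) - 1)**2 + 0.5 * x**2
    Since y*(x) = x, the outer objective is minimized at x = 0.5.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for k in range(num_iters):
        # Assumed schedules: alpha decays faster than beta, so the outer
        # variable moves on the slower timescale.
        alpha = 0.5 / (k + 10)                 # outer step size (slower)
        beta = 0.5 / (k + 10) ** (2.0 / 3.0)   # inner step size (faster)

        # Inner update: noisy gradient of 0.5 * (y - x)**2 with respect to y.
        grad_y = (y - x) + rng.gauss(0.0, noise_std)
        y -= beta * grad_y

        # Outer update: noisy hypergradient estimate, using the current y in
        # place of the exact inner solution y*(x). For this toy problem the
        # hypergradient is x + (y - 1).
        hypergrad = x + (y - 1.0) + rng.gauss(0.0, noise_std)

        # Euclidean projection onto [-1, 1] (mirror descent with a squared
        # Euclidean distance-generating function).
        x = min(1.0, max(-1.0, x - alpha * hypergrad))
    return x, y


if __name__ == "__main__":
    x_final, y_final = ttsa_toy()
    print(f"x = {x_final:.4f}, y = {y_final:.4f}")
```

In this sketch the inner iterate y tracks the moving target y*(x) = x because its step size is larger, while the outer iterate x drifts slowly toward the minimizer of the composed objective.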
An implicit assumption made in studies on state estimation is that the time and frequency at which measurements are taken are consistent across all the distributed sensing sites. For instance, in the literature on Wide Area Measurement Systems (WAMS) deployed in the power grid, where the sensors are equipped with Global Positioning System (GPS) receivers, the sensing sites are deemed capable of providing perfectly synchronous readings. The validity of this assumption may need to be re-examined in light of recent advances in decentralized state estimation algorithms. Importantly, when there are timing offsets between sampling devices, the effects on the measurement system's performance can be catastrophic. The prevalent approach is either to study the resulting error or to resort to Kalman filtering to align the measurements. This approach typically requires additional information about the underlying state. In this paper, we revisit the problem of state estimation and propose a new model for data acquisition under asynchronous sampling. The key idea is to apply sampling theory and to exploit the redundancy in the spatial sampling to interpolate the system state. We provide a necessary and sufficient condition for identifiability of the time offsets and propose an algorithm for the joint regression on state and timing offsets. The efficacy of the proposed algorithm is shown by numerical simulations.