“…The LQT problem with sequentially revealed adversarial reference states has mostly been studied under policy regret guarantees; one of the first works [3] proposes a relatively computationally heavy algorithm. In a more recent line of work, the authors introduce a memory-based gradient descent algorithm in [4] and tackle the constrained tracking problem in [5]. Several works also provide dynamic regret guarantees for tracking unknown targets; however, their settings differ from ours.…”