On the other hand, in the ranking literature, most of the theoretical work focuses on the tabular case where the rewards for different actions are uncorrelated (Feige et al., 1994; Shah et al., 2015; Shah and Wainwright, 2017; Heckel et al., 2018; Mao et al., 2018; Jang et al., 2017; Chen et al., 2013; Chen and Suh, 2015; Rajkumar and Agarwal, 2014; Negahban et al., 2018; Hajek et al., 2014; Heckel et al., 2019). A majority of the empirical literature focuses on the framework of learning to rank via MLE under general function approximation, especially when the reward is parameterized by a neural network (Liu et al., 2009; Xia et al., 2008; Cao et al., 2007; Christiano et al., 2017a; Ouyang et al., 2022; Brown et al., 2019; Shin et al., 2023; Busa-Fekete et al., 2014; Wirth et al., 2016, 2017; Christiano et al., 2017b; Abdelkareem et al., 2022). The related idea of RL with AI feedback (Bai et al., 2022b) also learns a reward model from preferences, except that the preferences are labeled by another AI model instead of by humans.
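For concreteness, the MLE objective underlying these learning-to-rank and preference-based reward-learning works fits a neural reward model to pairwise preferences via the Bradley-Terry likelihood, $P(y_1 \succ y_2) = \sigma(r_\theta(y_1) - r_\theta(y_2))$. The sketch below is purely illustrative (PyTorch, toy feature inputs, random placeholder data; `RewardModel` and `preference_nll` are hypothetical names), not a reproduction of any cited implementation.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy neural reward model r_theta: feature vector -> scalar reward.
    Stands in for the neural parameterizations used in the cited works."""
    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_nll(reward_model: nn.Module,
                   preferred: torch.Tensor,
                   rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of observed preferences under the
    Bradley-Terry model: P(preferred > rejected) = sigmoid(r(pref) - r(rej))."""
    margin = reward_model(preferred) - reward_model(rejected)
    return -nn.functional.logsigmoid(margin).mean()

# One MLE gradient step on random stand-ins for labeled preference pairs.
model = RewardModel(input_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred = torch.randn(32, 16)  # features of the preferred item in each pair
rejected = torch.randn(32, 16)   # features of the rejected item
loss = preference_nll(model, preferred, rejected)
loss.backward()
optimizer.step()
```

Whether the preference labels come from humans (Christiano et al., 2017a; Ouyang et al., 2022) or from another AI model (Bai et al., 2022b), the fitting step is this same maximum-likelihood objective; only the source of the labels differs.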