“…Explicit comparisons collected on model outputs are used to reveal the preferences of human raters (Gao et al., 2018; Ziegler et al., 2019; Askell et al., 2021; Jaques et al., 2020; Stiennon et al., 2020; Ganguli et al., 2022; Glaese et al., 2022). More fine-grained feedback includes binary or Likert scale questions on text attributes (Nakano et al., 2021; Menick et al., 2022; Thoppilan et al., 2022); natural language comments (Ju et al., 2022; Scheurer et al., 2022); or edits (Hancock et al., 2019; Liu et al., 2023c). Ideal demonstrations are used to ground norm-dependent or ethical judgements (Forbes et al., 2020; Pyatkin et al., 2022; Jin et al., 2022), or in combination with ratings to prime model behaviour (Nakano et al., 2021; Wu et al., 2021; Ouyang et al., 2022; Bakker et al., 2022).…”