“…Not only these dichotomies but also the particular delta‐learning rules of the two‐dimensional GRL model distinguish it from previous modifications of model‐free RL. Often arrived at without the due diligence of model comparison, some modifications have simply yoked value representations—for example, $Q_t(s_t, a_1) \equiv -Q_t(s_t, a_2)$—or otherwise incorporated only one type of generalization (Aquino et al, 2020; Balcarras & Womelsdorf, 2016; Ballard et al, 2019; Baram et al, 2021; Charpentier et al, 2020; Collette et al, 2017; Daw & Shohamy, 2008; Gläscher et al, 2009; Hampton et al, 2007; Hauser et al, 2014, 2015; Lesage & Verguts, 2021; Liu et al, 2021; Matsumoto et al, 2007; Mattar & Daw, 2018; Reiter et al, 2017; Vinckier et al, 2016; Wimmer et al, 2012; Zaki et al, 2016). Moreover, such models are often formulated without parameterization (e.g., $g_A = -1$) or with a second, counterfactual RPE inverting the only outcome (i.e., $r' = -r$ or $r' = 0$ for $r > 0$) in parallel—and, by extension, multiple RPEs as required—as opposed to the current algorithmic scheme of GRL with weighted duplications of the original RPE signal to be relayed to parallel representations of estimated values.…”
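
The contrast drawn in this passage can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the dictionary representation of values, the learning rate `alpha`, and the exact form of both update rules are assumptions made for illustration, and only the action dimension (weight `g_a`, standing in for g A) is shown. The first function computes one RPE and relays a weighted copy of it to the unchosen action; the second computes a separate counterfactual RPE from an inverted outcome r′ = −r.

```python
def grl_action_update(q, chosen, unchosen, reward, alpha, g_a):
    """Single-RPE scheme: one RPE, relayed with weight g_a to the unchosen action.

    Assumed form for illustration; generalization across states is omitted.
    """
    delta = reward - q[chosen]                 # the only RPE that is computed
    updated = dict(q)
    updated[chosen] += alpha * delta
    updated[unchosen] += alpha * g_a * delta   # weighted duplicate of the same RPE
    return updated


def counterfactual_update(q, chosen, unchosen, reward, alpha):
    """Two-RPE alternative with an inverted counterfactual outcome r' = -r.

    Using one shared learning rate for both RPEs is an assumption.
    """
    delta = reward - q[chosen]
    delta_cf = -reward - q[unchosen]           # second, separately computed RPE
    updated = dict(q)
    updated[chosen] += alpha * delta
    updated[unchosen] += alpha * delta_cf
    return updated


# Hypothetical starting values that are not mirror images of each other.
q0 = {"a1": 0.5, "a2": 0.2}
q_grl = grl_action_update(q0, "a1", "a2", reward=1.0, alpha=0.5, g_a=-1.0)
q_cf = counterfactual_update(q0, "a1", "a2", reward=1.0, alpha=0.5)
```

Under these assumptions, fixing `g_a = -1.0` and starting both values at zero keeps Q(a 1) = −Q(a 2) after every update, which recovers the yoked case as a special instance of the single-RPE scheme. With the non-mirrored starting values in `q0`, the two schemes assign different values to the unchosen action, because the counterfactual RPE depends on the unchosen action's own current estimate.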