“…The models of Reichle and Laurent (2006) and Lewis et al. (2013) are particularly related to ours in that they optimize policies for rewards that explicitly trade off economy against accuracy (on word identification, Reichle & Laurent, 2006, or lexical decision, Lewis et al., 2013). Beyond language, models of visual behavior based on reinforcement learning have been proposed in other domains (e.g., Acharya, Chen, Myers, Lewis, & Howes, 2017; Butko & Movellan, 2008; Hayhoe & Ballard, 2014; Nuñez-Varela & Wyatt, 2013; Sprague, Ballard, & Robinson, 2007), using both policy-gradient methods, as in our model (Butko & Movellan, 2008), and Q-learning algorithms (Acharya et al., 2017; Nuñez-Varela & Wyatt, 2013; Sprague et al., 2007).…”