2016
DOI: 10.1016/j.cobeha.2016.04.005

Reinforcement learning with Marr

Abstract: To many, the poster child for David Marr’s famous three levels of scientific inquiry is reinforcement learning—a computational theory of reward optimization, which readily prescribes algorithmic solutions that evidence striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual con…

Cited by: 45 publications (29 citation statements)
References: 89 publications
“…This can explain the experimentally observed ramping DA signals [ 5 8 ] as reflecting a gradient of state values created by temporal discounting (as in our Fig 6A and 6B), also consistent with the arguments by [7]. These normative hypotheses, at the Marr's levels of computation and algorithm [44, 45], provide intriguing predictions that are desired to be experimentally tested. Meanwhile, it is also important to explore the Marr's level of implementation, namely, circuit/synaptic operations, which could potentially provide inspirations for the upper levels and vice versa [45].…”
Section: Discussion
Type: mentioning
confidence: 99%
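The idea in the statement above, that temporal discounting alone can produce a gradient of state values that ramps up toward reward, can be illustrated with a minimal sketch. The linear track, the single reward at the final state, and the names `discounted_state_values`, `n_states`, `reward`, and `gamma` are assumptions made for illustration; they are not taken from the citing paper.

```python
from __future__ import annotations


def discounted_state_values(
    n_states: int, reward: float = 1.0, gamma: float = 0.9
) -> list[float]:
    """Values on a hypothetical linear track with reward only at the last state.

    State i is (n_states - 1 - i) steps away from the rewarded state, so
    V(i) = gamma ** (n_states - 1 - i) * reward.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return [reward * gamma ** (n_states - 1 - i) for i in range(n_states)]


# With gamma < 1 the values rise monotonically toward the rewarded state.
values = discounted_state_values(5, reward=1.0, gamma=0.5)
print(values)
```

Under these assumptions, any signal that tracks state value would ramp as the rewarded state approaches; with `gamma = 1.0` the gradient disappears and all values are equal.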
“…These normative hypotheses, at the Marr's levels of computation and algorithm [44, 45], provide intriguing predictions that are desired to be experimentally tested. Meanwhile, it is also important to explore the Marr's level of implementation, namely, circuit/synaptic operations, which could potentially provide inspirations for the upper levels and vice versa [45]. The abovementioned normative hypotheses highlight essential issues at the circuit/synaptic level, including how the sustained DA signals are generated in the upstream and utilized in the downstream, how the selection of action timing is implemented, and how temporal discounting is implemented.…”
Section: Discussion
Type: mentioning
confidence: 99%
“…We note that this is the opposite of the way value functions are usually thought of in neuroscience, which stresses the primacy of the value function from which policies are derived. However, there is experimental evidence that signals in the striatum are more suitable for direct policy search rather than for updating action values as an intermediate step, as would be the case for value function-based approaches to computing the policy (Li & Daw, 2011; Niv & Langdon, 2016). Moreover, although we have used a single RNN each to represent the policy and value modules, using "deep," multilayer RNNs may increase the representational power of each module (Pascanu et al., 2013a).…”
Section: Discussion
Type: mentioning
confidence: 99%
“…The actor is in charge of choosing an action in a given state (policy) while the critic is in charge of evaluating (criticizing) the current state (value function). This classical view has been used extensively for modelling the basal ganglia (Suri & Schultz, 1999; Suri, 2002; Frank, 2004; Doya, 2007; Glimcher, 2011; Doll, Simon, & Daw, 2012) even though the precise anatomical mapping of these two components is still subject to debate and may diverge from one model to the other (Redgrave, Gurney, & Reynolds, 2008; Niv & Langdon, 2016). However, all these models share the implicit assumption that the actor and the critic are acting in concert, i.e.…”
Section: Introduction
Type: mentioning
confidence: 99%
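The actor/critic division described in the statement above can be sketched as a small tabular learner. This follows the common textbook formulation in which a single temporal-difference (TD) error trains both components; it is an assumption for illustration and not the implementation of any of the cited basal ganglia models. The class name `TabularActorCritic` and all parameter names are hypothetical.

```python
import math
import random


class TabularActorCritic:
    """Critic: state values V(s). Actor: softmax policy over preferences H(s, a)."""

    def __init__(self, n_states, n_actions,
                 alpha_critic=0.1, alpha_actor=0.1, gamma=0.9):
        self.values = [0.0] * n_states
        self.prefs = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha_critic = alpha_critic
        self.alpha_actor = alpha_actor
        self.gamma = gamma

    def policy(self, state):
        """Actor: softmax over action preferences (max subtracted for stability)."""
        prefs = self.prefs[state]
        top = max(prefs)
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state):
        """Sample an action from the actor's current policy."""
        probs = self.policy(state)
        return random.choices(range(len(probs)), weights=probs)[0]

    def update(self, state, action, reward, next_state, done):
        """Critic computes the TD error; both critic and actor learn from it."""
        bootstrap = 0.0 if done else self.gamma * self.values[next_state]
        td_error = reward + bootstrap - self.values[state]
        probs = self.policy(state)  # policy before this update
        self.values[state] += self.alpha_critic * td_error
        for a, p in enumerate(probs):
            # Gradient of log softmax: 1 - p for the taken action, -p otherwise.
            grad = (1.0 if a == action else 0.0) - p
            self.prefs[state][a] += self.alpha_actor * td_error * grad
        return td_error


agent = TabularActorCritic(n_states=2, n_actions=2)
agent.update(state=0, action=1, reward=1.0, next_state=1, done=True)
print(agent.values[0], agent.policy(0))
```

In this sketch the two components are coupled only through the shared TD error, which is one concrete reading of the actor and the critic "acting in concert"; models that decouple them would replace that shared signal.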