2009
DOI: 10.1109/tnn.2009.2025588

Simple Artificial Neural Networks That Match Probability and Exploit and Explore When Confronting a Multiarmed Bandit

Abstract: The matching law (Herrnstein 1961) states that response rates become proportional to reinforcement rates; this is related to the empirical phenomenon called probability matching (Vulkan 2000). Here, we show that a simple artificial neural network generates responses consistent with probability matching. This behavior was then used to create an operant procedure for network learning. We use the multiarmed bandit (Gittins 1989), a classic problem of choice behavior, to illustrate that operant training b…
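
As a rough illustration of the probability-matching claim in the abstract, the sketch below trains a single logistic unit with a plain delta rule on one cue whose reward arrives with a fixed probability. It is a toy under stated assumptions, not the authors' network or training procedure: the function name `train_unit`, the learning rate, the trial counts, and the choice of update rule are all hypothetical and chosen only for illustration.

```python
import math
import random


def logistic(net):
    """Standard logistic activation, mapping any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def train_unit(reward_prob, trials=20000, rate=0.02, tail=2000, seed=0):
    """Train one logistic unit on a single always-present cue.

    Hypothetical helper (not from the paper). Returns the unit's mean
    output over the final `tail` trials.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    weight = 0.0
    cue = 1.0  # the cue is present on every trial
    tail_sum = 0.0
    for t in range(trials):
        out = logistic(weight * cue)
        # Reward is delivered with probability `reward_prob` (Bernoulli).
        reward = 1.0 if rng.random() < reward_prob else 0.0
        # Plain delta rule (assumed): move the weight along the error.
        weight += rate * (reward - out) * cue
        if t >= trials - tail:
            tail_sum += out
    return tail_sum / tail


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(f"reward probability {p:.1f} -> mean output {train_unit(p):.3f}")
```

The update stops drifting only when the expected error is zero, which happens when the unit's output equals the reward probability. In this toy, the output therefore settles near the reinforcement rate rather than at 0 or 1.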

Cited by 19 publications (32 citation statements)
References 36 publications
“…It is interesting to note that, ignoring features, the "correct" wall length configuration is reinforced on 50 % of its presentations (the reinforced location and its nonreinforced rotational equivalent), while the "incorrect" configurations present in any condition are reinforced 0 % of the time, and the perceptrons' responses converge to match these probabilities. The operant perceptron has already been established to match probabilities in classical choice-behavior tasks (Dawson et al, 2009); for it to exhibit this behavior in a reorientation context reinforces Miller and Shettleworth's (2007) conceptualization of reorientation as an operant task.…”
Section: Connection Weights (mentioning)
confidence: 83%
“…While we could extend the operant perceptron to see whether it fits the data from some of the novel tasks (i.e., regular octagons; Newcombe et al, 2010) in a manner similar to a more standard perceptron (Dawson, Kelly, et al, 2010), the synthetic approach would be to generate totally new predictions inspired by our findings. In this case, we might try nonuniform octagons (an arena type not yet investigated), or we might note other successes of the operant perceptron altogether, such as superconditioning (Dupuis & Dawson, in press) or probability matching (Dawson et al, 2009), and branch out beyond reorientation into completely new paradigms. Lewandowsky (1993) observed that computer modeling had its benefits, if done with care.…”
Section: Discussion (mentioning)
confidence: 99%