2021
DOI: 10.1177/0963721421990334

Learning About the World by Learning About Images

Abstract: One of the deepest insights in neuroscience is that sensory encoding should take advantage of statistical regularities. Humans’ visual experience contains many redundancies: Scenes mostly stay the same from moment to moment, and nearby image locations usually have similar colors. A visual system that knows which regularities shape natural images can exploit them to encode scenes compactly or guess what will happen next. Although these principles have been appreciated for more than 60 years, until recently it h…

Cited by 16 publications (12 citation statements)
References 24 publications
“…It is perhaps not entirely surprising that the human visual system is not optimized for distinguishing mirror from glass, given their infrequency in the natural environment. However, despite many attempts to understand human perceptual processes through normative ideal observer models (Burge, 2020; Geisler, 2011), we suggest that in general, many aspects of material appearance, and perhaps perception more generally, might be better understood as fulfilling objective functions other than optimal estimation of specific distal physical properties (Fleming & Storrs, 2019; Storrs et al., 2021; Storrs & Fleming, 2021).…”
Section: Discussion (mentioning)
confidence: 86%
“…Here, as in almost all neural network-based putative models of human vision (Kriegeskorte, 2015; Majaj & Pelli, 2018; Yamins & DiCarlo, 2016), we used supervised learning, in which the network is trained on hundreds of thousands of accurately labelled images. Human vision cannot be trained this way, because labelled data are rare (Fleming & Storrs, 2019; Storrs & Fleming, 2021), and the scale of the training set almost certainly exceeds human visual experience with mirror and glass objects. In particular, we very rarely get to see mirror and glass versions of the same objects, and we presumably also exploit the fact that vision unfolds continuously over time, rather than in independent static snapshots, as CNNs are typically trained (Karpathy et al., 2014; van Assen, Nishida, & Fleming, 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…This learning process is therefore unsupervised, since it does not require explicit access to the ground truth. Although the existence of this unsupervised learning process is a conjecture that needs to be verified through machine learning algorithms, there is already evidence that unsupervised learning can be very effective in spontaneously clustering images according to distal properties such as reflectance and illumination [67][68][69]. The Vector Sum model is also more parsimonious than the MLE model since (i) it assumes a linear input-output mapping instead of an accurate or veridical mapping, which can account for the biases observed in depth estimation, and (ii) it does not require an estimate of the output variability of individual modules.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some loss functions add in Lambertian terms (Tang et al., 2012) (or other physical rendering models; Wu et al., 2015). Others argue that these models amount to latent variables and should be learned (Storrs & Fleming, 2021), introducing constraint through selected training data. Examples include, for example, “faces” (Sengupta et al., 2018) or “chairs” or “dormitory rooms” (Kulkarni et al., 2015); review in Breuß et al., 2021. In any case, the DNN architecture is tuned to interpolate the given data, so that the resulting algorithms can be brittle outside of it.…”
Section: Introduction (mentioning)
confidence: 99%
“…The danger, to put it more generally, is that such networks learn to approximate the inverse to the data generator. To paraphrase Storrs (Storrs & Fleming, 2021), to “learn about the world by learning about images” depends on which images are chosen. Elliptical patches were a special case where a few curvature parameters specified the solution.…”
Section: Introduction (mentioning)
confidence: 99%