2018
DOI: 10.1371/journal.pcbi.1006613
|View full text |Cite
|
Sign up to set email alerts
|

Deep convolutional networks do not classify based on global object shape

Abstract: Deep convolutional networks (DCNNs) are achieving previously unseen performance in object classification, raising questions about whether DCNNs operate similarly to human vision. In biological vision, shape is arguably the most important cue for recognition. We tested the role of shape information in DCNNs trained to recognize objects. In Experiment 1, we presented a trained DCNN with object silhouettes that preserved overall shape but were filled with surface texture taken from other objects. Shape cues appea… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

21
245
0

Year Published

2019
2019
2024
2024

Publication Types

Select...
3
3
2

Relationship

0
8

Authors

Journals

citations
Cited by 322 publications
(291 citation statements)
references
References 38 publications
21
245
0
Order By: Relevance
“…In another study 19 , the authors tested view-identity tuning on several CNN models to compare with their novel generative model (see below). Although they showed RSM results similar to ours (Figure 2), they incorporated a more sophisticated quantitative comparison with experimental data 15 23 . This implies that the goal of the face-processing system may not be merely classification.…”
Section: Discussionmentioning
confidence: 58%
See 1 more Smart Citation
“…In another study 19 , the authors tested view-identity tuning on several CNN models to compare with their novel generative model (see below). Although they showed RSM results similar to ours (Figure 2), they incorporated a more sophisticated quantitative comparison with experimental data 15 23 . This implies that the goal of the face-processing system may not be merely classification.…”
Section: Discussionmentioning
confidence: 58%
“…For the macaque face-processing network, our study here could not find a CNN layer corresponding to the intermediate stage (ML). In addition, some recent studies have pointed out potential representational discrepancies between CNN and the ventral stream from behavioral consideration 19,23,[27][28][29] . (See also the related discussion on the 'computational gap' below.)…”
Section: Discussionmentioning
confidence: 99%
“…In recent studies, Baker et al (2018) and Geirhos et al, (2018Geirhos et al, ( , 2019 reported that CNNs rely on local texture and shape features rather than global shape contours. For example, while keeping the contour intact, replacing the skin texture of a cat with that of an elephant resulted in Resnet-50 classifying the animal as an elephant (Geirhos et al, 2018).…”
Section: Discussionmentioning
confidence: 99%
“…Although CNNs are believed to explicitly represent object shapes in the higher layers (Kriegeskorte, 2015;LeCun et al, 2015;Kubilius et al, 2016), emerging evidence suggests that CNNs may largely use local texture patches to achieve successful object classification (Ballester & de Araujo, 2016, Gatys et al, 2017 or local rather than global shape contours for object recognition (Baker et al, 2018). In a recent demonstration, CNNs were found to be poor at classifying objects defined by silhouettes and edges, and when texture and shape cues were in conflict, classifying objects according to texture rather than shape cues (Geirhos et al, 2019; see also Baker et al, 2018). However, when Resnet-50 was trained with stylized ImageNet images in which the original texture of every single image was replaced with the style of a randomly chosen painting, object classification performance significantly improved, relied more on shape than texture cues, and became more robust to noise and image distortions (Geirhos et al, 2019).…”
Section: The Effect Of Training a Cnn On Original Vs Stylized Image-nmentioning
confidence: 99%
“…It might be that the CNN model in our study does not learn the same features that humans use to solve the task (also see Appendix "Kernels for first layer of CNN"). Baker et al (2018) find that convolutional neural networks trained on a large database of natural images (Russakovsky et al, 2015) use different image features than humans for object recognition. More specifically, Geirhos et al (2019) report that such convolutional neural network models mostly rely on texture to perform classifications, while humans rely more on object shape.…”
Section: Discussionmentioning
confidence: 99%