Running Large-Scale Simulations on the Neurorobotics Platform to Understand Vision – The Case of Visual Crowding

Bornet, Alban; Kaiser, Jacques; Kröner, Alexander; Falotico, Egidio; Ambrosano, Alessandro; Cantero, Kepa; Herzog, Michael H.; Francis, Gregory

doi:10.3389/fnbot.2019.00033

Cited by 11 publications (9 citation statements)

References 43 publications (86 reference statements)


“…Along the same lines, Manassi et al [ 28 ] showed that elements beyond Bouma’s window can have a strong impact on target discrimination, and that the configuration of elements in the whole visual field determines crowding strength (see also [ 26 , 27 ]). A similar extensive comparison of models showed, once again, that only models that could reproduce these results contained a dedicated grouping stage [ 15 ] (see also [ 16 , 43 , 48 ]). Moreover, Van der Burg et al [ 49 ] showed that crowding in dense displays does not depend on target eccentricity but only on the configuration of the nearest neighbours.…”

Section: Discussion (mentioning)

confidence: 86%

“…Indeed, without grouping and segmentation to “rescue” the target from the flankers, all elements within Bouma’s window would decrease performance in those models. Grouping and segmentation seem crucial to explain crowding in general [ 10 , 15 , 44 , 48 ]. Moreover, it is known that texture models and other models based on pooling do not reproduce human grouping and segmentation [ 15 , 16 , 43 , 52 , 53 ].…”

Section: Discussion (mentioning)

confidence: 99%
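The argument above, that without grouping every element inside Bouma's window lowers performance, can be made concrete with a toy pooling model. This is an illustrative sketch only: it assumes Bouma's rule of thumb that critical spacing is about half the target eccentricity (factor 0.5), positions in degrees of visual angle with fixation at the origin, and a hypothetical per-flanker cost. None of the names or parameter values come from the cited papers.

```python
import math
from dataclasses import dataclass

# Bouma's factor: critical spacing is roughly 0.5 x target eccentricity.
# The exact value varies across studies; 0.5 is an assumption here.
BOUMA_FACTOR = 0.5


@dataclass
class Element:
    x: float  # horizontal position in degrees (fixation at 0, 0)
    y: float  # vertical position in degrees


def eccentricity(e: Element) -> float:
    return math.hypot(e.x, e.y)


def flankers_in_bouma_window(target: Element, flankers: list[Element],
                             b: float = BOUMA_FACTOR) -> int:
    """Count flankers closer to the target than b * eccentricity(target)."""
    radius = b * eccentricity(target)
    return sum(1 for f in flankers
               if math.hypot(f.x - target.x, f.y - target.y) < radius)


def pooling_model_performance(target: Element, flankers: list[Element],
                              base: float = 1.0, cost: float = 0.1) -> float:
    """Hypothetical pooling model: each flanker inside the window lowers
    performance by `cost`, floored at chance level (0.5 for two choices)."""
    n = flankers_in_bouma_window(target, flankers)
    return max(0.5, base - cost * n)


if __name__ == "__main__":
    target = Element(9.0, 0.0)  # window radius = 0.5 * 9 = 4.5 deg
    flankers = [Element(7.0, 0.0), Element(11.0, 0.0), Element(9.0, 6.0)]
    print(flankers_in_bouma_window(target, flankers))
    print(pooling_model_performance(target, flankers))
```

Because every additional flanker inside the window can only subtract, a model of this kind predicts that adding elements never helps, which is the prediction that the dense-display and uncrowding results discussed on this page contradict.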


Shrinking Bouma’s window: How to model crowding in dense displays

et al. 2021

Self Cite


In crowding, perception of a target deteriorates in the presence of nearby flankers. Traditionally, it is thought that visual crowding obeys Bouma’s law, i.e., all elements within a certain distance interfere with the target, and that adding more elements always leads to stronger crowding. Crowding is predominantly studied using sparse displays (a target surrounded by a few flankers). However, many studies have shown that this approach leads to wrong conclusions about human vision. Van der Burg and colleagues proposed a paradigm to measure crowding in dense displays using genetic algorithms. Displays were selected and combined over several generations to maximize human performance. In contrast to Bouma’s law, only the target’s nearest neighbours affected performance. Here, we tested various models to explain these results. We used the same genetic algorithm, but instead of selecting displays based on human performance we selected displays based on the model’s outputs. We found that all models based on the traditional feedforward pooling framework of vision were unable to reproduce human behaviour. In contrast, all models involving a dedicated grouping stage explained the results successfully. We show how traditional models can be improved by adding a grouping stage.
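The genetic-algorithm procedure described in this abstract (select and combine displays over generations, scoring them by a model's output instead of human performance) can be sketched as follows. Everything concrete here is an assumption for illustration: a display is encoded as a bit vector in which 1 marks a target-like flanker at one of eight hypothetical positions, the scoring model is a toy pooling model, and each generation keeps the best half of the displays and refills the population by uniform crossover and bit-flip mutation. The actual paradigm of Van der Burg and colleagues may use different encodings and operators.

```python
import random

# Hypothetical layout: 8 flanker positions at these distances (deg) from a
# target at 8 deg eccentricity, so the Bouma window radius is 0.5 * 8 = 4 deg.
DISTANCES = [1.0, 1.5, 2.0, 3.0, 3.5, 5.0, 6.0, 7.0]
BOUMA_RADIUS = 0.5 * 8.0


def pooling_model(display):
    """Toy model output: 1.0 minus 0.1 per target-like flanker (gene == 1)
    inside the Bouma window. Stands in for whichever model is being tested."""
    hits = sum(g for g, d in zip(display, DISTANCES) if d < BOUMA_RADIUS)
    return 1.0 - 0.1 * hits


def evolve(model, n_genes=len(DISTANCES), pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Select: keep the half of the displays on which the model does best.
        pop.sort(key=model, reverse=True)
        parents = pop[: pop_size // 2]
        # Combine: uniform crossover of two random parents, then mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return pop


if __name__ == "__main__":
    final = evolve(pooling_model)
    # Fraction of evolved displays with a target-like flanker at each position.
    profile = [sum(col) / len(final) for col in zip(*final)]
    for d, p in zip(DISTANCES, profile):
        print(f"{d:4.1f} deg: {p:.2f}")
```

Reading off how often each position ends up holding a target-like flanker shows which elements the model treats as harmful. For this pooling model, selection pushes every position inside the window toward dissimilar flankers, whereas the abstract reports that for human observers only the target's nearest neighbours affected performance.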


“…Capsule networks and the Laminart model are two-stage models, in which elements are first parsed into different groups, and then interference occurs only within the groups. Capsule networks group elements on the basis of object-level routing by agreement (for details, see Doerig et al, 2020 ; Sabour et al, 2017 ), whereas the Laminart model groups elements on the basis of low-level features (for details, see Francis et al, 2017 ; Bornet et al, 2019 ). The TTM model is a one-stage model that pools many low-level features computed over pooling regions whose size grows with eccentricity (for details, see Rosenholtz et al, 2019 ).…”

Section: Results (mentioning)

confidence: 99%
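The distinction drawn above between two-stage models (group first, then let interference act only within groups) and one-stage pooling models can be illustrated with a toy comparison. This is not the Laminart model, capsule networks, or TTM: the grouping rule (flankers sharing a shape with at least `min_group` members are segmented away from the target), the 0.5 window factor, and all names are hypothetical assumptions. The one-stage function only mimics the idea of a pooling region whose size grows with eccentricity.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Element:
    x: float    # position in degrees, fixation at the origin
    y: float
    shape: str  # low-level feature used by the toy grouping rule


def in_window(target: Element, e: Element, b: float = 0.5) -> bool:
    # Pooling region whose radius grows with target eccentricity (b assumed).
    radius = b * math.hypot(target.x, target.y)
    return math.hypot(e.x - target.x, e.y - target.y) < radius


def one_stage_interference(target: Element, flankers: list[Element]) -> int:
    """Pooling only: every flanker inside the window interferes."""
    return sum(in_window(target, f) for f in flankers)


def two_stage_interference(target: Element, flankers: list[Element],
                           min_group: int = 3) -> int:
    """Stage 1 (hypothetical rule): flankers whose shape is shared by at least
    `min_group` flankers form their own group and are segmented from the target.
    Stage 2: only flankers still grouped with the target interfere."""
    counts = Counter(f.shape for f in flankers)
    with_target = [f for f in flankers if counts[f.shape] < min_group]
    return sum(in_window(target, f) for f in with_target)


if __name__ == "__main__":
    vernier = Element(9.0, 0.0, "vernier")
    one_square = [Element(9.0, 0.0, "square")]
    seven_squares = [Element(9.0 + dx, 0.0, "square")
                     for dx in (-6, -4, -2, 0, 2, 4, 6)]
    for name, fl in [("one square", one_square), ("seven squares", seven_squares)]:
        print(name, one_stage_interference(vernier, fl),
              two_stage_interference(vernier, fl))
```

With one square around the Vernier, both functions report one interfering element. With seven squares, the one-stage count rises to five while the two-stage count drops to zero, a toy version of the uncrowding pattern that, according to the work listed here, only models with a dedicated grouping stage reproduce.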

“…We simulated the conditions of experiment 1 ( Figure 3 ) with Capsule Networks ( Doerig et al, 2020 , https://github.com/adriendoerig/Capsule-networks-as-recurrent-models-of-grouping-and-segmentation ), the Laminart model ( Doerig, Bornet, et al, 2019 , https://bitbucket.org/albornet/laminart/ ) and the texture tiling model (TTM; Rosenholtz et al, 2019 , https://dspace.mit.edu/handle/1721.1/121152 ). Capsule networks were trained to recognize Verniers, groups of squares, groups of horizontal bars, and groups of vertical bars presented in isolation (i.e., there were only flankers or the Vernier).…”

Section: Methods (mentioning)

confidence: 99%

Dissecting (un)crowding

et al. 2021

Self Cite


In crowding, perception of a target deteriorates in the presence of nearby flankers. Surprisingly, perception can be rescued from crowding if additional flankers are added (uncrowding). Uncrowding is a major challenge for all classic models of crowding and vision in general, because the global configuration of the entire stimulus is crucial. However, it is unclear which characteristics of the configuration impact (un)crowding. Here, we systematically dissected flanker configurations and showed that (un)crowding cannot be easily explained by the effects of the sub-parts or low-level features of the stimulus configuration. Our modeling results suggest that (un)crowding requires global processing. These results are well in line with previous studies showing the importance of global aspects in crowding.


“…Specifically, we propose that a flexible recurrent grouping process determines which elements are grouped into an object. In the case with a dedicated recurrent grouping process, which is able to explain why (un)crowding occurs (see also Bornet et al, 2019 … the crucial benchmarks targeting principled computational processes. Here, using crowding, we showed a fundamental difference in local vs. global processing between humans and ffCNNs, and suggest that grouping and segmentation are promising additions to make deep neural networks better models of vision.…”

mentioning

confidence: 99%

Crowding Reveals Fundamental Differences in Local vs. Global Processing in Humans and Machines

Doerig, Bornet, Choung, et al. 2019

Preprint


Feedforward Convolutional Neural Networks (ffCNNs) have become state-of-the-art models both in computer vision and neuroscience. However, human-like performance of ffCNNs does not necessarily imply human-like computations. Previous studies have suggested that current ffCNNs do not make use of global shape information. However, it is currently unclear whether this reflects fundamental differences between ffCNN and human processing or is merely an artefact of how ffCNNs are trained. Here, we use visual crowding as a well-controlled, specific probe to test global shape computations. Our results provide evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons. We lay out approaches that may address shortcomings of ffCNNs to provide better models of the human visual system.
