Depth in convolutional neural networks solves scene segmentation

Seijdel, Noor; Tsakmakidis, Nikos; Haan, Edward H. F. de; Bohté, Sander M.; Scholte, H. Steven

doi:10.1101/2019.12.16.877753

Cited by 4 publications

(3 citation statements)

References 52 publications

“…First, although feedforward networks can in principle implement any function [54], recurrent networks can implement certain functions more efficiently. Flexible grouping and segmentation is exactly the kind of function that may benefit from recurrent computations (see also [55]). For example, to determine which local elements should be grouped into a global object, it helps to compute the global object first.…”

Section: Discussion (mentioning)

confidence: 99%

Capsule networks as recurrent models of grouping and segmentation

et al. 2020

Classically, visual processing is described as a cascade of local feedforward computations. Feedforward Convolutional Neural Networks (ffCNNs) have shown how powerful such models can be. However, using visual crowding as a well-controlled challenge, we previously showed that no classic model of vision, including ffCNNs, can explain human global shape processing. Here, we show that Capsule Neural Networks (CapsNets), combining ffCNNs with recurrent grouping and segmentation, solve this challenge. We also show that ffCNNs and standard recurrent CNNs do not, suggesting that the grouping and segmentation capabilities of CapsNets are crucial. Furthermore, we provide psychophysical evidence that grouping and segmentation are implemented recurrently in humans, and show that CapsNets reproduce these results well. We discuss why recurrence seems needed to implement grouping and segmentation efficiently. Together, we provide mutually reinforcing psychophysical and computational evidence that a recurrent grouping and segmentation process is essential to understand the visual system and create better models that harness global shape computations.

“…Recently, a multitude of studies have reconciled these seemingly inconsistent findings by indicating that recurrent processes might be employed adaptively, depending on the visual input: while feed-forward activity might suffice for simple scenes with isolated objects, more complex scenes or more challenging conditions (e.g. objects that are occluded or degraded), may need additional visual operations (‘routines’) requiring recurrent computations (Groen et al, 2018; Tang et al, 2018; Kar et al, 2019; Rajaei et al, 2019; Seijdel et al, 2020). For objects in isolation, or very simple scenes, rapid recognition may thus rely on a coarse and unsegmented feed-forward representation (Crouzet and Serre, 2011), while for more cluttered images recognition may require explicit encoding of spatial relationships between parts.…”

Section: Introduction (mentioning)

confidence: 99%

On the necessity of recurrent processing during object recognition: it depends on the need for scene segmentation

Seijdel

Loke

Klundert

et al. 2020

Preprint

Self Cite

While feed-forward activity may suffice for recognizing objects in isolation, additional visual operations that aid object recognition might be needed for real-world scenes. One such additional operation is figure-ground segmentation: extracting the relevant features and locations of the target object while ignoring irrelevant features. In this study of 60 participants, we show objects on backgrounds of increasing complexity to investigate whether recurrent computations are increasingly important for segmenting objects from more complex backgrounds. Three lines of evidence show that recurrent processing is critical for recognition of objects embedded in complex scenes. First, behavioral results indicated a greater reduction in performance after masking objects presented on more complex backgrounds, with the degree of impairment increasing with increasing background complexity. Second, electroencephalography (EEG) measurements showed clear differences in the evoked response potentials (ERPs) between conditions around 200 ms, a time point beyond feed-forward activity, and object decoding based on the EEG signal indicated later decoding onsets for objects embedded in more complex backgrounds. Third, Deep Convolutional Neural Network performance confirmed this interpretation: feed-forward and less deep networks showed a higher degree of impairment in recognition for objects in complex backgrounds compared to recurrent and deeper networks. Together, these results support the notion that recurrent computations drive figure-ground segmentation of objects in complex scenes.

“…Intriguingly, these networks not only parallel human performance on some object recognition tasks (VanRullen, 2017), but they also feature processing characteristics that bear a lot of resemblance to the visual ventral stream in primates (Eickenberg et al, 2017; Güçlü and van Gerven, 2015; Khaligh-Razavi and Kriegeskorte, 2014; Kubilius et al, 2018; Schrimpf et al, 2020; Yamins et al, 2014). Leveraging this link between neural processing and performance has already enhanced insight into the potential mechanisms underlying shape perception (Kubilius et al, 2016), scene segmentation (Seijdel et al, 2020) and the role of recurrence during object recognition (Kar et al, 2019; Kietzmann et al, 2019b). DCNNs may thus provide a promising avenue for systematically investigating how different attention mechanisms may modulate neural processing and thereby, performance.…”

Section: Introduction (mentioning)

confidence: 99%

Leveraging spiking deep neural networks to understand the neural mechanisms underlying selective attention

Sörensen

Zambrano

Slagter

et al. 2020

Preprint

Self Cite

Spatial attention enhances sensory processing of goal-relevant information and improves perceptual sensitivity. The specific mechanisms linking neural changes to changes in performance are still contested. Here, we examine different attention mechanisms in spiking deep convolutional neural networks. We directly contrast effects of noise suppression (precision) and two different gain modulation mechanisms on performance on a visual search task with complex real-world images. Unlike standard artificial neurons, biological neurons have saturating activation functions, permitting implementation of attentional gain as gain on a neuron’s input or on its outgoing connection. We show that modulating the connection is most effective in selectively enhancing information processing by redistributing spiking activity, and by introducing additional task-relevant information, as shown by representational similarity analyses. Precision did not produce attentional effects in performance. Our results, which mirror empirical findings, show that it is possible to adjudicate between attention mechanisms using more biologically realistic models and natural stimuli.
