While feed-forward activity may suffice for recognizing objects in isolation, additional visual operations that aid object recognition might be needed for real-world scenes. One such operation is figure-ground segmentation: extracting the relevant features and locations of the target object while ignoring irrelevant features. In this study of 60 participants, we showed objects on backgrounds of increasing complexity to investigate whether recurrent computations become increasingly important for segmenting objects from more complex backgrounds. Three lines of evidence show that recurrent processing is critical for recognition of objects embedded in complex scenes. First, behavioral results indicated a greater reduction in performance after masking for objects presented on more complex backgrounds, with the degree of impairment increasing with background complexity. Second, electroencephalography (EEG) measurements showed clear differences in the event-related potentials (ERPs) between conditions around 200 ms, a time point beyond the initial feed-forward sweep; in addition, object decoding based on the EEG signal indicated later decoding onsets for objects embedded in more complex backgrounds. Third, the performance of deep convolutional neural networks confirmed this interpretation: feed-forward and less deep networks showed greater impairment in recognizing objects on complex backgrounds than recurrent and deeper networks. Together, these results support the notion that recurrent computations drive figure-ground segmentation of objects in complex scenes.
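The core stimulus manipulation described above, segmented objects embedded in backgrounds of parametrically varied complexity, can be sketched in a few lines. This is a minimal illustrative toy, not the study's actual stimulus code: the `embed_object` function, the scalar `complexity` parameter, and the square "object" are assumptions made for the sketch; the paper's real complexity manipulation is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def embed_object(obj, complexity, rng):
    """Paste a segmented object onto a noise background whose
    amplitude scales with `complexity` (hypothetical parameter)."""
    bg = rng.standard_normal(obj.shape) * complexity
    mask = obj != 0          # non-zero pixels belong to the object
    return np.where(mask, obj, bg)

# Toy 'object': a bright square on an otherwise empty canvas.
obj = np.zeros((8, 8))
obj[2:6, 2:6] = 1.0

low  = embed_object(obj, complexity=0.1, rng=rng)   # simple background
high = embed_object(obj, complexity=1.0, rng=rng)   # complex background
```

Scaling the background noise amplitude leaves the object pixels untouched while making figure-ground segmentation progressively harder, which is the kind of parametric control the study relies on.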
Feedforward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans, however, suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations ('routines') that aid the recognition process (e.g., segmentation or grouping) are needed for more complex scenes. Linking human visual processing to the performance of DCNNs of increasing depth, we here explored if, how, and when object information is differentiated from the backgrounds on which objects appear. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them, by adding noise, manipulating background congruence, and systematically occluding parts of the image. Results indicate that with an increase in network depth comes an increase in the distinction between object and background information. For shallower networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that scene segmentation can, de facto, be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or "binding" features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
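One standard intuition for why depth could substitute for an explicit segmentation mechanism, consistent with (though not stated in) the abstract above, is that stacking convolutional layers grows a unit's receptive field, so deeper units can jointly weigh object and background context when selecting features. A minimal sketch of the textbook receptive-field formula for stride-1 convolutions follows; the function name and the 3×3 kernel size are illustrative choices, not details from the study.

```python
def receptive_field(depth, kernel=3):
    """Receptive field (in pixels) of `depth` stacked stride-1
    convolutions with square kernels of size `kernel`.
    Each layer adds (kernel - 1) pixels of context."""
    return 1 + depth * (kernel - 1)

print([receptive_field(d) for d in (1, 5, 10, 20)])
# → [3, 11, 21, 41]
```

With stride-1 3×3 kernels the receptive field grows linearly with depth, so only sufficiently deep networks see enough surrounding context to implicitly separate figure from ground.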
The classical notion of cognitive impenetrability suggests that perceptual processing is an automatic modular system and not under conscious control. Near consensus is now emerging that this classical notion is untenable. However, as recently pointed out by Firestone and Scholl, this consensus is built on quicksand. In most studies claiming perception is cognitively penetrable, it remains unclear which actual process has been affected (perception, memory, imagery, input selection, or judgment). In fact, the only available "proofs" for cognitive penetrability are proxies for perception, such as behavioral responses and neural correlates. We suggest that one can interpret cognitive penetrability in two different ways: a broad sense and a narrow sense. In the broad sense, attention and memory are not considered as "just" pre- and post-perceptual systems but as part of the mechanisms by which top-down processes influence the actual percept. Although many studies have demonstrated top-down influences in this broader sense, it is still debatable whether cognitive penetrability remains tenable in a narrow sense. The narrow sense states that cognitive penetrability only occurs when top-down factors are flexible and cause a clear illusion from a first-person perspective. So far, there is no strong evidence from a first-person perspective that visual illusions can indeed be driven by high-level flexible factors: one cannot be cognitively trained to see and unsee visual illusions. We argue that this lack of convincing proof for cognitive penetrability in the narrow sense can be explained by the fact that most research focuses on foveal vision only. This type of perception may be too unambiguous for transient high-level factors to control. Therefore, illusions in more ambiguous perception, such as peripheral vision, can offer a unique insight into the matter. Such illusions produce a clear subjective percept based on unclear, degraded visual input: the optimal basis to study narrowly defined cognitive penetrability.