Highlights
Neurally inspired deep neural networks (DNNs) have recently emerged as powerful computer algorithms tackling real-world tasks on which humans excel, such as object recognition, speech processing, and cognitive planning.
Objects belonging to different categories evoke reliably different fMRI activity patterns in human occipitotemporal cortex, with the most prominent distinction being that between animate and inanimate objects. An unresolved question is whether these categorical distinctions reflect category-associated visual properties of objects or whether they genuinely reflect object category. Here, we addressed this question by measuring fMRI responses to animate and inanimate objects that were closely matched for shape and low-level visual features. Univariate contrasts revealed animate- and inanimate-preferring regions in ventral and lateral temporal cortex even for individually matched object pairs (e.g., snake-rope). Using representational similarity analysis, we mapped out brain regions in which the pairwise dissimilarity of multivoxel activity patterns (neural dissimilarity) was predicted by the objects' pairwise visual dissimilarity and/or their categorical dissimilarity. Visual dissimilarity was measured as the time it took participants to find a unique target among identical distractors in three visual search experiments, where we separately quantified overall dissimilarity, outline dissimilarity, and texture dissimilarity. All three visual dissimilarity structures predicted neural dissimilarity in regions of visual cortex. Interestingly, these analyses revealed several clusters in which categorical dissimilarity predicted neural dissimilarity after regressing out visual dissimilarity. Together, these results suggest that the animate-inanimate organization of human visual cortex is not fully explained by differences in the characteristic shape or texture properties of animals and inanimate objects. Instead, representations of visual object properties and object category may coexist in more anterior parts of the visual system.
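The core analysis logic described here (correlating a neural representational dissimilarity matrix with model dissimilarity structures, and testing for category effects after regressing out visual dissimilarity) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline; the inputs `patterns`, `visual_rdm`, and `category_rdm` are hypothetical placeholders for one region's multivoxel estimates and the behavioral/categorical model matrices.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

def upper_tri(rdm):
    """Vectorize the upper triangle of a square dissimilarity matrix (diagonal excluded)."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def rsa_with_visual_regressed_out(patterns, visual_rdm, category_rdm):
    # Neural dissimilarity: 1 - Pearson correlation between multivoxel patterns,
    # returned directly as a condensed (upper-triangle) vector by pdist.
    neural_vec = pdist(patterns, metric="correlation")
    visual_vec = upper_tri(visual_rdm)
    category_vec = upper_tri(category_rdm)

    # Raw correspondence between neural and model dissimilarities (rank correlation).
    raw_visual, _ = spearmanr(neural_vec, visual_vec)
    raw_category, _ = spearmanr(neural_vec, category_vec)

    # Regress visual dissimilarity out of the neural RDM, then ask whether
    # categorical dissimilarity still predicts the residual neural dissimilarity.
    fit = LinearRegression().fit(visual_vec.reshape(-1, 1), neural_vec)
    residual = neural_vec - fit.predict(visual_vec.reshape(-1, 1))
    partial_category, _ = spearmanr(residual, category_vec)

    return raw_visual, raw_category, partial_category
```

A searchlight version of this analysis would simply run the same function on the patterns extracted at every brain location and map the resulting partial correlation.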
In natural vision, objects appear at typical locations, both with respect to visual space (e.g., an airplane in the upper part of a scene) and other objects (e.g., a lamp above a table). Recent studies have shown that object vision is strongly adapted to such positional regularities. In this review we synthesize these developments, highlighting that adaptations to positional regularities facilitate object detection and recognition, and sharpen the representations of objects in visual cortex. These effects are pervasive across various types of high-level content. We posit that adaptations to real-world structure collectively support optimal usage of limited cortical processing resources. Taking positional regularities into account will thus be essential for understanding efficient object vision in the real world.

Positional Regularities in Object Vision
Many natural behaviors crucially depend on accurately perceiving objects in the environment. Consequently, understanding object vision has been a core endeavor in cognitive neuroscience for many years, and recent decades have yielded exciting insights into how the human visual system processes various types of objects [1][2][3][4][5]. By and large, these insights have come from studies investigating the processing of individual objects presented at arbitrary locations (usually at fixation). However, in natural vision many objects often appear in specific locations both with respect to visual space (e.g., airplanes in the sky) and relative to other objects (e.g., lamps above tables). Although it has already been well established that such real-world positional regularities furnish observers with cognitive strategies that support effective behaviors (e.g., by providing schemata for economical memory storage [6][7][8] and efficient attentional allocation during search [9-11]), more recent work has begun to investigate the influence of real-world structure on how we perceive and represent objects. A rapidly burgeoning literature now indicates that positional regularities affect basic perceptual analysis both in terms of neural responses in visual cortex (e.g., by shaping tuning properties of object-selective regions) and perceptual sensitivity in psychophysical tasks (e.g., by facilitating object recognition and detection). Intriguingly, the general relevance of these effects has now been demonstrated across a range of high-level visual domains, including everyday objects, faces and bodies, words, and even social interactions between people. Drawing from both the neuroimaging and behavioral literatures, in this review we synthesize recent findings across processing levels and visual domains, and discuss how their resulting insights improve our understanding of real-world object vision.
The human visual system can only represent a small subset of the many objects present in cluttered scenes at any given time, such that objects compete for representation. Despite these processing limitations, the detection of object categories in cluttered natural scenes is remarkably rapid. How does the brain efficiently select goal-relevant objects from cluttered scenes? In the present study, we used multivariate decoding of magnetoencephalography (MEG) data to track the neural representation of within-scene objects as a function of top-down attentional set. Participants detected categorical targets (cars or people) in natural scenes. The presence of these categories within a scene was decoded from MEG sensor patterns by training linear classifiers to differentiate cars and people presented in isolation and testing these classifiers on scenes containing one of the two categories. The presence of a specific category in a scene could be reliably decoded from MEG response patterns as early as 160 ms, despite substantial scene clutter and variation in the visual appearance of each category. Strikingly, we found that these early categorical representations fully depended on the match between visual input and top-down attentional set: only objects that matched the current attentional set were processed to the category level within the first 200 ms after scene onset. A sensor-space searchlight analysis revealed that this early attention bias was localized to lateral occipitotemporal cortex, reflecting top-down modulation of visual processing. These results show that attention quickly resolves competition between objects in cluttered natural scenes, allowing for the rapid neural representation of goal-relevant objects.
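The cross-decoding scheme described above (train on isolated objects, test on objects embedded in scenes, separately at each time point) can be sketched roughly as follows. This is an illustrative sketch assuming hypothetical arrays `X_isolated` and `X_scenes` of MEG sensor patterns, not the study's actual code.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def cross_decode_over_time(X_isolated, y_isolated, X_scenes, y_scenes):
    """Train on isolated cars vs. people, test on scenes, one classifier per time point.

    X_isolated, X_scenes: arrays of shape (n_trials, n_sensors, n_times).
    y_isolated, y_scenes: binary category labels per trial.
    """
    n_times = X_isolated.shape[2]
    accuracy = np.zeros(n_times)
    for t in range(n_times):
        clf = make_pipeline(StandardScaler(), LinearSVC())
        clf.fit(X_isolated[:, :, t], y_isolated)              # train: objects in isolation
        accuracy[t] = clf.score(X_scenes[:, :, t], y_scenes)  # test: objects within scenes
    return accuracy  # accuracy rising above chance (~0.5) marks when category information emerges
```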
Several previous functional magnetic resonance imaging (fMRI) studies have pointed toward a role of perceptual expectations in determining adaptation or repetition suppression (RS) in humans. These studies showed that the probability of repetitions of faces within a block influences the magnitude of adaptation in face-related areas of the human brain (Summerfield et al., 2008). However, a recent macaque single-cell/local field potential (LFP) recording study using objects as stimuli found no evidence for the modulation of the neural response by the repetition probability in the inferior temporal cortex (Kaliukhovich and Vogels, 2010). Here we examined whether stimulus repetition probability affects fMRI repetition suppression for nonface object stimuli in the human brain. Subjects were exposed to either two identical [repetition trials (RTs)] or two different [alternation trials (ATs)] object stimuli. Both types of trials were presented in blocks consisting of either 75% [repetition blocks (RBs)] or 25% [alternation blocks (ABs)] RTs. We found strong RS, i.e., a lower signal for RTs compared to ATs, in the object-sensitive lateral occipital cortex as well as in the face-sensitive occipital and fusiform face areas. More importantly, however, there was no significant difference in the magnitude of RS between RBs and ABs in any of these areas. This is in agreement with the previous monkey single-unit/LFP findings and suggests that RS in the case of nonface visual objects is not modulated by repetition probability in humans. Our results imply that perceptual expectation effects vary for different visual stimulus categories.
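The block-probability manipulation can be illustrated with a short sketch; the trial counts and block numbers below are hypothetical, and the key question is whether the RS contrast (AT minus RT signal) differs between 75%- and 25%-repetition blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_block(n_trials=40, p_repetition=0.75):
    """Return a shuffled trial list: 'RT' = two identical stimuli, 'AT' = two different stimuli."""
    n_rep = int(round(n_trials * p_repetition))
    trials = ["RT"] * n_rep + ["AT"] * (n_trials - n_rep)
    rng.shuffle(trials)
    return trials

# Repetition blocks contain 75% repetition trials, alternation blocks only 25%.
repetition_blocks = [make_block(p_repetition=0.75) for _ in range(4)]
alternation_blocks = [make_block(p_repetition=0.25) for _ in range(4)]

# The analysis then compares RS = mean(AT signal) - mean(RT signal) between the
# two block types; in this study, no significant RB vs. AB difference was found.
```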