Automated scene interpretation has benefited from advances in machine learning, and restricted tasks, such as face detection, have been solved with sufficient accuracy for constrained settings. However, the performance of machines in providing rich semantic descriptions of natural scenes from digital images remains highly limited and hugely inferior to that of humans. Here we quantify this "semantic gap" in a particular setting: we compare the efficiency of human and machine learning in assigning an image to one of two categories determined by the spatial arrangement of constituent parts. The images are not real, but the category-defining rules reflect the compositional structure of real images and the type of "reasoning" that appears to be necessary for semantic parsing. Experiments demonstrate that human subjects grasp the separating principles from a handful of examples, whereas the error rates of computer programs fluctuate wildly and remain far behind those of humans even after exposure to thousands of examples. These observations lend support to current trends in computer vision, such as integrating machine learning with parts-based modeling.

abstract reasoning | human learning | pattern recognition

Image interpretation, effortless and instantaneous for people, remains a fundamental challenge for artificial intelligence. The goal is to build a "description machine" that automatically annotates a scene from image data, detecting and describing objects, relationships, and context. It is generally acknowledged that building such a machine is not possible with current methodology, at least when measuring success against human performance.

Some well-circumscribed problems have been solved with sufficient speed and accuracy for real-world applications. Almost every digital camera on the market today carries a face detection algorithm that allows it to adjust the focus according to the presence of humans in the scene, and machine vision systems routinely recognize flaws in manufacturing, handwritten characters, and other visual patterns in controlled industrial settings. However, such cases usually involve a single quasi-rigid object or an arrangement of a few discernible parts and thus do not display many of the complications of full-scale "scene understanding." Moreover, achieving high accuracy usually requires intense "training" with gigantic amounts of data. Systems that attempt to deal with multiple object categories, high intraclass variability, occlusion, context, and unanticipated arrangements, all of which are easily handled by people, typically perform poorly. Such visual complexity seems to require a form of global reasoning that uncovers patterns and generates high-level hypotheses from local measurements and prior world knowledge.

In order to go beyond general observation and speculation, we have designed a controlled experiment to measure the difference in performance between computer programs and human subjects. The Synthetic Visual Reasoning Test (SVRT) is a series of 23 classification problems involving…
We investigated how morphological complexity and grammaticality interact in modulating electrophysiological markers of grammatical processing during reading. Our goal was to determine whether morphological complexity and stimulus grammaticality have independent, additive effects on the P600 event-related potential component, or whether they interact. Participants read sentences that were either well-formed or grammatically ill-formed, and in which the critical word was either morphologically simple or complex. Results revealed no effect of complexity for well-formed stimuli, but P600 amplitude was significantly larger for morphologically complex ungrammatical stimuli than for morphologically simple ungrammatical stimuli. These findings suggest that some previous work may have inadequately characterized the factors driving reanalysis during morphosyntactic processing. Our results show that morphological complexity by itself does not elicit P600 effects; in ungrammatical contexts, however, overt morphology provides a more robust and reliable cue to morphosyntactic relationships than null affixation.
It has long been known that the control of attention in visual search depends both on voluntary, top-down deployment according to context-specific goals and on involuntary, stimulus-driven capture based on the physical conspicuity of perceptual objects. Recent evidence suggests that pairing target stimuli with reward can modulate the voluntary deployment of attention, but there is little evidence that reward modulates the involuntary deployment of attention to task-irrelevant distractors. We report several experiments that investigate the role of reward learning in attentional control. Each experiment involved a training phase and a test phase. In the training phase, different colors were associated with different amounts of monetary reward. In the test phase, color was not task-relevant and participants searched for a shape singleton; in most experiments no reward was delivered in the test phase. We first show that attentional capture by physically salient distractors is magnified by a previous association with reward. In subsequent experiments we demonstrate that physically inconspicuous stimuli previously associated with reward capture attention persistently during extinction, even several days after training. Furthermore, vulnerability to attentional capture by high-value stimuli is negatively correlated across individuals with working memory capacity and positively correlated with trait impulsivity. An analysis of intertrial effects reveals that value-driven attentional capture is spatially specific. Finally, when reward is delivered at test contingent on the task-relevant shape feature, recent reward history modulates value-driven attentional capture by the irrelevant color feature. The influence of learned value on attention may provide a useful model of clinical syndromes characterized by similar failures of cognitive control, including addiction, attention-deficit/hyperactivity disorder, and obesity.

Keywords: Attentional capture; Reward; Incentive salience; Visual search

Selective attention gates access to awareness. Attentional control therefore determines the contents of awareness and the starting point for almost any behavioral or cognitive act: perceiving, remembering, learning, or behaving. Attentional control has long been a core…