To optimize visual search, humans attend to objects with the expected size of the sought target relative to its surrounding scene (object-scene scale consistency). We investigate how the human brain responds to variations in object-scene scale consistency. We use functional magnetic resonance imaging and a voxel-wise feature encoding model to estimate tuning to different object and scene properties. We find that regions involved in scene processing (transverse occipital sulcus) and spatial attention (intraparietal sulcus) show the strongest responsiveness and selectivity to object-scene scale consistency: activity is reduced for mis-scaled objects (either unusually small or unusually large). The findings show how and where the brain incorporates object-scene size relationships in the processing of scenes. The response properties of these brain areas might explain why, during visual search, humans often miss objects that are salient but at atypical sizes relative to the surrounding scene.
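To illustrate what a voxel-wise feature encoding model can look like, the sketch below fits a linear map from stimulus features to voxel responses. The abstract does not state the estimator used; ridge regression, the synthetic data, and all names here (`fit_encoding_model`, `alpha`, the array sizes) are assumptions made purely for illustration.

```python
import numpy as np


def fit_encoding_model(features, responses, alpha=1.0):
    """Fit one ridge regression per voxel (assumed estimator, not the paper's).

    features:  (n_stimuli, n_features) stimulus feature matrix
    responses: (n_stimuli, n_voxels) measured voxel responses
    Returns a (n_features, n_voxels) weight matrix; each column is one
    voxel's estimated tuning to the features.
    """
    n_features = features.shape[1]
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T Y
    gram = features.T @ features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)


# Synthetic example: 200 stimuli, 3 features, 4 voxels (all hypothetical).
rng = np.random.default_rng(0)
n_stimuli, n_features, n_voxels = 200, 3, 4
features = rng.normal(size=(n_stimuli, n_features))
true_weights = rng.normal(size=(n_features, n_voxels))
noise = 0.1 * rng.normal(size=(n_stimuli, n_voxels))
responses = features @ true_weights + noise

weights = fit_encoding_model(features, responses, alpha=1.0)
```

In a sketch like this, a voxel's tuning to a property such as scale consistency would be read from the weight on the corresponding feature column.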
Many animals and humans process the visual field with a varying spatial resolution (foveated vision) and use peripheral processing to make eye movements and point the fovea to acquire high-resolution information about objects of interest. This architecture results in computationally efficient, rapid scene exploration. Recent progress in vision Transformers has brought about new alternatives to the traditionally convolution-reliant computer vision systems. However, these models do not explicitly model the foveated properties of the visual system or the interaction between eye movements and the classification task. We propose the Foveated Transformer (FoveaTer) model, which uses pooling regions and saccadic movements to perform object classification tasks using a vision Transformer architecture. Our proposed model pools the image features using square pooling regions, an approximation to the biologically inspired foveated architecture, and uses the pooled features as input to a Transformer network. It chooses the next fixation location based on the attention assigned by the Transformer to various locations from previous and present fixations. The model uses a confidence threshold to stop scene exploration, allowing it to dynamically allocate more fixations and computational resources to more challenging images. We construct an ensemble model using our proposed model and an unfoveated model, achieving an accuracy 1.36% below the unfoveated model with 22% computational savings. Finally, we demonstrate our model's robustness against adversarial attacks, where it outperforms the unfoveated model. Preprint. Under review.
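The fixation loop described in this abstract (pick the next fixation from attention, stop once a confidence threshold is reached) can be sketched roughly as follows. This is not the FoveaTer implementation: `classify` is a hypothetical callable standing in for the Transformer, and `threshold`, `max_fixations`, and the toy classifier are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def explore(classify, start, threshold=0.9, max_fixations=5):
    """Fixate until the top-class probability reaches `threshold`.

    `classify(fixations)` is a hypothetical stand-in for the model: it
    returns (class_logits, attention), where `attention` maps candidate
    locations to the attention weight accumulated over the fixations so far.
    """
    fixations = [start]
    while True:
        logits, attention = classify(fixations)
        probs = softmax(logits)
        confidence = max(probs)
        label = probs.index(confidence)
        # Stop early on easy images; spend more fixations on hard ones.
        if confidence >= threshold or len(fixations) >= max_fixations:
            return label, confidence, fixations
        # Next fixation: the most-attended location not yet visited.
        candidates = {loc: w for loc, w in attention.items()
                      if loc not in fixations}
        if not candidates:
            return label, confidence, fixations
        fixations.append(max(candidates, key=candidates.get))


def toy_classify(fixations):
    """Toy model: evidence for class 1 grows with each fixation."""
    logits = [0.0, float(len(fixations))]
    attention = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.1}
    return logits, attention


label, confidence, path = explore(toy_classify, start=(0, 0), threshold=0.9)
```

With the toy classifier, the loop visits three locations before the top-class probability exceeds the threshold; raising `threshold` makes it spend more fixations, which is the accuracy/compute trade-off the abstract refers to.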
Game publishers and anti-cheat companies have been unsuccessful in blocking cheating in online gaming. We propose a novel, vision-based approach that captures the frame buffer's final state and detects illicit overlays. To this end, we train and evaluate a DNN detector on a new dataset, collected using two first-person shooter games and three cheating software packages. We study the advantages and disadvantages of different DNN architectures operating on a local or global scale. We use output confidence analysis to avoid unreliable detections and to indicate when network retraining is required. In an ablation study, we show how to use Interval Bound Propagation (IBP) to build a detector that is also resistant to potential adversarial attacks, and we study IBP's interaction with confidence analysis. Our results show that robust and effective anti-cheating through machine learning is practically feasible and can be used to guarantee fair play in online gaming.
With the advent of powerful convolutional neural networks (CNNs), recent studies have extended early applications of neural networks to imaging tasks, making CNNs a potential new tool for assessing medical image quality. Here, we compare a CNN to model observers in a search task for two possible signals (a simulated mass and a smaller simulated microcalcification) embedded in filtered noise and in single slices of Digital Breast Tomosynthesis (DBT) virtual phantoms. For the filtered noise, we show how a CNN can approximate the ideal observer for a search task, achieving a statistical efficiency of 0.77 for the microcalcification and 0.78 for the mass. For search in single slices of DBT phantoms, we show that the performance of a Channelized Hotelling Observer (CHO) is degraded by false positives related to anatomic variations, resulting in detection accuracy below human observer performance. In contrast, the CNN learns to identify and discount the backgrounds, and achieves performance comparable to that of human observers and superior to that of model observers (Proportion Correct for the microcalcification: CNN = 0.96; Humans = 0.98; CHO = 0.84; Proportion Correct for the mass: CNN = 0.98; Humans = 0.83; CHO = 0.51). Together, our results provide an important evaluation of CNN methods by benchmarking their performance against human and model observers in complex search tasks.