Liav Assif scite author profile

Discovering the visual features and representations used by the brain to recognize objects is a central problem in the study of vision. Recently, neural network models of visual object recognition, including biological and deep network models, have shown remarkable progress and have begun to rival human performance in some challenging tasks. These models are trained on image examples and learn to extract features and representations and to use them for categorization. It remains unclear, however, whether the representations and learning processes discovered by current models are similar to those used by the human visual system. Here we show, by introducing and using minimal recognizable images, that the human visual system uses features and processes that are not used by current models and that are critical for recognition. We found by psychophysical studies that at the level of minimal recognizable images a minute change in the image can have a drastic effect on recognition, thus identifying features that are critical for the task. Simulations then showed that current models cannot explain this sensitivity to precise feature configurations and, more generally, do not learn to recognize minimal images at a human level. The role of the features shown here is revealed uniquely at the minimal level, where the contribution of each feature is essential. A full understanding of the learning and use of such features will extend our understanding of visual recognition and its cortical mechanisms and will enhance the capacity of computational models to learn from visual experience and to deal with recognition and detailed image interpretation.

The human visual system makes highly effective use of limited information (1, 2). As shown below (Fig. 1 and Figs. S1 and S2), it can recognize consistently subconfigurations that are severely reduced in size or resolution. Effective recognition of reduced configurations is desirable for dealing with image variability: Images of a given category are highly variable, making recognition difficult, but this variability is reduced at the level of recognizable but minimal subconfigurations (Fig. 1B). Minimal recognizable configurations (MIRCs) are useful for effective recognition, but, as shown below, they also are computationally challenging because each MIRC is nonredundant and therefore requires the effective use of all available information. We use them here as sensitive tools to identify fundamental limitations of existing models of visual recognition and directions for essential extensions.

A MIRC is defined as an image patch that can be reliably recognized by human observers and which is minimal in that further reduction in either size or resolution makes the patch unrecognizable (below criterion) (Methods). To discover MIRCs, we conducted a large-scale psychophysical experiment for classification. We started from 10 greyscale images, each showing an object from a different class (Fig. S3), and tested a large hierarchy of patches at different positions and decreasing size and res...

Full interpretation of minimal images

Ben-Yosef

Assif

Ullman

2018

Cognition

The goal in this work is to model the process of 'full interpretation' of object images, which is the ability to identify and localize all semantic features and parts that are recognized by human observers. The task is approached by dividing the interpretation of the complete object into the interpretation of multiple reduced but interpretable local regions. In such reduced regions, interpretation is simpler, since the number of semantic components is small, and the variability of possible configurations is low. We model the interpretation process by identifying primitive components and relations that play a useful role in local interpretation by humans. To identify useful components and relations used in the interpretation process, we consider the interpretation of 'minimal configurations': these are reduced local regions, which are minimal in the sense that further reduction renders them unrecognizable and uninterpretable. We show that such minimal interpretable images have useful properties, which we use to identify informative features and relations used for full interpretation. We describe our interpretation model, and show results of detailed interpretations of minimal configurations, produced automatically by the model. Finally, we discuss possible extensions and implications of full interpretation to difficult visual tasks, such as recognizing social interactions, which are beyond the scope of current models of visual recognition.

When standard RANSAC is not enough: cross-media visual matching with hypothesis relevancy

Hassner

Assif

Wolf

2013

Machine Vision and Applications

Visual categorization of social interactions

et al. 2014

Prominent theories of action recognition suggest that during the recognition of actions the physical patterns of the action are associated with only one action interpretation (e.g., a person waving his arm is recognized as waving). In contrast to this view, studies examining the visual categorization of objects show that objects are recognized in multiple ways (e.g., a VW Beetle can be recognized as a car or a beetle) and that categorization performance is based on the visual and motor movement similarity between objects. Here, we studied whether we find evidence for multiple levels of categorization for social interactions (physical interactions with another person, e.g., handshakes). To do so, we compared visual categorization of objects and social interactions (Experiments 1 and 2) in a grouping task and assessed the usefulness of motor and visual cues (Experiments 3, 4, and 5) for object and social interaction categorization. Additionally, we measured recognition performance associated with recognizing objects and social interactions at different categorization levels (Experiment 6). We found that basic level object categories were associated with a clear recognition advantage compared to subordinate recognition but basic level social interaction categories provided only a little recognition advantage. Moreover, basic level object categories were more strongly associated with similar visual and motor cues than basic level social interaction categories. The results suggest that cognitive categories underlying the recognition of objects and social interactions are associated with different performances. These results are in line with the idea that the same action can be associated with several action interpretations (e.g., a person waving his arm can be recognized as waving or greeting).

A model for full local image interpretation

Ben-Yosef

Assif

Harari

et al. 2021

Preprint


Liav Assif

Atoms of recognition in human and computer vision
