Humans can obtain an unambiguous perception of depth and 3-dimensionality with one eye or when viewing a pictorial image of a 3-dimensional scene. However, the perception of depth when viewing a real scene with both eyes is qualitatively different: there is a vivid impression of tangible solid form and immersive negative space. This perceptual phenomenon, referred to as "stereopsis", has been among the central puzzles of perception since the time of da Vinci. Since Wheatstone's invention of the stereoscope in 1838, stereopsis has conventionally been explained as a by-product of binocular vision or visual parallax. However, this explanation is challenged by the observation that the impression of stereopsis can be induced in single pictures under monocular viewing. Here I propose an alternative hypothesis: that stereopsis is a qualitative visual experience related to the perception of egocentric spatial scale. Specifically, the primary phenomenal characteristic of stereopsis (the impression of 'real' separation in depth) is proposed to be linked to the precision with which egocentrically scaled depth (absolute depth) is derived. Since conscious awareness of this precision could help guide the planning of motor action, the hypothesis provides a functional account for an important phenomenal characteristic associated with stereopsis: the impression of interactability. By linking stereopsis to a generic perceptual attribute, rather than a specific cue, the hypothesis provides a potentially more unified account of the variation of stereopsis in real scenes and pictures, and a basis for understanding why we can perceive depth in pictures despite conflicting visual signals.
A picture viewed from its center of projection generates the same retinal image as the original scene, so the viewer perceives the scene correctly. When a picture is viewed from other locations, the retinal image specifies a different scene, but we normally do not notice the changes. We investigated the mechanism underlying this perceptual invariance by studying the perceived shapes of pictured objects viewed from various locations. We also manipulated information about the orientation of the picture surface. When binocular information for surface orientation was available, perceived shape was nearly invariant across a wide range of viewing angles. By varying the projection angle and the position of a stimulus in the picture, we found that invariance is achieved through an estimate of local surface orientation, not from geometric information in the picture. We present a model that explains invariance and other phenomena (such as perceived distortions in wide-angle pictures).
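The compensation mechanism described above can be made concrete with a small numerical sketch. The code below is illustrative only; the single-axis slant, the viewing distance, and the function names are assumptions for the example, not the paper's model. It projects a square drawn on a picture plane onto a retina for an oblique viewpoint, then inverts the projection using an assumed estimate of the picture-surface slant, recovering the original picture coordinates.

```python
# A minimal numerical sketch (not the authors' model) of slant compensation:
# project picture points onto a retina for an oblique viewpoint, then undo the
# projection using an assumed estimate of the picture-surface slant
# (e.g. as could be supplied by binocular cues).
import numpy as np

def project(points_2d, slant_deg, view_dist=3.0):
    """Perspective-project picture points (picture coordinates, metres) for a
    picture plane rotated by `slant_deg` about the vertical axis and viewed
    from distance `view_dist`."""
    s = np.radians(slant_deg)
    out = []
    for x, y in points_2d:
        # rotate the picture point into viewer-centred coordinates
        X = x * np.cos(s)
        Z = view_dist + x * np.sin(s)
        out.append((X / Z, y / Z))          # pinhole projection
    return np.array(out)

def compensate(retinal, slant_deg, view_dist=3.0):
    """Recover picture coordinates from the retinal image, given an estimate
    of the surface slant (the hypothesised compensation step)."""
    s = np.radians(slant_deg)
    out = []
    for u, v in retinal:
        # analytic inverse of the rotation + pinhole mapping above
        x = u * view_dist / (np.cos(s) - u * np.sin(s))
        z = view_dist + x * np.sin(s)
        out.append((x, v * z))
    return np.array(out)

square = np.array([(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])
retina = project(square, slant_deg=40)                 # foreshortened, skewed image
print(np.round(compensate(retina, slant_deg=40), 6))   # original square recovered
```

If the slant estimate fed to the inverse step is wrong or absent, the recovered shape is skewed rather than square, which parallels the idea that invariance depends on an estimate of local surface orientation rather than on geometric information in the picture alone.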
The localization of spatially extended objects is thought to be based on the computation of a default reference position, such as the center of gravity. This position can serve as the goal point for a saccade, a locus for fixation, or the reference for perceptual localization. We compared perceptual and saccadic localization for non-convex shapes in which the center of gravity (COG) was located outside the boundary of the shape and did not coincide with any prominent perceptual features. The landing positions of single saccades made to the shape, as well as the preferred loci for fixation, were near the center of gravity, although local features such as part boundaries were influential. Perceptual alignment positions were also close to the center of gravity, but showed configural effects that did not influence either saccades or fixation. Saccades made in a more naturalistic sequential scanning task landed near the center of gravity with a considerably higher degree of accuracy (mean error <4% of saccade size) and showed no effects of local features, constituent parts, or stimulus configuration. We conclude that perceptual and oculomotor localization is based on the computation of a precise central reference position, which coincides with the center of gravity in sequential scanning. The saliency of the center of gravity, relative to other prominent visual features, can depend on the specific localization task or the relative configuration of elements. Sequential scanning, the more natural of the saccadic tasks, may provide a better way to evaluate the "default" reference position for localization. The fact that the reference position used in both oculomotor and perceptual tasks fell outside the boundary of the shapes supports the importance of spatial pooling, rather than local features, in object localization.
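The key geometric point here, that the center of gravity of a non-convex shape can lie outside its boundary, is easy to verify numerically. The short sketch below uses a made-up C-shaped polygon (not the actual stimuli): it computes the area centroid with the shoelace formula and confirms that the centroid falls in the shape's concavity.

```python
# Illustrative sketch (hypothetical C-shaped polygon, not the paper's stimuli):
# the area centroid ("center of gravity") computed with the shoelace formula
# can fall outside the shape's boundary.
def polygon_centroid(verts):
    """Area centroid of a simple polygon given as a list of (x, y) vertices."""
    a = cx = cy = 0.0
    n = len(verts)
    for i in range(n):
        x0, y0 = verts[i]
        x1, y1 = verts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)

def point_in_polygon(p, verts):
    """Even-odd ray-casting test for point containment."""
    x, y = p
    inside = False
    n = len(verts)
    for i in range(n):
        x0, y0 = verts[i]
        x1, y1 = verts[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside

c_shape = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 3), (3, 3), (3, 4), (0, 4)]
cog = polygon_centroid(c_shape)
print(cog, point_in_polygon(cog, c_shape))   # (1.25, 2.0) False -> COG lies outside
```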
With the widespread use of mobile phones and scanners to photograph and upload documents, the need to extract the information trapped in unstructured document images such as retail receipts, insurance claim forms, and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables, and extracting data from tabular sub-images presents a unique set of challenges. These include accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge, since this involves more fine-grained recognition of table structure (rows and columns). Prior approaches have attempted to solve the table detection and structure recognition problems independently, using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding in additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently has annotations only for table detection.
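To make the twin-task idea concrete, the sketch below shows the general pattern of a shared encoder feeding two decoder branches that emit a table mask and a column mask. The layer sizes and the tiny encoder are illustrative assumptions only; TableNet itself builds on a pretrained VGG-19 encoder with a different decoder design, so this should be read as a schematic of the idea rather than the published model.

```python
# Schematic PyTorch sketch of a two-branch segmenter: a shared convolutional
# encoder whose features feed two decoders, one predicting a table mask and
# one a column mask. Layer sizes are illustrative, not TableNet's.
import torch
import torch.nn as nn

class TwoBranchSegmenter(nn.Module):
    def __init__(self):
        super().__init__()
        # shared encoder: downsamples the document image by 4x
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # two decoders upsample back to full resolution, one per task
        def decoder():
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, 2, stride=2),   # 1-channel mask logits
            )
        self.table_head = decoder()
        self.column_head = decoder()

    def forward(self, x):
        feats = self.encoder(x)              # shared features for both tasks
        return self.table_head(feats), self.column_head(feats)

model = TwoBranchSegmenter()
img = torch.randn(1, 3, 256, 256)            # a dummy document image
table_mask, column_mask = model(img)
print(table_mask.shape, column_mask.shape)   # both torch.Size([1, 1, 256, 256])
```

Intersecting the thresholded table and column masks yields column regions within the detected table, to which rule-based row extraction can then be applied.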