Natural scene classification is a fundamental challenge in computer vision. The large majority of studies have limited their scope to single still images and thereby ignore potentially informative temporal cues. The current paper is concerned with determining the degree of performance gain obtained by considering short videos for recognizing natural scenes. Towards this end, the impact of multiscale orientation measurements on scene classification is systematically investigated, as related to: (i) spatial appearance, (ii) temporal dynamics and (iii) joint spatial appearance and dynamics. These measurements in visual space, x-y, and spacetime, x-y-t, are recovered by a bank of spatiotemporal oriented energy filters. In addition, a new data set is introduced that contains 420 image sequences spanning fourteen scene categories, with temporal scene information due to objects and surfaces decoupled from camera-induced information. This data set is used to evaluate the classification performance of the various orientation-related representations, as well as state-of-the-art alternatives. It is shown that a notable performance increase is realized by spatiotemporal approaches in comparison to purely spatial or purely temporal methods.
Perception of transparent objects has been an open challenge in robotics despite advances in sensors and data-driven learning approaches. In this paper, we introduce a new approach that combines recent advances in learnt object detectors with perceptual grouping in 2D and the projective geometry of apparent contours in 3D. We train a state-of-the-art structured edge detector on an annotated set of foreground glassware. We assume that the objects are surfaces of revolution (SOR) and apply perceptual symmetry grouping in a 2D spherical transformation of the image to obtain a 2D detection of the glassware object and a hypothesis about its 2D axis. Rather than stopping at a single-view detection, we ultimately want to reconstruct the 3D shape of the object and its 3D pose to allow a robot to grasp it. Using two views allows us to decouple the 3D axis localization from the shape estimation. We develop a parametrization that uniquely relates the shape reconstruction of an SOR to a given set of contour points and tangents. Finally, we provide the first annotated dataset for 2D detection, 3D pose and 3D shape of glassware, and we show results comparable to category-based detection and localization of opaque objects without any training on the object shape.