Recognizing manipulations performed by a human and the transfer and execution of this by a robot is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scenery in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing different manipulations. Employing simple sub-string search algorithms, semantic event chains can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
Abstract-In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both, action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique, thus providing an event table of the action scene, which allows extracting objectaction relations. The method is applied to several artificial and real action scenes containing limited context. The central novelty of this approach is that it is model free and needs a priori representation neither for objects nor actions. Essentially actions are recognized without requiring prior object knowledge and objects are categorized solely based on their exhibited role within an action sequence. Thus, this approach is grounded in the affordance principle, which has recently attracted much attention in robotics and provides a way forward for trial and error learning of object-action relations through repeated experimentation. It may therefore be useful for recognition and categorization tasks for example in imitation learning in developmental and cognitive robotics.
Supervision of long-lasting extensive botanic experiments is a promising robotic application that some recent technological advances have made feasible. Plant modelling for this application has strong demands, particularly in what concerns 3D information gathering and speed. This paper shows that Time-of-Flight (ToF) cameras achieve a good compromise between both demands, providing a suitable complement to color vision. A new method is proposed to segment plant images into their composite surface patches by combining hierarchical color segmentation with quadratic surface fitting using ToF depth data. Experimentation shows that the interpolated depth maps derived from the obtained surfaces fit well the original scenes. Moreover, candidate leaves to be approached by a measuring instrument are ranked, and then robot-mounted cameras move closer to them to validate their suitability to being sampled. Some ambiguities arising from leaves overlap or occlusions are cleared up in this way. The work is a proof-of-concept that dense color data combined with sparse depth as provided by a ToF camera yields a good enough 3D approximation for automated plant measuring at the high throughput imposed by the application.Peer ReviewedPostprint (published version
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.