Observing, learning, and imitating human skills are intriguing topics in cognitive robotics. The main problem in the imitation learning paradigm is policy development. A policy can be defined as a mapping from an agent's current world state to actions; understanding and performing an observed human skill therefore depends heavily upon the learned policy. So far, naive policies that combine object and hand models with trajectory information have commonly been developed to encode and imitate various types of human manipulation. Such approaches, on the one hand, cannot be general enough, since the models are not learned by the agent itself but are provided by the designer in advance. On the other hand, imitation at the trajectory level is not sufficient for complicated manipulations, since even the same observed manipulation can exhibit high variation in its trajectories from demonstration to demonstration.

Humans, nevertheless, are capable of recognizing and imitating observed manipulations without any problem. In humans, the chain of perception, learning, and imitation of manipulations develops in conjunction with the interpretation of the manipulated objects. To compose a human-like perception-action chain, a cognitive agent needs a generic policy that can extract manipulation primitives as well as the essential (invariant) relations between objects and manipulation actions.

In this thesis, we introduce a novel concept, the so-called "Semantic Event Chain" (SEC), which derives the semantic essence and the invariant spatiotemporal relations of objects and actions in order to acquire such a perception-action chain. We show that SECs are a compact and generic encoding scheme for recognizing, learning, and executing human manipulations by relating them to the manipulated objects. SECs operate on image sequences that have been converted into uniquely trackable segments. The framework first represents each frame of the scene as an undirected, unweighted graph whose nodes and edges correspond to image segments and their spatial relations (e.g., touching or not touching), respectively. The graphs thus become a semantic representation, in the space-time domain, of the segments, i.e., the objects (including the hand) present in the scene. The framework then discretizes the entire graph sequence by extracting only the main graphs, i.e., those graphs whose relational structure differs from that of the preceding one; each main graph represents an essential primitive of the manipulation. All extracted main graphs form the core skeleton of the SEC, a sequence table in which the columns correspond to the main graphs and the rows to the changes in the spatial relation between each object pair in the scene (a minimal computational sketch of this construction is given at the end of this section). SECs consequently extract only the bare spatiotemporal pattern, the "essence of an action", which is invariant to the trajectory followed, the manipulation speed, and the relative object poses.

In the perception phase, SECs allow a cognitive agent not only to recognize and classify different observed manipulations but also to categorize the manipulated objects according to the roles they play in those manipulations.
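To make the SEC construction concrete, the following minimal Python sketch builds a SEC-like table from a sequence of frames, each described only by which segment pairs touch. It is an illustrative simplification under stated assumptions: segment labels such as "hand" and "cup" are hypothetical placeholders for the uniquely tracked image segments of the full framework, and the spatial relations are reduced to binary touching/not-touching values.

    from itertools import combinations

    # Each frame is described by the set of segment pairs that touch.
    # In the full framework, segment labels come from uniquely tracked
    # image segments; here they are hypothetical strings.

    def extract_main_graphs(frames):
        """Keep only frames whose touching-relation set differs from the
        previous frame's; these correspond to the main graphs."""
        main_graphs, previous = [], None
        for relations in frames:
            if relations != previous:
                main_graphs.append(relations)
                previous = relations
        return main_graphs

    def build_sec(frames):
        """Build the SEC table: one row per segment pair, one column per
        main graph; 1 encodes touching, 0 not-touching."""
        main_graphs = extract_main_graphs(frames)
        segments = sorted({s for graph in frames for pair in graph for s in pair})
        table = {}
        for a, b in combinations(segments, 2):
            key = frozenset((a, b))
            table[(a, b)] = [1 if key in graph else 0 for graph in main_graphs]
        return table

    # Toy sequence: a hand approaches a cup, grasps it, then withdraws.
    frames = [
        frozenset(),                              # nothing touches
        frozenset({frozenset({"hand", "cup"})}),  # hand touches cup
        frozenset({frozenset({"hand", "cup"})}),  # still touching
        frozenset(),                              # hand withdraws
    ]
    for pair, row in build_sec(frames).items():
        print(pair, row)   # ('cup', 'hand') [0, 1, 0]

Running the sketch on the toy grasp sequence prints a single row, [0, 1, 0]: the hand and cup start apart, touch during the grasp, and separate again. This row is exactly the kind of naked spatiotemporal pattern a SEC preserves, independent of the trajectory, speed, or object poses of the particular demonstration.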