2019
DOI: 10.1007/978-3-030-28619-4_30

A Multiview Approach to Learning Articulated Motion Models

Cited by 12 publications (13 citation statements) · References 46 publications
“…Consequently, a great deal of research has focused on developing representations that can facilitate planning and reasoning for highly specific situated tasks. These representations vary significantly depending on the application, ranging from two-dimensional costmaps (Elfes, 1987), volumetric 3D voxel representations (Hornung et al., 2010, 2013), and primitive-shape-based object approximations (Miller et al., 2003; Huebner and Kragic, 2008) to richer representations that model high-level semantic properties (Galindo et al., 2005; Pronobis and Jensfelt, 2012), the 6-DOF pose of objects of interest (Hudson et al., 2012), or affordances between objects (Daniele et al., 2017). Since inferring exhaustively detailed world models is impractical, one solution is to design perception pipelines that infer task-relevant world models (Eppner et al., 2016; Fallon et al., 2014).…”
Section: Introduction (mentioning)
confidence: 99%
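
As a concrete illustration of the simplest representation named in this excerpt, the following is a minimal sketch of a two-dimensional costmap with obstacle inflation; the function name, grid layout, and inflation rule are illustrative assumptions rather than anything taken from Elfes (1987) or the other cited works.

# Minimal sketch of a 2D costmap: a grid of traversal costs built from
# obstacle cells, with a simple inflation step so a planner keeps clearance.
# All names and the inflation rule are illustrative assumptions.
import numpy as np

def build_costmap(occupancy: np.ndarray, inflation_radius: int = 2) -> np.ndarray:
    """occupancy: 2D array, 1 = obstacle, 0 = free. Returns per-cell cost."""
    costmap = np.where(occupancy > 0, np.inf, 0.0)
    rows, cols = occupancy.shape
    for r, c in np.argwhere(occupancy > 0):
        # Inflate: free cells near an obstacle get a cost decaying with distance.
        for dr in range(-inflation_radius, inflation_radius + 1):
            for dc in range(-inflation_radius, inflation_radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and occupancy[rr, cc] == 0:
                    dist = np.hypot(dr, dc)
                    if dist <= inflation_radius:
                        costmap[rr, cc] = max(costmap[rr, cc], inflation_radius - dist)
    return costmap

grid = np.zeros((8, 8), dtype=int)
grid[3, 3] = 1  # a single obstacle cell
print(build_costmap(grid))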
“…Further approaches: Articulation motion models can be viewed as geometric constraints imposed on multiple rigid bodies. Such constraints can be learned from human demonstrations by leveraging different sensing modalities [13, 28–31]. Recently, Daniele et al. [30] proposed a multimodal learning framework that incorporates both vision and natural language information for articulation model estimation.…”
Section: Related Work (mentioning)
confidence: 99%
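
To make the idea of an articulation model as a geometric constraint between rigid bodies concrete, the following is a minimal sketch that recovers a revolute (hinge) axis from relative rotations observed across a demonstration; the function names and the SciPy-based axis extraction are illustrative assumptions, not the cited authors' method.

# Hedged sketch: a revolute model constrains the child link to rotate about
# a single shared axis relative to the parent; we recover that axis by
# averaging the axes of the observed relative rotations.
import numpy as np
from scipy.spatial.transform import Rotation

def estimate_revolute_axis(relative_rotations):
    """relative_rotations: list of 3x3 rotation matrices of the child link
    relative to the parent. Returns a unit vector along the estimated hinge axis."""
    axes = []
    for R in relative_rotations:
        rotvec = Rotation.from_matrix(R).as_rotvec()  # axis * angle
        angle = np.linalg.norm(rotvec)
        if angle > 1e-6:  # skip near-identity frames with ill-defined axes
            axes.append(rotvec / angle)
    # Align signs before averaging (axis and -axis describe the same hinge).
    ref = axes[0]
    aligned = [a if np.dot(a, ref) >= 0 else -a for a in axes]
    axis = np.mean(aligned, axis=0)
    return axis / np.linalg.norm(axis)

# Synthetic demonstration: a hinge about the z-axis observed at several angles.
true_axis = np.array([0.0, 0.0, 1.0])
obs = [Rotation.from_rotvec(true_axis * a).as_matrix() for a in (0.2, 0.5, 0.9, 1.3)]
print(estimate_revolute_axis(obs))  # ~ [0, 0, 1]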
“…For example, Schmidt et al. [32, 33] tracked deformable targets through probabilistic inference, which requires a predefined canonical geometric structure. Daniele et al. [34] combined natural-language motion descriptions with computer vision for pose estimation of deformable targets, but this method requires the natural-language motion description as an additional modality.…”
Section: Related Work (mentioning)
confidence: 99%
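
As a rough illustration of what probabilistic-inference tracking with a predefined geometric structure can look like, the following is a toy particle filter that tracks the angle of a single known hinge; it is a hedged sketch under that assumption, not the actual method of Schmidt et al.

# Toy particle filter: the geometric structure (one known hinge) is fixed in
# advance, and inference only estimates its configuration (the joint angle).
import numpy as np

rng = np.random.default_rng(0)
n_particles = 500
particles = rng.uniform(0.0, np.pi, n_particles)  # hypothesized hinge angles

def pf_step(particles, measurement, motion_noise=0.05, meas_noise=0.1):
    # Predict: diffuse particles with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_noise, particles.size)
    # Weight: likelihood of the noisy angle measurement under each particle.
    weights = np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    weights /= weights.sum()
    # Resample in proportion to the weights.
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    return particles[idx]

true_angle = 0.8
for _ in range(20):
    z = true_angle + rng.normal(0.0, 0.1)  # simulated sensor reading
    particles = pf_step(particles, z)
print(particles.mean())  # converges near 0.8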