This article presents an integrated vision-based guiding system for aerial manipulation. More specifically, a 4-DoF planar dexterous manipulator, with a stereo camera attached to the end-effector, is mounted on a multirotor aerial platform to provide active manipulation capabilities. The proposed novel approach combines a visual processing scheme for object detection and tracking with a manipulator positioning scheme that allows the aerial platform to approach the interaction surface efficiently. In the developed scheme, object detection relies on correlation filters to track the target robustly, while the depth information from the stereo camera on board the manipulator is used to extract the centroid of the manipulated object, compute its relative configuration with respect to the UAV, and align the end-effector with the grasping point. The effectiveness of the proposed scheme is demonstrated in multiple experimental trials and simulations, highlighting its applicability to autonomous aerial manipulation.
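The centroid-extraction step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rectified pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a depth image stored as a nested list in metres, and a bounding box supplied by the tracker. The function names and the median-based background rejection (`tol`) are hypothetical choices made for illustration only.

```python
from statistics import median_low


def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z into the camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def object_centroid(depth, box, fx, fy, cx, cy, tol=0.05):
    """Centroid, in the camera frame, of the object inside a tracked box.

    depth: 2D list of depths in metres (0 or None means no measurement)
    box:   (u_min, v_min, u_max, v_max), inclusive pixel bounds
    tol:   assumed heuristic; keep pixels within tol metres of the median depth
    """
    u0, v0, u1, v1 = box
    samples = [(u, v, depth[v][u])
               for v in range(v0, v1 + 1)
               for u in range(u0, u1 + 1)
               if depth[v][u]]
    if not samples:
        return None  # no valid depth inside the box

    # median_low always returns an actual sample, so at least one point survives.
    z_med = median_low(z for _, _, z in samples)
    points = [backproject(u, v, z, fx, fy, cx, cy)
              for u, v, z in samples if abs(z - z_med) <= tol]
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Usage: a 4x4 depth image with a 2x2 object at 1 m and one background pixel.
depth_image = [[0.0] * 4 for _ in range(4)]
for v in (1, 2):
    for u in (1, 2):
        depth_image[v][u] = 1.0
depth_image[0][0] = 3.0  # background, rejected by the median filter
centroid = object_centroid(depth_image, (0, 0, 3, 3), fx=1.0, fy=1.0, cx=1.5, cy=1.5)
```

The resulting centroid is expressed in the camera frame. Relating it to the UAV body frame would additionally require the manipulator's forward kinematics, which this sketch leaves out.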
In human-robot collaboration, perception plays a major role in enabling the robot to understand the surrounding environment and the position of humans inside the working area, which is a key element of effective and safe collaboration. Human pose estimators based on skeletal models are among the most popular approaches for monitoring the position of humans around the robot, but they do not take into account information such as body volume, which the robot needs for effective collision avoidance. In this paper, we propose a novel 3D human representation derived from body-part segmentation that combines high-level semantic information (i.e., human body parts) with volume information. To compute this body-part segmentation, also known as human parsing in the literature, we propose a multi-view system based on a camera network. Human body parts are segmented in the frames acquired by each camera, projected into 3D world coordinates, and then aggregated to build a 3D representation of the human that is robust to occlusions. A further 3D data filtering step has been implemented to improve robustness to outliers and segmentation accuracy. The proposed multi-view human parsing approach was tested in a real environment, and its performance was measured in terms of global and class accuracy on a dedicated dataset, acquired to thoroughly test the system under various conditions. The experimental results demonstrate the performance improvements that can be achieved with the proposed multi-view approach.
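The project-then-aggregate idea can be sketched as follows. The abstract does not specify how views are fused or filtered, so this example swaps in a simple stand-in: labelled 3D points from each camera are moved into the world frame with an assumed camera-to-world rigid transform (`R`, `t`), binned into voxels, and labelled by majority vote, with sparsely supported voxels discarded as outliers. All names, the voxel size, and the `min_votes` threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict


def to_world(p, R, t):
    """Apply the camera-to-world rigid transform x_w = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def fuse_views(views, voxel=0.05, min_votes=2):
    """Fuse labelled points from several cameras into a voxel -> label map.

    views:     list of (R, t, labelled_points), where labelled_points is a
               list of ((x, y, z), label) pairs in that camera's frame
    voxel:     voxel edge length in metres (assumed value)
    min_votes: voxels with fewer total observations are dropped as outliers
    """
    votes = defaultdict(Counter)
    for R, t, labelled_points in views:
        for p, label in labelled_points:
            w = to_world(p, R, t)
            key = tuple(math.floor(c / voxel) for c in w)
            votes[key][label] += 1

    fused = {}
    for key, counter in votes.items():
        if sum(counter.values()) >= min_votes:
            fused[key] = counter.most_common(1)[0][0]  # majority label
    return fused


# Usage: three cameras observe the same voxel; one also sees a stray point.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
views = [
    (I, (0.0, 0.0, 0.0), [((0.5, 0.5, 0.5), "arm")]),
    (I, (1.0, 0.0, 0.0), [((-0.5, 0.5, 0.5), "arm"), ((5.5, 0.5, 0.5), "leg")]),
    (I, (0.0, 0.0, 0.0), [((0.5, 0.5, 0.5), "torso")]),
]
fused = fuse_views(views, voxel=1.0, min_votes=2)
```

In this toy case the shared voxel receives two "arm" votes and one "torso" vote, so the majority label overrides the disagreeing view, while the single-observation stray point is filtered out.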