We demonstrate the use of semantic object detections as robust features for Visual Teach and Repeat (VTR). Recent CNN-based object detectors can reliably detect objects from tens or hundreds of categories in video at frame rate. We show that such detections are repeatable enough to serve as landmarks for VTR, without any low-level image features. Since object detections are highly invariant to lighting and surface appearance changes, our VTR can cope with global lighting changes and local movements of the landmark objects. In the teaching phase we build a series of compact scene descriptors: a list of detected object labels and their image-plane locations. In the repeating phase, we use Seq-SLAM-like relocalization to identify the most similar learned scene, then use a motion control algorithm based on the funnel lane theory to navigate the robot along the previously piloted trajectory. We evaluate the method on a commodity UAV, examining the robustness of the algorithm to new viewpoints, lighting conditions, and movements of landmark objects. The results suggest that, compared to low-level image features, semantic object features could be useful due to their invariance to superficial appearance changes.
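The abstract describes a compact scene descriptor (detected object labels plus image-plane locations) and a relocalization step that picks the most similar taught scene. The sketch below is a minimal illustration of that idea, not the authors' implementation; the names `Detection`, `SceneDescriptor`, `similarity`, and `relocalize`, and the greedy label-and-position matching score, are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a scene descriptor built from object
# detections and a simple label/position similarity used for relocalization.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str   # object class name from the CNN detector
    u: float     # image-plane x coordinate (pixels)
    v: float     # image-plane y coordinate (pixels)

@dataclass
class SceneDescriptor:
    detections: List[Detection]   # compact summary of one taught frame

def similarity(query: SceneDescriptor, taught: SceneDescriptor,
               pos_sigma: float = 100.0) -> float:
    """Score how well current detections match a taught scene.

    Each query detection is greedily matched to the nearest unused taught
    detection with the same label; closer image-plane positions score higher.
    """
    score = 0.0
    used = set()
    for q in query.detections:
        best, best_d = None, float("inf")
        for i, t in enumerate(taught.detections):
            if i in used or t.label != q.label:
                continue
            d = ((q.u - t.u) ** 2 + (q.v - t.v) ** 2) ** 0.5
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            score += 1.0 / (1.0 + best_d / pos_sigma)
    return score

def relocalize(query: SceneDescriptor, route: List[SceneDescriptor]) -> int:
    """Return the index of the most similar taught scene along the route."""
    return max(range(len(route)), key=lambda i: similarity(query, route[i]))
```

A Seq-SLAM-like matcher would additionally score short sequences of consecutive scenes rather than single frames; the single-frame score above is only the inner building block.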
Robot interaction has always been a challenge in collaborative robotics. In tasks
involving inter-robot interaction, robot detection is very often needed. We
explore humanoid robot detection because humanoid robots can be useful in many
scenarios, from helping elderly people live in their own homes to responding to
disasters. Cameras are chosen because they are rich and cheap sensors, and there
are many mature two-dimensional (2D) and 3D computer vision libraries that
facilitate image analysis. To tackle humanoid robot detection effectively, we
collected a data set of humanoid robots of different sizes in different
environments. We then tested the well-known cascade classifier in combination
with several image descriptors, such as Histograms of Oriented Gradients (HOG)
and Local Binary Patterns (LBP), on this data set. Among the feature sets,
Haar-like features achieve the highest accuracy, LBP the highest recall, and HOG
the highest precision. For inter-robot interaction, false positives are less
troublesome than false negatives, so LBP is more useful than the others.
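For context, the cascade classifier the abstract refers to is the standard multi-scale sliding-window detector available in OpenCV, which can be trained with Haar-like or LBP features. The snippet below is a minimal usage sketch, not the authors' released model: the cascade file name `lbp_humanoid_cascade.xml` and the detection parameters are placeholder assumptions.

```python
# Minimal detection sketch with OpenCV's cascade classifier, assuming an LBP
# cascade trained on a humanoid-robot data set (file name is a placeholder).
import cv2

cascade = cv2.CascadeClassifier("lbp_humanoid_cascade.xml")  # hypothetical model file
frame = cv2.imread("frame.jpg")                              # any test image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection, the standard use of a cascade classifier.
robots = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                  minSize=(32, 64))

# Draw the detected bounding boxes for inspection.
for (x, y, w, h) in robots:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```

Lowering `minNeighbors` trades precision for recall, which matches the abstract's preference for fewer false negatives in inter-robot interaction settings.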