Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although researchers have tried using high-resolution images for depth estimation, prediction accuracy has not improved significantly. In this work, we find that the core reason is inaccurate depth estimation in large gradient regions: while the bilinear interpolation error gradually disappears as resolution increases, the estimates in these regions remain poor. Obtaining more accurate depth in large gradient regions requires high-resolution features that carry both spatial and semantic information. We therefore present an improved DepthNet, HR-Depth, with two effective strategies: (1) re-designing the skip connections in DepthNet to obtain better high-resolution features and (2) proposing a feature-fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution. Moreover, previous SoTA methods rely on fairly complex and deep networks with a large number of parameters, which limits their real-world applications. We therefore also construct a lightweight network that uses MobileNetV3 as the encoder. Experiments show that the lightweight network performs on par with many large models, such as Monodepth2, at high resolution with only 20% of the parameters. All code and models will be available at https://github.com/shawLyu/HR-Depth.
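The abstract does not spell out the internals of the fSE module, so the following is only a minimal PyTorch sketch of the general idea it names: concatenating multi-resolution decoder features, re-weighting channels with a squeeze-and-excitation gate, and fusing with a 1x1 convolution. The class name, parameter names, and the reduction ratio are hypothetical, not taken from HR-Depth.

```python
import torch
import torch.nn as nn

class FuseSE(nn.Module):
    """Sketch of an SE-style feature-fusion block (not the exact fSE design)."""

    def __init__(self, in_channels, out_channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global spatial context
        self.gate = nn.Sequential(                 # excitation: per-channel weights
            nn.Linear(in_channels, in_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, features):
        # features: list of tensors already upsampled to a common spatial size
        x = torch.cat(features, dim=1)
        b, c, _, _ = x.shape
        w = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return self.fuse(x * w)                    # channel re-weighting, then 1x1 fusion
```

Under this reading, the gate lets the decoder emphasize whichever scale carries the most useful spatial or semantic detail at each channel before fusion, rather than summing or concatenating features with fixed weights.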
Natural human-robot interaction based on dynamic hand gestures has become a popular research topic in the past few years. Traditional dynamic gesture recognition methods are usually limited by illumination conditions, varying colors, and cluttered backgrounds. Recognition performance can be improved with hand-worn devices, but such devices do not allow natural, barrier-free interaction. To overcome these shortcomings, a depth perception algorithm based on the Kinect depth sensor is introduced to carry out 3D hand tracking. We propose a novel start/end point detection method for segmenting the 3D hand gesture from the hand motion trajectory. Hidden Markov Models (HMMs) are then used to model and classify the hand gesture sequences, and the recognized gestures are converted into control commands for interacting with the robot. Seven different hand gestures performed with two hands are sufficient to navigate the robot. Experiments show that the proposed dynamic hand gesture interaction system works effectively in complex environments and in real time, with an average recognition rate of 98.4%. Further robot navigation experiments also verify the robustness of our system.
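The abstract does not give the HMM topology or the feature encoding, so the sketch below only illustrates the standard recipe the text implies: fit one Gaussian HMM per gesture class on segmented 3D trajectories and classify a new gesture by maximum log-likelihood. It assumes the hmmlearn library; the function names, the choice of 5 hidden states, and the (T, 3) raw-position features are illustrative assumptions, not the paper's method.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def train_gesture_models(train_data, n_states=5):
    """Fit one Gaussian HMM per gesture class.

    train_data: dict mapping gesture label -> list of trajectories,
    each trajectory an (T, 3) array of 3D hand positions (assumed features).
    """
    models = {}
    for label, trajectories in train_data.items():
        X = np.concatenate(trajectories)          # stack all training sequences
        lengths = [len(t) for t in trajectories]  # per-sequence lengths for fit()
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=100)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, trajectory):
    """Label a segmented gesture by the model with the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(trajectory))
```

In this scheme, the start/end point detector supplies the segmented trajectory, and the maximum-likelihood label is then mapped to the corresponding robot control command.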