In rehabilitation medicine, real-time gait analysis of a person wearing a lower-limb exoskeleton rehabilitation robot during walking can effectively prevent patients from developing excessive or asymmetric gait during rehabilitation training, thereby avoiding falls or even secondary injuries. To address this need, we propose a computer-vision-based gait detection method for real-time monitoring of gait during human–machine integrated walking. Specifically, we design a neural network model, GaitPoseNet, for posture recognition in human–machine integrated walking. The network takes RGB images as input and regresses joint coordinates, using depth estimation as implicit supervision during training. In addition, a joint guidance strategy (JGS) is built into the network framework: the degree of correlation between human joints is treated as a detection target, which effectively mitigates the prediction difficulties caused by partial joint occlusion during walking. Finally, a post-processing algorithm describes the patient's walking motion by combining the pixel coordinates of each joint with leg length. The advantages of our method are that it provides a non-contact measurement approach with strong generality, and that depth estimation and the JGS improve measurement accuracy. Experiments on the Walking Pose with Exoskeleton (WPE) dataset show that our method reaches 95.77% PCKs@0.1 and 93.14% PCKs@0.08 with a runtime of 3.55 ms. Our method therefore achieves strong performance in both speed and accuracy.
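The PCKs@0.1 and PCKs@0.08 figures quoted above follow the standard Percentage of Correct Keypoints metric: a predicted joint is counted as correct when its distance to the ground truth is below a fraction of a per-sample normalization length. The sketch below is an illustrative NumPy implementation under that general definition; the function name `pck` and the choice of normalization length are assumptions, as the exact WPE evaluation protocol is not specified in the abstract.

```python
import numpy as np

def pck(pred, gt, norm_len, alpha=0.1):
    """Percentage of Correct Keypoints at threshold ``alpha``.

    A predicted joint counts as correct when its Euclidean distance to the
    ground-truth joint is less than ``alpha`` times a per-sample
    normalization length (e.g., torso length or bounding-box diagonal;
    the normalization used by the WPE protocol is an assumption here).

    pred, gt : (N, J, 2) arrays of joint pixel coordinates
    norm_len : (N,) array of per-sample normalization lengths
    """
    dist = np.linalg.norm(pred - gt, axis=-1)    # (N, J) per-joint distances
    correct = dist < alpha * norm_len[:, None]   # broadcast threshold per sample
    return correct.mean()                        # fraction of correct joints
```

Reporting PCK at two thresholds, as the abstract does, shows how accuracy degrades as the tolerance tightens from 10% to 8% of the normalization length.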