During rehabilitation, many postoperative patients need to perform autonomous massage on time and on demand. This paper therefore develops an individualized, intelligent, and independent rehabilitation training system for acupoint massage, based on a deep learning model of image features, that excludes human factors. The system innovatively integrates massage gesture recognition with human pose recognition. It relies on the Kinect DK binocular depth camera and the Google MediaPipe Holistic pipeline to collect real-time image feature data on the patient's joints and gestures during autonomous massage. The system then calculates the coordinates of each finger joint and computes the human pose with VGG-16, a convolutional neural network (CNN); the results are translated and presented in a virtual reality (VR) model built in Unity 3D to guide the patient's actions during autonomous massage. However, gesture and pose recognition are hindered when the hand or the body is occluded by the body or other objects, owing to the limited recognition range of the hardware. Experimental results show that the proposed system correctly recognized up to 84% of non-occluded gestures and up to 93% of non-occluded poses; the system also exhibited good real-time performance, high operability, and low cost. Given the shortage of medical staff, the system can effectively improve patients' quality of life.
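
The finger-joint coordinate step mentioned above can be illustrated with a minimal sketch. It assumes hand landmarks arrive as 21 (x, y, z) tuples in the MediaPipe hand-landmark ordering (index finger: 5 = MCP, 6 = PIP, 7 = DIP, 8 = tip). The helper names `joint_angle` and `index_pip_angle`, and the choice to derive a flexion angle from the coordinates, are illustrative assumptions and not the authors' stated method.

```python
import math

# MediaPipe hand-landmark indices for the index finger (MCP, PIP, DIP, tip).
INDEX_MCP, INDEX_PIP, INDEX_DIP, INDEX_TIP = 5, 6, 7, 8


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident landmarks: angle is undefined")
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def index_pip_angle(landmarks):
    """Hypothetical helper: angle at the index-finger PIP joint."""
    return joint_angle(
        landmarks[INDEX_MCP], landmarks[INDEX_PIP], landmarks[INDEX_DIP]
    )


# Hypothetical input: a fully extended index finger along the y-axis.
landmarks = [(0.0, 0.0, 0.0)] * 21
landmarks[INDEX_MCP] = (0.0, 0.0, 0.0)
landmarks[INDEX_PIP] = (0.0, 1.0, 0.0)
landmarks[INDEX_DIP] = (0.0, 2.0, 0.0)
print(index_pip_angle(landmarks))  # 180.0 for a straight finger
```

An angle near 180 degrees indicates an extended finger, while smaller values indicate flexion. A per-joint feature of this kind is one plausible input to a gesture classifier, though the abstract does not specify which features the system uses.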