Post-stroke care faces several challenges, including high cost, a shortage of professionals, and inadequate evaluation of rehabilitation state. Computer technology can alleviate these issues, as it allows health care professionals (HCPs) to quantify the workload and thus enhance the quality of rehabilitation care. In this paper, a novel multi-model fusion method, termed the pose dual-stream network (PDSN), is devised to test the feasibility of monitoring the training actions of rehabilitating stroke patients in care management. In particular, this deep-learning-based algorithm combines human pose estimation with dual-stream networks. We use an improved OpenPose to estimate human pose from videos captured by a low-cost monocular camera. In the dual-stream network, the spatial and motion streams are flexibly integrated. The spatial stream network combines a Gated Recurrent Unit (GRU) with an attention mechanism to extract spatiotemporal features, while the motion stream network is composed of improved multi-layer 1D Convolutional Neural Networks (CNNs) enhanced by causal and dilated convolutions. Additionally, an adaptive weight fusion strategy is used to fuse the outputs of the two networks for the final action classification. The method achieves high accuracy on two public datasets and on a dataset we created, which validates its superiority and feasibility.
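To make two of the building blocks named above concrete, the following is a minimal NumPy sketch of a causal dilated 1D convolution and of a weighted fusion of two streams' class probabilities. It is illustrative only and not the PDSN implementation: the function names (`causal_dilated_conv1d`, `fuse_scores`), the single-channel input, and the use of softmax-normalized fusion logits are all assumptions, since the abstract does not specify how the adaptive weights are computed.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Hypothetical helper: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zero, so the
    output at time t never depends on inputs later than t (causality).
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    T, K = len(x), len(kernel)
    pad = (K - 1) * dilation
    # Left-pad with zeros so every tap can look back without running off the array.
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(T)
    for k in range(K):
        start = pad - k * dilation  # kernel[k] reads the sample k * dilation steps back
        y += kernel[k] * xp[start:start + T]
    return y

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def fuse_scores(spatial_probs, motion_probs, fusion_logits):
    """Assumed fusion rule: a convex combination of the two streams' class
    probabilities, with weights obtained by softmax over two logits that
    would be learned during training."""
    w = softmax(fusion_logits)
    return w[0] * np.asarray(spatial_probs) + w[1] * np.asarray(motion_probs)

# Usage: a two-tap kernel with dilation 2 adds each sample to the one two steps earlier.
features = causal_dilated_conv1d([1.0, 2.0, 3.0, 4.0], kernel=[1.0, 1.0], dilation=2)
fused = fuse_scores([0.8, 0.2], [0.4, 0.6], fusion_logits=[0.0, 0.0])
predicted_class = int(np.argmax(fused))
```

Stacking such layers with increasing dilation widens the temporal receptive field without adding parameters per layer, which is the usual motivation for pairing causal and dilated convolutions on sequences of pose keypoints.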