Upper limb rehabilitation is an effective way to restore and improve patients' functionality after various medical events, such as stroke, arthroscopic surgery, and breast cancer surgery. High-quality rehabilitation training can help patients live independently, thereby improving their quality of life and reducing the financial burden. Traditional training sessions have several limitations, including high cost, low patient compliance, and inaccurate evaluation. This paper presents a novel approach, based on multimodal sensing data and deep learning algorithms, to assist healthcare professionals in assessing upper limb functionality. The proposed approach employs five types of sensing data: acceleration, angular velocity, device orientation, RGB images, and depth images. To assess the accuracy of training actions, the approach applies two machine learning algorithms: the dynamic time warping-K-nearest neighbor (DTW-KNN) algorithm and the long short-term memory (LSTM) neural network. The experimental results show that multimodal sensing data improve modeling accuracy compared with unimodal sensing data. With multimodal sensing data, the LSTM model achieves higher accuracy (96.3%) than DTW-KNN (74.07%). Moreover, LSTM models high-dimensional sensing data very efficiently.
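To make the DTW-KNN baseline concrete, here is a minimal, generic sketch of the technique. It is not the authors' implementation. It assumes that each recording is a multivariate time series with one feature vector per time step (for example, sensor channels concatenated per step). It also assumes a Euclidean local cost, majority voting over the k nearest training series, and the hypothetical labels `"correct"` and `"incorrect"`. The function names and toy data are illustrative only.

```python
import math
from collections import Counter
from typing import Sequence

# A multivariate time series: one feature vector per time step.
# Assumption: modalities are fused by concatenating their features per step.
Series = Sequence[Sequence[float]]


def dtw_distance(a: Series, b: Series) -> float:
    """Classic O(n*m) dynamic time warping with a Euclidean local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance of two frames
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def knn_classify(query: Series, train: list[tuple[Series, str]], k: int = 1) -> str:
    """Label `query` by majority vote among its k DTW-nearest training series."""
    ranked = sorted(train, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one-channel recordings of a raise-and-lower motion.
train = [
    ([[0.0], [1.0], [2.0], [1.0], [0.0]], "correct"),
    ([[0.0], [0.2], [0.3], [0.2], [0.0]], "incorrect"),  # motion too shallow
]
query = [[0.0], [0.0], [1.0], [2.0], [1.0], [0.0]]  # same motion, delayed by one step
print(knn_classify(query, train, k=1))  # → correct
```

DTW lets the delayed query align with the first template at zero cost. A plain point-by-point distance would penalize the time shift. That tolerance to differences in execution speed is the usual motivation for pairing DTW with KNN when comparing movement recordings.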