The squat is a multi-joint exercise widely used in everyday at-home fitness. Focusing on the fine-grained classification of squat motions, we propose a smartwatch-based wearable system that can recognize subtle differences between motions. For data collection, 52 participants were asked to perform one correct squat and five types of incorrect squats, each with three different arm postures (straight arms, crossed arms, and hands on waist). We employed deep neural network (DNN) models and adopted a conventional machine learning method (random forest) as a baseline. Experimental results revealed that bidirectional GRU/LSTMs with an attention mechanism, combined with the hands-on-waist arm posture, achieved the best test accuracy of 0.854 (F1-score of 0.856). The high-dimensional embeddings in the latent space learned by the attention-based models exhibited more clustered distributions than those learned by the other DNN models, indicating that the attention-based models extracted features from the complex multivariate time-series motion signals more efficiently. To understand the underlying decision-making process of the system, we analyzed the attention weights of the attention-based RNN models. Both the bidirectional GRU and LSTM showed consistent attention patterns for the defined squat classes, but the two models weighted their attention toward different kinematic events of the squat motion (e.g., descending and ascending); however, no significant difference in classification performance was found between them.
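To illustrate the kind of architecture referred to above, the following is a minimal, hypothetical PyTorch sketch of a bidirectional GRU classifier with a simple additive attention layer over the time axis. It is not the authors' implementation: the sensor channel count, window length, hidden size, and class count are illustrative assumptions, and the returned per-time-step weights merely show how attention over kinematic events could be inspected.

```python
# Minimal sketch (not the authors' implementation) of an attention-based
# bidirectional GRU classifier for multivariate smartwatch time series.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class BiGRUAttentionClassifier(nn.Module):
    def __init__(self, n_channels=6, hidden_size=64, n_classes=6):
        super().__init__()
        # Bidirectional GRU over the motion signal (e.g., 3-axis accelerometer
        # + 3-axis gyroscope channels from a smartwatch).
        self.gru = nn.GRU(n_channels, hidden_size, batch_first=True,
                          bidirectional=True)
        # Additive attention: score each time step, then take the
        # softmax-weighted sum of GRU outputs as the sequence embedding.
        self.attn_score = nn.Linear(2 * hidden_size, 1)
        self.classifier = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time, channels)
        h, _ = self.gru(x)                        # (batch, time, 2*hidden)
        scores = self.attn_score(torch.tanh(h))   # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)    # attention over time steps
        context = (weights * h).sum(dim=1)        # (batch, 2*hidden)
        return self.classifier(context), weights.squeeze(-1)


if __name__ == "__main__":
    model = BiGRUAttentionClassifier()
    dummy = torch.randn(8, 200, 6)  # 8 windows, 200 samples, 6 sensor channels
    logits, attn = model(dummy)
    print(logits.shape, attn.shape)  # torch.Size([8, 6]) torch.Size([8, 200])
```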