Human motion capture technology, which leverages sensors to track the movement trajectories of key skeleton points, has been progressively transitioning from industrial applications to broader civilian applications in recent years. It finds extensive use in fields such as game development, digital human modeling, and sport science. However, the affordability of these sensors often compromises the accuracy of motion data. Low-cost motion capture methods often lead to errors in the captured motion data. We introduce a novel approach for human motion reconstruction and enhancement using spatio-temporal attention-based graph convolutional networks (ST-ATGCNs), which efficiently learn the human skeleton structure and the motion logic without requiring prior human kinematic knowledge. This method enables unsupervised motion data restoration and significantly reduces the costs associated with obtaining precise motion capture data. Our experiments, conducted on two extensive motion datasets and with real motion capture sensors such as the SONY mocopi, demonstrate the method’s effectiveness in enhancing the quality of low-precision motion capture data. The experiments indicate the ST-ATGCN’s potential to improve both the accessibility and accuracy of motion capture technology.