This study addresses three problems in motion capture: handling multiple targets, poor accuracy, and the inability to obtain displacement information. Building on the fusion of target positioning and inertial attitude sensing technology, Unity3D is used to create 3D scenes and 3D human body models and to read raw data from inertial sensors in real time. A gesture fusion algorithm then processes the raw data in real time to generate a quaternion, and a human motion capture system based on inertial sensors is designed to record the complete movement information of the capture target. Results demonstrate that the developed system can accurately capture multiple moving targets, with a recognition rate of 75% to 100%. With the fusion target positioning algorithm, the maximum error of the system is 10 cm, a reduction of 71.24% compared with the system without the fusion algorithm. The movements of different body parts are analyzed through example data. The recognition rate for “wave,” “crossover,” “pick things up,” “walk,” and “squat down” reaches 100%. Hence, the proposed multiperson motion capture system, which combines target positioning with inertial attitude sensing technology, provides better performance. The results are significant for promoting the development of industries such as animation, medical care, games, and sports training.
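To make the quaternion step concrete, the following is a minimal sketch of how raw angular-rate samples from an inertial sensor can be turned into an orientation quaternion. It is not the fusion algorithm of this study: it only integrates gyroscope rates and omits any accelerometer or magnetometer correction. The function names `quat_multiply` and `integrate_gyro`, the (w, x, y, z) component order, and the units (rad/s, seconds) are assumptions made for illustration.

```python
import math


def quat_multiply(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def integrate_gyro(q, omega, dt):
    """Advance orientation q by a body-frame angular rate over one time step.

    q     -- current orientation as a unit quaternion (w, x, y, z)
    omega -- angular rate (wx, wy, wz) in rad/s (assumed units)
    dt    -- sample interval in seconds
    """
    wx, wy, wz = omega
    rate = math.sqrt(wx * wx + wy * wy + wz * wz)
    angle = rate * dt
    if angle < 1e-12:
        return q  # no measurable rotation during this sample
    # Incremental rotation of `angle` radians about the axis omega / rate.
    s = math.sin(angle / 2.0) / rate
    dq = (math.cos(angle / 2.0), wx * s, wy * s, wz * s)
    out = quat_multiply(q, dq)
    # Renormalize so rounding errors do not accumulate over many samples.
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


# Usage: rotate about the z axis at pi/2 rad/s for 1 s, sampled at 100 Hz.
q = (1.0, 0.0, 0.0, 0.0)
for _ in range(100):
    q = integrate_gyro(q, (0.0, 0.0, math.pi / 2.0), 0.01)
```

After the loop, `q` represents a 90 degree rotation about the z axis. A practical system would combine this integration with other sensor readings to limit gyroscope drift, and a quaternion in this form is the kind of value that can drive a joint rotation in a 3D human body model.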