The more sensory channels a virtual assembly system provides, the more realistic the system feels to its users. However, most existing virtual assembly systems rely on natural interaction through only one or two sensory channels. This paper therefore proposes a novel virtual assembly system that integrates multiple sensory channels: gesture interaction, Chinese speech interaction, tactile interaction, 3D display, and real-time display of images of the real environment within the virtual environment. To improve the operability of the virtual environment, we analyze parallel virtual assembly sequences on the basis of two-hand interaction, and the assembly priority is prompted through the user interface. For ease of operation, we present a viewpoint control method based on gesture interaction and coordinate thresholds on spatial position. A hierarchical bounding box collision detection algorithm based on volume difference is proposed to improve the efficiency of collision feedback and collision avoidance. In addition, power equipment models are displayed in the virtual scene as exhibits for the virtual roaming process. Finally, to evaluate the training effect of the system, a comparative experiment is designed to compare participants' experience and the effect of assembly training. The experimental results show that the virtual assembly system based on natural interaction with multi-sensory channels is flexible and immersive.

INDEX TERMS Virtual assembly, natural interaction, virtual reality, multisensory channel.

I. INTRODUCTION