The contribution of depth information to manual tracking movements, unlike its contribution to ballistic arm movements such as reaching, is unclear. To understand how the brain handles depth information, we investigated how a required movement along the depth axis affects behavioral tracking performance, postulating that performance would depend on the amount of depth movement. We designed a visually guided planar tracking task that requires movement on three planes differing in the depth movement they demand: a fronto-parallel plane called ROT (0), a sagittal plane called ROT (90), and a plane rotated by 45° with respect to the sagittal plane called ROT (45). Fifteen participants performed a circular manual tracking task under binocular and monocular vision in a three-dimensional (3D) virtual reality space. Under binocular vision, ROT (90), which required the largest depth movement among the tasks, showed the greatest 3D error. Similarly, the errors (deviations from the target path) on the depth axis differed significantly among the tasks. Under monocular vision, significant differences in errors were observed only on the lateral axis. Moreover, under binocular vision, the errors on the lateral and depth axes were proportional to the required movement on these axes, and the required depth movement determined the depth error independently of the other axes. This finding implies that the brain may process binocular vision information independently on each axis. In contrast, under monocular vision, performance along the depth axis was independent of the required depth movement, indicating intractable behavior. Our findings highlight the importance of handling depth movement, especially when designing virtual reality environments that involve tracking tasks.
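The relationship between plane rotation and required movement per axis can be illustrated with a minimal geometric sketch. It assumes, hypothetically, that each task plane is obtained by rotating a fronto-parallel circle about the vertical axis, and that the circle has unit radius; neither the function name `axis_extents` nor the radius value comes from the study.

```python
import math

def axis_extents(rotation_deg, radius=1.0):
    """Peak-to-peak movement required on each axis for a circular path.

    Assumption (not from the study): the circle lies in a plane rotated by
    `rotation_deg` about the vertical axis, where 0 is fronto-parallel and
    90 is sagittal. `radius` is a hypothetical value in arbitrary units.
    """
    theta = math.radians(rotation_deg)
    lateral = 2 * radius * abs(math.cos(theta))   # left-right extent
    vertical = 2 * radius                         # unaffected by this rotation
    depth = 2 * radius * abs(math.sin(theta))     # near-far extent
    return lateral, vertical, depth

# Compare the three task planes named in the abstract.
for angle in (0, 45, 90):
    lateral, vertical, depth = axis_extents(angle)
    print(f"ROT ({angle}): lateral={lateral:.3f}, "
          f"vertical={vertical:.3f}, depth={depth:.3f}")
```

Under these assumptions, the required depth movement grows from zero at ROT (0) to its maximum at ROT (90), while the required lateral movement shrinks correspondingly, which is the sense in which the three planes differ in depth demand.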