Technological advancements and increased access have prompted the adoption of head- mounted display based virtual reality (VR) for neuroscientific research, manual skill training, and neurological rehabilitation. Applications that focus on manual interaction within the virtual environment (VE), especially haptic-free VR, critically depend on virtual hand-object collision detection. Knowledge about how multisensory integration related to hand-object collisions affects perception-action dynamics and reach-to-grasp coordination is needed to enhance the immersiveness of interactive VR. Here, we explored whether and to what extent sensory substitution for haptic feedback of hand-object collision (visual, audio, or audiovisual) and collider size (size of spherical pointers representing the fingertips) influences reach-to-grasp kinematics. In Study 1, visual, auditory, or combined feedback were compared as sensory substitutes to indicate the successful grasp of a virtual object during reach-to-grasp actions. In Study 2, participants reached to grasp virtual objects using spherical colliders of different diameters to test if virtual collider size impacts reach-to-grasp. Our data indicate that collider size but not sensory feedback modality significantly affected the kinematics of grasping. Larger colliders led to a smaller size-normalized peak aperture. We discuss this finding in the context of a possible influence of spherical collider size on the perception of the virtual object’s size and hence effects on motor planning of reach-to-grasp. Critically, reach-to-grasp spatiotemporal coordination patterns were robust to manipulations of sensory feedback modality and spherical collider size, suggesting that the nervous system adjusted the reach (transport) component commensurately to the changes in the grasp (aperture) component. These results have important implications for research, commercial, industrial, and clinical applications of VR.