This paper proposes an approach for training visuo-haptic object recognition models for robots using synthetic datasets generated in 3D virtual simulations. While visual object recognition in robotics has seen considerable progress thanks to an abundance of image datasets, the scarcity of diverse haptic samples has left a noticeable gap in machine learning research that incorporates the haptic sense. Our proposed methodology addresses this challenge by using 3D virtual simulations to create realistic synthetic datasets, offering a scalable and cost-effective way to seamlessly integrate haptic and visual cues for object recognition. Acknowledging the importance of multimodal perception, particularly in robotic applications, our research not only closes this gap but also points toward a future in which intelligent agents derive a holistic understanding of their environment from both visual and haptic senses. Our experiments show that synthetic datasets can be used to train object recognition in both haptic and visual modalities, provided that noise injection, preprocessing, data augmentation, or domain adaptation is applied. This work contributes to the advancement of multimodal machine learning towards a more nuanced and comprehensive robotic perception.
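To make the noise-injection, preprocessing, and augmentation steps mentioned above concrete, the following is a minimal Python sketch of what such a sim-to-real pipeline might look like for a synthetic 1-D haptic signal. All function names, the toy signal, and the parameter values are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_sensor_noise(signal, noise_std=0.02):
    """Inject Gaussian noise to mimic real force/tactile sensor readings.

    noise_std is a hypothetical value; a real sensor's noise profile
    would be characterized empirically.
    """
    return signal + rng.normal(0.0, noise_std, size=signal.shape)

def augment(signal, max_scale=0.1, max_shift=5):
    """Simple data augmentation for 1-D haptic time series:
    random amplitude scaling plus a random circular time shift."""
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal * scale, shift)

def standardize(signal):
    """Zero-mean, unit-variance preprocessing per sample."""
    return (signal - signal.mean()) / (signal.std() + 1e-8)

# A toy "simulated" haptic sample: force profile of a gripper closing
# on an object (purely illustrative, not taken from the paper).
t = np.linspace(0.0, 1.0, 200)
clean = np.clip(np.sin(np.pi * t) * 2.0, 0.0, None)

# Noisy, augmented, standardized sample ready for a classifier.
sample = standardize(augment(add_sensor_noise(clean)))
print(sample.shape, round(sample.mean(), 6), round(sample.std(), 3))
```

In this sketch, noise is added before augmentation so that the perturbations resemble sensor noise on the simulated reading; standardization comes last so every training sample reaches the model on the same scale. Domain adaptation, the remaining option named in the abstract, would instead operate at the feature or model level rather than on individual samples.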