High-accuracy, high-speed 3D sensing plays an essential role in VR eye tracking because it connects the user with the virtual world. Fringe projection profilometry (FPP) does not depend on scene texture and gives accurate results in near-eye scenarios; however, phase-shifting FPP requires multiple captured frames, which makes it prone to motion artifacts and may prevent it from meeting the low-latency requirements of eye tracking. Fourier transform profilometry, on the other hand, achieves single-shot 3D sensing, but it is strongly affected by texture variations on the eye. To address these challenges, researchers have explored deep learning-based single-shot fringe projection 3D sensing. However, building a training dataset is expensive, and without abundant data the model struggles to generalize. In this paper, we built a virtual fringe projection system together with photorealistic face and eye models to synthesize large amounts of training data, which reduces the cost of data collection and improves the generalization ability of the convolutional neural network (CNN). The training data synthesizer uses physically based rendering (PBR) and achieves high photorealism. We demonstrate that PBR can simulate the complex double refraction of structured light caused by the cornea. To train the CNN, we adopted transfer learning: the CNN is first trained on PBR-generated data and then trained on real data. We tested the CNN on real data, and the predictions demonstrate that the synthesized data improves the performance of the model, which achieves a gaze accuracy of about 3.722 degrees and a pupil position error of about 0.5363 mm on a participant unfamiliar to the model.
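As an illustration of the two reported metrics, the following minimal sketch shows one common way such errors could be computed. It assumes that gaze accuracy is the angle between predicted and ground-truth 3D gaze direction vectors and that pupil position error is the Euclidean distance between predicted and ground-truth 3D pupil centers in millimeters; the function names are hypothetical, and the exact evaluation protocol used in this work may differ.

```python
import math

def gaze_angular_error_deg(pred, true):
    """Angle in degrees between two 3D gaze direction vectors (assumed metric)."""
    dot = sum(p * t for p, t in zip(pred, true))
    norm = math.hypot(*pred) * math.hypot(*true)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def pupil_position_error_mm(pred, true):
    """Euclidean distance between two 3D pupil centers given in mm (assumed metric)."""
    return math.dist(pred, true)

if __name__ == "__main__":
    # Illustrative inputs only, not data from the paper.
    print(gaze_angular_error_deg((0.0, 0.0, 1.0), (0.0, 1.0, 0.0)))
    print(pupil_position_error_mm((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
```

Averaging these per-sample values over a test set would give summary figures of the kind quoted above.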