Training a robot that engages with people is challenging, because it is expensive to involve people in a robot training process requiring numerous data samples. This paper presents 1) a human path prediction network (HPPN), which predicts a user's next movement based on sequentially sensed human response data and 2) an evolution strategy-based robot training method based on virtual human movements generated by the HPPN, which compensates for this sample inefficiency problem. We applied the proposed method to the training of a robotic guide for visually impaired people, which was designed to collect multimodal human response data and reflect such data in selecting the robot's actions. We collected 1,507 real-world episodes for training the HPPN and then generated over 100,000 virtual episodes for training the robot policy. User test results indicate that our trained robot accurately guides blindfolded participants along a goal path. In addition, by the designed reward to pursue both guidance accuracy and human comfort during the robot policy training process, our robot leads to improved smoothness in human motion while maintaining the accuracy of the guidance. This sample-efficient training method is expected to be widely applicable to all robots and computing machinery that physically interact with humans.