Steering a soft robot precisely through an unstructured environment with minimal collision remains an open challenge. When the environment is unknown, prior motion planning for navigation may not always be available. This paper presents a novel Sim-to-Real method to guide a cable-driven soft robot in a static environment under the Simulation Open Framework Architecture (SOFA). The scenario resembles one step of a simplified transoral tracheal intubation process, in which a robotic endotracheal tube is guided to the upper trachea-larynx location by a flexible video-assisted endoscope/stylet. In SOFA, we employ a quadratic programming inverse solver to obtain collision-free motion strategies for endoscope/stylet manipulation based on the robot model, and we encode the virtual eye-in-hand vision. We then associate the anatomical features recognized by the virtual vision with the joint-space motion using a closed-loop nonlinear autoregressive exogenous (NARX) network. Afterward, we transfer the learned knowledge to the robot prototype, expecting it to navigate automatically to the desired spot in a new phantom environment based on its eye-in-hand vision alone. Experimental results indicate that the soft robot can effectively navigate through the unstructured phantom to the desired spot with minimal collision, according to what it has learned in the virtual environment. The average R-squared coefficients between the closed-loop NARX-forecasted and SOFA-referenced cable and prismatic joint-space motions are 0.963 and 0.997, respectively. The eye-in-hand views also demonstrate good alignment between the robot tip and the glottis.
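To make the closed-loop NARX idea concrete: the network predicts the next joint-space command from lagged exogenous inputs (vision features) and lagged outputs, and in closed-loop operation its own predictions are fed back as the output history. The following is a minimal illustrative sketch only; the feature count, joint count, lag depths, and network weights are all hypothetical placeholders, not the architecture or trained model from the paper.

```python
import numpy as np

# Illustrative closed-loop NARX sketch (all shapes and names are assumptions;
# the abstract does not specify the actual network architecture or training).
# y[t] = f(y[t-1..t-DY], u[t-1..t-DU]), where u are exogenous vision features
# and y are joint-space commands (e.g. cable and prismatic joints). In
# closed-loop mode the model feeds its own predictions back as y-history.

rng = np.random.default_rng(0)

N_FEAT = 4      # hypothetical number of anatomical vision features
N_JOINT = 2     # e.g. one cable joint and one prismatic joint
DU, DY = 3, 3   # exogenous and autoregressive lag depths
HIDDEN = 16

# Random weights stand in for a trained network.
W1 = rng.normal(0, 0.1, (HIDDEN, N_FEAT * DU + N_JOINT * DY))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (N_JOINT, HIDDEN))
b2 = np.zeros(N_JOINT)

def narx_step(u_hist, y_hist):
    """One NARX prediction from lagged exogenous inputs and lagged outputs."""
    x = np.concatenate([u_hist.ravel(), y_hist.ravel()])
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def closed_loop_rollout(u_seq, y_init):
    """Roll the model forward, feeding predictions back as output history."""
    y_hist = y_init.copy()            # shape (DY, N_JOINT)
    preds = []
    for t in range(DU, len(u_seq)):
        u_hist = u_seq[t - DU:t]      # shape (DU, N_FEAT)
        y_next = narx_step(u_hist, y_hist)
        preds.append(y_next)
        y_hist = np.vstack([y_hist[1:], y_next])  # shift the feedback window
    return np.array(preds)

u_seq = rng.normal(size=(20, N_FEAT))  # simulated vision-feature stream
y0 = np.zeros((DY, N_JOINT))
traj = closed_loop_rollout(u_seq, y0)
print(traj.shape)  # (17, 2): one joint-space command per remaining time step
```

The closed-loop feedback (reusing predictions rather than ground-truth outputs) is what lets such a model run on the physical robot, where SOFA-referenced joint values are unavailable at test time.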