A visual system that combines virtual and real video for humanoid robot manipulation is presented. A 3D virtual model of the humanoid robot is built with the same appearance and degree-of-freedom configuration as the real robot. Multiple feedback signals from the robot are fused and used to present the real robot's status as text and images. The system also simulates each operation order in advance and displays the predicted result. In the data fusion module, a least-squares algorithm is adopted to calculate the real-time position and attitude of the robot. Experiments demonstrate that the system offers good telepresence and a preview of the operation order. In this paper, we also propose an adaptive Elastic Net method for edge linking in images from the robot's cameras, to help interpret the task situation. The proposed method introduces an adaptive dynamic parameter strategy and a stochastic noise strategy into the Elastic Net, which improve the network's ability to escape from local minima and to converge faster to optimal or near-optimal solutions.
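The abstract does not give the least-squares formulation used in the data fusion module. As a minimal sketch of the general idea only, the Python below fuses several pose estimates by weighted least squares, assuming independent sensors with known per-component variances and attitude angles that stay away from the wrap-around at ±π. The function name `fuse_pose`, the six-component pose layout, and the sample numbers are illustrative assumptions, not taken from the paper.

```python
def fuse_pose(estimates, variances):
    """Fuse several pose estimates by weighted least squares.

    estimates: list of poses, each (x, y, z, roll, pitch, yaw)  [assumed layout]
    variances: list of matching per-component variances (all > 0)

    For each component k this minimizes sum_i (z_ik - p_k)^2 / var_ik,
    whose closed-form solution is the inverse-variance weighted mean.
    """
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance vector per estimate")

    n_components = len(estimates[0])
    fused = []
    for k in range(n_components):
        # Weight each sensor by the inverse of its variance on this component.
        weights = [1.0 / var[k] for var in variances]
        weighted_sum = sum(w * est[k] for w, est in zip(weights, estimates))
        fused.append(weighted_sum / sum(weights))
    return fused


# Hypothetical readings from two sources (e.g. joint encoders and vision).
encoder_pose = [1.0, 0.0, 0.5, 0.0, 0.0, 0.10]
vision_pose = [3.0, 0.0, 0.5, 0.0, 0.0, 0.30]
encoder_var = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
vision_var = [3.0, 1.0, 1.0, 1.0, 1.0, 1.0]

fused_pose = fuse_pose([encoder_pose, vision_pose], [encoder_var, vision_var])
```

With these numbers the fused x coordinate is pulled toward the lower-variance source, while components with equal variances reduce to a plain average. A full implementation would handle correlated errors and angle wrap-around, which this sketch deliberately omits.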