To tele-operate a robot, visual feedback is critical. However, communication channel latency can delay this feedback to the point where the operator is impeded in performing the task. This work presents a vision-based "predictive display" system that compensates for visual delay. The approach is online and largely uncalibrated, making it useful in unknown environments and a wide range of applications. From monocular eye-in-hand video, we incrementally compute a 3D graphics model of the remote robot site in real time using a new reconstruction technique. The method exploits free-space and occlusion constraints on the scene to produce a physically consistent mesh. Novel vantage points are rendered immediately in response to the operator's control commands, without waiting for the delayed video. We implement a full prototype tele-operation system in which the operator controls, via a PHANTOM Omni haptic device, a Barrett WAM arm mounted on a Segway mobile base. Experiments with this setup validate the efficacy of the proposed approach: we demonstrate a significant improvement in task completion time with predictive display on a real robot, whereas our previous related results were established only in simulation.