This work describes a virtual reality (VR) based robot teleoperation framework which relies on scene visualization from depth cameras and implements human-robot and human-scene interaction gestures. We suggest that mounting a camera on a slave robot's end-effector (an in-hand camera) allows the operator to achieve better visualization of the remote scene and improve task performance. We compared experimentally the operator's ability to understand the remote environment in different visualization modes: single external static camera, in-hand camera, in-hand and external static camera, in-hand camera with OctoMap occupancy mapping. The latter option provided the operator with a better understanding of the remote environment whilst requiring relatively small communication bandwidth. Consequently, we propose suitable grasping methods compatible with the VR based teleoperation with the in-hand camera. Video demonstration: https://youtu.be/3vZaEykMS_E.