This paper presents a method for autonomous endoscope positioning by a robotic endoscope holder in minimally invasive surgery. The method improves human-robot cooperation in robot-assisted surgery by allowing the endoscope holder to account for the surgeon's view projection and navigate the camera without manual control. The next desired camera location is predicted in real time from the instrument tip locations segmented from the endoscope video and from the surgeon's focus of attention, given by a tracked virtual reality headset. To tackle real-time surgical instrument segmentation for more precise instrument tip localization, we propose the YOLOv3 and ResNet Combined Neural Network. The method achieved an IoU of 86.6% across the MICCAI'17 EndoVis datasets at a processing speed of 30 frames per second. The proposed pipeline was implemented in ROS on Ubuntu, with visualization running in Unity3D under the Windows operating system. The simulation visualizes the robotic arm, endoscope, and surgical environment in 3D in the virtual reality headset to provide a stable view of the endoscope and improve the surgeon's perception of the operating environment.

INDEX TERMS Autonomous robot control, artificial neural network, minimally invasive surgery, object segmentation, human-robot cooperation.
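
For readers unfamiliar with the segmentation metric quoted above, intersection over union (IoU) compares a predicted binary mask with a ground-truth mask as the ratio of overlapping pixels to pixels covered by either mask. The following is a minimal sketch of that general definition only, not the evaluation code used in this work; the function name `mask_iou`, the list-of-lists mask representation, and the convention for two empty masks are assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical mask type for this sketch: rows of 0/1 pixel labels.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)  # pixel marked in both masks
            union += int(p_on or t_on)          # pixel marked in at least one

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))  # 0.5
```

A dataset-level score such as the one reported here is typically an average of per-frame or per-sequence values; the exact averaging scheme is not stated in this abstract.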