We consider the problem of active victim segmentation during a search-and-rescue (SAR) exploration mission. The robot is equipped with a multimodal sensor suite consisting of a camera, a lidar, and a pan-tilt thermal camera. The robot enters an unknown scene and incrementally builds a 3D model, while the proposed method simultaneously (a) segments victims from incomplete multimodal measurements and (b) controls the motion of the thermal camera. Both tasks are difficult due to the lack of natural training data and the limited number of real-world trials. We overcome the absence of training data for the segmentation task by employing a manually designed generative model, which provides a semisynthetic training data set. We tackle the limited number of real-world trials by self-supervised initialization and optimization-based guiding of the motion-control learning. In addition, we quantitatively evaluate the proposed method in several real testing scenarios on a real SAR robot. Finally, we provide a data set that enables further development of algorithms on real data.
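To make the semisynthetic-data idea concrete, the following is a minimal sketch, not the authors' actual generative model, of how such a pipeline could composite a simulated victim heat signature onto a real victim-free thermal frame to obtain image-mask training pairs. All function names, blob shapes, and parameter ranges here are illustrative assumptions.

```python
import numpy as np

def render_victim_blob(shape, center, scale, temp_rise):
    """Render a hypothetical anisotropic Gaussian 'body heat' blob.

    Stands in for a manually designed generative model of a victim's
    thermal signature; a real model would be far richer (pose,
    occlusion, clothing, emissivity).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy, sx = scale
    d2 = ((ys - center[0]) / sy) ** 2 + ((xs - center[1]) / sx) ** 2
    return temp_rise * np.exp(-0.5 * d2)

def make_semisynthetic_sample(background, rng):
    """Composite a simulated victim onto a real background thermal frame.

    Returns (thermal_image, binary_mask), one segmentation training
    pair. `background` is assumed to be a real, victim-free thermal
    frame in degrees Celsius.
    """
    h, w = background.shape
    center = (rng.uniform(0.2 * h, 0.8 * h), rng.uniform(0.2 * w, 0.8 * w))
    scale = (rng.uniform(8, 20), rng.uniform(15, 40))  # elongated: lying person
    temp_rise = rng.uniform(5.0, 12.0)                 # K above background
    blob = render_victim_blob((h, w), center, scale, temp_rise)
    image = background + blob + rng.normal(0.0, 0.3, size=(h, w))  # sensor noise
    mask = (blob > 0.5 * temp_rise).astype(np.uint8)   # threshold defines GT mask
    return image, mask

# Usage: build a small semisynthetic training set from real victim-free frames.
rng = np.random.default_rng(0)
backgrounds = [15.0 + rng.normal(0, 1, size=(64, 80)) for _ in range(4)]
dataset = [make_semisynthetic_sample(bg, rng) for bg in backgrounds]
```

The design point is that the ground-truth mask is free: because the victim signature is injected synthetically, its support is known exactly, while the background statistics remain those of real sensor data.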