This article describes an approach for mobile robots to identify scenes in configurations of objects spread across dense environments. This identification is enabled by intertwining the robotic object search and the scene recognition on already detected objects. We proposed “Implicit Shape Model (ISM) trees” as a scene model to solve these two tasks together. This article presents novel algorithms for ISM trees to recognize scenes and predict object poses. For us, scenes are sets of objects, some of which are interrelated by 3D spatial relations. Yet, many false positives may occur when using single ISMs to recognize scenes. We developed ISM trees, which is a hierarchical model of multiple interconnected ISMs, to remedy this. In this article, we contribute a recognition algorithm that allows the use of these trees for recognizing scenes. ISM trees should be generated from human demonstrations of object configurations. Since a suitable algorithm was unavailable, we created an algorithm for generating ISM trees. In previous work, we integrated the object search and scene recognition into an active vision approach that we called “Active Scene Recognition”. An efficient algorithm was unavailable to make their integration using predicted object poses effective. Physical experiments in this article show that the new algorithm we have contributed overcomes this problem.