Background Scene supervision is a major tool to make medical robots safer and more intuitive. The paper shows an approach to efficiently use 3D cameras within the surgical operating room to enable for safe human robot interaction and action perception. Additionally the presented approach aims to make 3D camera-based scene supervision more reliable and accurate. Methods A camera system composed of multiple Kinect and time-of-flight cameras has been designed, implemented and calibrated. Calibration and object detection as well as people tracking methods have been designed and evaluated. Results The camera system shows a good registration accuracy of 0.05 m. The tracking of humans is reliable and accurate and has been evaluated in an experimental setup using operating clothing. The robot detection shows an error of around 0.04 m. Conclusions The robustness and accuracy of the approach allow for an integration into modern operating room. The data output can be used directly for situation and workflow detection as well as collision avoidance.