We introduce and evaluate a novel camera pose estimation framework that uses the human head as a calibration object. The proposed method performs extrinsic calibration from 2D input images (NIR and/or RGB), relying only on the detected human head and requiring no depth information. The approach is applicable to single cameras and to multi-camera networks. Our implementation uses a fine-tuned deep learning-based 2D facial landmark detector and estimates the 3D head pose by fitting a 3D head model to the detected 2D facial landmarks. We evaluate the proposed approach on real multi-camera recordings and synthetic renderings to determine the accuracy of the pose estimation results and their applicability. We assess the robustness of our method to different input parameters, such as varying relative camera positions, variations of head models, face occlusions (by masks, sunglasses, etc.), and potential biases and variance across individuals. Based on the experimental results, we expect our approach to be effective for numerous use cases, including automotive attention monitoring, robotics, VR/AR, and other scenarios where ease of handling outweighs accuracy.
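To illustrate how a head observed by two cameras can serve as a shared calibration object, the following is a minimal sketch and not the authors' implementation. It assumes that a head pose has already been estimated for each camera as a rotation R and translation t mapping head-model coordinates to camera coordinates (x_cam = R x_head + t), and that both poses refer to the same instant. The function names (`relative_extrinsics`, `rot_y`) and the example poses are hypothetical; the abstract does not specify how the per-camera poses are combined.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [list(row) for row in zip(*m)]

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def relative_extrinsics(R_a, t_a, R_b, t_b):
    """Relative pose between two cameras that observe the same head.

    Assumes x_a = R_a x_head + t_a and x_b = R_b x_head + t_b.
    Eliminating x_head gives x_b = R_ba x_a + t_ba with
    R_ba = R_b R_a^T and t_ba = t_b - R_ba t_a.
    """
    R_ba = matmul(R_b, transpose(R_a))
    t_ba = [tb - rv for tb, rv in zip(t_b, matvec(R_ba, t_a))]
    return R_ba, t_ba

def rot_y(angle_rad):
    """Rotation about the y axis, used only to build example poses."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Hypothetical head poses (metres, radians) as seen from cameras A and B.
R_a, t_a = rot_y(0.2), [0.1, 0.0, 0.8]
R_b, t_b = rot_y(-0.5), [-0.2, 0.05, 0.9]
R_ba, t_ba = relative_extrinsics(R_a, t_a, R_b, t_b)
```

The head itself drops out of the result: only the two head-to-camera poses are needed, which is why no dedicated calibration target or depth sensor appears in this sketch. In practice, estimates from many frames would presumably be aggregated to reduce noise; that step is omitted here.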