Tower cranes can cover most of the area of a construction site, which brings significant safety risks, including potential collisions with other entities. Mitigating these risks requires accurate, real-time information on the orientation and location of tower cranes and their hooks. As a non-invasive sensing method, computer vision-based (CVB) technology is widely applied on construction sites for object detection and three-dimensional (3D) localization. However, most existing methods are limited to localization on the ground plane of the site or rely on specific camera viewpoints and positions. To overcome these limitations, this study proposes a framework for the real-time recognition and localization of tower cranes and hooks using monocular far-field cameras. The framework consists of four steps: far-field camera autocalibration using feature matching and horizon-line detection, deep learning-based segmentation of tower cranes, geometric feature reconstruction of tower cranes, and 3D localization. The main contribution of this paper is the pose estimation of tower cranes from monocular far-field cameras with arbitrary views. To evaluate the proposed framework, a series of experiments was conducted on construction sites in different scenarios, and the results were compared with ground-truth data obtained by sensors. The experimental results show that the proposed framework achieves high precision in both crane jib orientation estimation and hook position estimation, thereby supporting safety management and productivity analysis on construction sites.
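To illustrate the role that horizon-line detection can play in the autocalibration step, the following minimal sketch recovers camera pitch and roll from a detected horizon line under a standard pinhole model. It is not the framework's own implementation: the function name `pitch_roll_from_horizon`, its parameters, and the assumption that the focal length and principal point are already known in pixels are all hypothetical choices made for this example.

```python
import math


def pitch_roll_from_horizon(p1, p2, focal_px, principal_point):
    """Estimate camera pitch and roll (radians) from a horizon line.

    Assumptions (illustrative, not taken from the paper):
    - pinhole camera, square pixels, no lens distortion;
    - image coordinates with x to the right and y downward;
    - p1 and p2 are two distinct image points on the detected horizon;
    - focal_px and principal_point are known, in pixels.
    Returned pitch is positive when the camera looks below the horizon.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x2 < x1:  # order the points left to right so the roll sign is stable
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)

    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("p1 and p2 must be distinct points")

    # Roll: angle of the horizon line relative to the image x-axis.
    roll = math.atan2(dy, dx)

    # Unit normal of the horizon line; it points down (+y) when roll is zero.
    nx, ny = -dy / length, dx / length

    # Signed distance from the horizon line to the principal point.
    cx, cy = principal_point
    dist = (cx - x1) * nx + (cy - y1) * ny

    # A horizon lying above the principal point means the camera looks down.
    pitch_down = math.atan2(dist, focal_px)
    return pitch_down, roll


# Example: a level horizon 200 px above the image center, f = 1000 px.
pitch, roll = pitch_roll_from_horizon(
    (0.0, 340.0), (1920.0, 340.0), 1000.0, (960.0, 540.0)
)
```

In this example the horizon is level, so the roll is zero, and the pitch equals `atan(200 / 1000)`. A complete calibration would also need the focal length and the camera's height and heading, which this sketch treats as given or out of scope.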