Automated object identification in three‐dimensional (3D) space is crucial for work zone safety, for example in checking compliance with construction rules and preventing workplace injuries and deaths. However, it remains challenging: it requires high‐quality detection and instance segmentation, few engineering object datasets provide masks, and accurate 3D object understanding is hindered by scale variations and limited cues in the 3D world. Traditional hand‐crafted methods struggle with these challenges. Our key insight is to use 2D object detection, instance segmentation, and camera vision to compute pseudo‐light detection and ranging (pseudo‐LiDAR) point clouds for 3D object identification. On the one hand, an enhanced feature pyramid network is proposed to extract more fine‐grained object features, and an improved cascade mask R‐CNN is applied to efficiently detect bounding boxes and masks for all 2D objects. Moreover, the AIM dataset for heavy equipment detection is augmented, and a new object class annotated with bounding boxes and masks is added. On the other hand, pseudo‐LiDAR point clouds of objects are recovered from a monocular image, based on their bounding boxes and masks, by combining deep learning, automatic camera parameter estimation, a vision‐based method, and a space filter. Extensive experiments and analyses show that the new methodology can identify 3D objects and automatically analyze work zone safety. The proposed object detection model achieves state‐of‐the‐art results on the AIM dataset and 97.2% mean average precision on the augmented dataset. The collision detection model using pseudo‐LiDAR point clouds achieves 95.99% accuracy. The new model can serve as a baseline to support 3D object identification research for other 3D tasks.
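
As an illustration of the pseudo‐LiDAR idea only, the sketch below back‐projects the depth pixels inside one instance mask into camera‐frame 3D points using a standard pinhole camera model. It is not the authors' implementation. It assumes that a per‐pixel depth map is already available (for example, from a monocular depth estimator) and that the intrinsics `fx`, `fy`, `cx`, `cy` are known; the automatic camera parameter estimation and the space filter are not reproduced. The function names `backproject_mask` and `min_distance` are hypothetical, and the minimum point‐to‐point distance is a simple proximity measure chosen for illustration, not the collision detection model evaluated in this work.

```python
import numpy as np


def backproject_mask(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels to camera-frame 3D points.

    Assumes a pinhole camera: depth is an (H, W) array in metric units,
    mask is an (H, W) boolean instance mask, and (fx, fy, cx, cy) are
    the camera intrinsics in pixels. Returns an (N, 3) array of points.
    """
    v, u = np.nonzero(mask)              # row (v) and column (u) indices in the mask
    z = depth[v, u].astype(float)
    valid = z > 0                        # drop pixels without a usable depth value
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx                # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy                # pinhole model: Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=1)


def min_distance(points_a, points_b):
    """Smallest Euclidean distance between two (N, 3) and (M, 3) point sets.

    Brute force over all pairs; adequate for small clouds only.
    """
    diff = points_a[:, None, :] - points_b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=2)).min())
```

Under these assumptions, two objects could be flagged as potentially too close when `min_distance` between their point clouds falls below a chosen safety threshold; the threshold itself would have to be set for the work zone at hand.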