Smart cities deploy large numbers of optical cameras, including closed-circuit television (CCTV), unmanned aerial vehicles (UAVs), and smartphones. However, additional information about these devices, such as 3D position, orientation, and principal distance, is not provided. To address this problem, this study used a structured mobile mapping system point cloud to investigate methods of estimating the principal point, position, and orientation of optical sensors without given initial values. The principal distance was calculated using two direct linear transformation (DLT) models and a perspective projection model. Methods for estimating position and orientation were discussed, and their stability was tested using real-world sensors. The perspective projection model gave the best estimates of camera position and orientation, with errors of 0.80 m and 2.55°, respectively. The original DLT model showed a significant error in the orientation estimate, which is thought to have been influenced by correlation among the DLT model parameters. For the fixed-wing UAV, however, a proper estimate could not be produced owing to problems with ground control point placement.
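As a rough illustration of how a DLT model can yield camera position and principal distance without initial values, the following minimal sketch solves the classical 11-parameter DLT by linear least squares on synthetic data. It is not the implementation used in this study: the function names, the sign convention of the projection, the use of image coordinates in millimetres, and all numerical values are assumptions made only for the example.

```python
import numpy as np


def dlt_parameters(world_pts, image_pts):
    """Solve the 11 DLT parameters L1..L11 from >= 6 object/image point pairs.

    Model (assumed convention):
        x = (L1 X + L2 Y + L3 Z + L4) / (L9 X + L10 Y + L11 Z + 1)
        y = (L5 X + L6 Y + L7 Z + L8) / (L9 X + L10 Y + L11 Z + 1)
    """
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    rows, rhs = [], []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        # Each correspondence gives two equations that are linear in L.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z])
        rhs.extend([x, y])
    L, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return L


def camera_from_dlt(L):
    """Recover position, principal point, and principal distance from L."""
    P = np.append(L, 1.0).reshape(3, 4)      # 3x4 projection matrix
    M = P[:, :3]
    # The projection centre C satisfies M @ C + P[:, 3] = 0.
    position = -np.linalg.solve(M, P[:, 3])
    # Scale so that the third row has unit length (it is a rotation row).
    M = M / np.linalg.norm(M[2])
    x0, y0 = M[0] @ M[2], M[1] @ M[2]
    fx = np.sqrt(M[0] @ M[0] - x0 ** 2)
    fy = np.sqrt(M[1] @ M[1] - y0 ** 2)
    return position, np.array([x0, y0]), 0.5 * (fx + fy)


def synthetic_check():
    """Project synthetic control points and re-estimate the camera from them."""
    rng = np.random.default_rng(0)
    # Hypothetical ground truth: metres for object space, millimetres for images.
    f_true, pp_true = 8.0, np.array([0.2, -0.1])
    C_true = np.array([1.0, -2.0, -10.0])
    ax, ay = 0.05, 0.10                      # small rotations in radians
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, -np.sin(ay)],
                   [0, 1, 0],
                   [np.sin(ay), 0, np.cos(ay)]])
    R = Rx @ Ry                              # object-to-camera rotation
    K = np.array([[f_true, 0, pp_true[0]],
                  [0, f_true, pp_true[1]],
                  [0, 0, 1]])
    # Non-coplanar control points spread through a volume in front of the camera.
    world = rng.uniform([-5, -5, 15], [5, 5, 25], size=(30, 3))
    cam = (world - C_true) @ R.T
    img_h = cam @ K.T
    image = img_h[:, :2] / img_h[:, 2:3]
    estimate = camera_from_dlt(dlt_parameters(world, image))
    return estimate, (C_true, pp_true, f_true)


if __name__ == "__main__":
    (pos, pp, f), _ = synthetic_check()
    print("position [m]:", pos)
    print("principal point [mm]:", pp)
    print("principal distance [mm]:", f)
```

The sketch uses noise-free points distributed through a volume. With real control points that are nearly coplanar or poorly distributed, the linear system becomes ill-conditioned, which is one way placement problems and parameter correlation can degrade the estimate.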