The perspective transformation based on pinhole camera geometry is widely used in 3D computer vision. Under this camera model, a 3D point projects to the homogeneous image coordinates (su, sv, s), where s is a scale factor; the inhomogeneous image coordinates u and v are obtained by dividing the first two elements by s. Although a scale factor is easily computed from a 3×4 camera matrix, the computed s does not in general equal the physically meaningful quantity of the model: the z coordinate of the projected 3D point in the camera-centered coordinate system. In this paper, we propose a neural network structure and a learning algorithm that compute the scale factor of a 3D point. Because the proposed method estimates the scale factor as the true z coordinate, subsequent vision processing such as camera calibration can use this value efficiently. Computer simulations show that the proposed network performs well, confirming its validity.
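The relationship between the scale factor s and the camera-frame z coordinate can be illustrated with a minimal sketch. All numerical values below (intrinsics, pose, the test point) are illustrative assumptions, not taken from the paper; the sketch only shows that s = z holds for a properly normalized matrix K[R|t], while an arbitrarily rescaled camera matrix, such as one recovered by DLT calibration, yields the same (u, v) but a different s:

```python
import numpy as np

# Hypothetical intrinsics and pose (all values are illustrative assumptions).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# 3x4 camera matrix P = K [R | t]
P = K @ np.hstack([R, t.reshape(3, 1)])

X = np.array([1.0, -0.5, 10.0, 1.0])  # 3D point in homogeneous coordinates

su, sv, s = P @ X                     # projected 3-tuple (su, sv, s)
u, v = su / s, sv / s                 # inhomogeneous image coordinates

# With K[2, 2] = 1 and no extra scaling of P, the scale factor s
# equals the z coordinate of the point in the camera frame:
z_cam = (R @ X[:3] + t)[2]
print(np.isclose(s, z_cam))           # True

# A camera matrix is only defined up to scale, so projecting with
# 3.7 * P gives the same (u, v) but an s that no longer equals z_cam.
su2, sv2, s2 = (3.7 * P) @ X
print(np.isclose(su2 / s2, u), np.isclose(s2, z_cam))  # True False
```

This is exactly the ambiguity the paper addresses: s from an arbitrary calibrated matrix loses its physical meaning as depth.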