This paper proposes a factorization method that recovers camera motion and scene shape from point correspondences across multiple images captured under perspective projection. Starting from the affine camera model, the projective depths are estimated iteratively until the measurement matrix has rank 4. The resulting measurement matrix is then factorized to recover the three-dimensional structure of the scene in projective space. This approach avoids the noise-sensitive steps, such as computing the fundamental matrix, that conventional factorization methods for perspective images require, and therefore yields a stable reconstruction. Furthermore, the metric constraint of the conventional affine model is extended to derive a metric constraint for perspective projection. It is shown that a reconstruction in Euclidean space is obtained if the internal parameters of the camera are given. © 2000 Scripta Technica, Syst Comp Jpn, 31(13): 87–95, 2000
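The iterate-then-factorize idea summarized above can be illustrated with a minimal NumPy sketch. It is not the paper's exact update rule: it uses a generic alternating scheme (truncate the scaled measurement matrix to rank 4 by SVD, then refit each depth by least squares), and the function names `stack` and `projective_factorization`, the input layout `(m, n, 3)`, and the iteration count are assumptions made for illustration. The upgrade from projective to Euclidean space via the metric constraint is not shown.

```python
import numpy as np


def stack(lam, x):
    """Build the 3m x n scaled measurement matrix from depths lam (m, n)
    and homogeneous image points x (m, n, 3)."""
    m, n, _ = x.shape
    return (lam[:, :, None] * x).transpose(0, 2, 1).reshape(3 * m, n)


def projective_factorization(x, n_iter=100):
    """Alternating depth estimation and rank-4 factorization (illustrative).

    x: array (m, n, 3), homogeneous image points for m views and n points.
    Returns (P, X, lam, history): stacked 3m x 4 cameras, 4 x n points,
    estimated depths, and the relative rank-4 residual per iteration.
    """
    m, n, _ = x.shape
    lam = np.ones((m, n))  # all depths equal to 1: the affine starting point
    history = []
    for _ in range(n_iter + 1):
        # Fix the global scale so the depths cannot shrink toward zero.
        lam = lam / np.linalg.norm(stack(lam, x))
        W = stack(lam, x)
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        # W has unit Frobenius norm, so this is the relative rank-4 residual.
        history.append(float(np.sqrt(np.sum(s[4:] ** 2))))
        # Closest rank-4 matrix to W.
        W4 = (U[:, :4] * s[:4]) @ Vt[:4]
        # Least-squares refit of each depth: project the rank-4 estimate of
        # point j in view i back onto the direction of the measured point.
        blocks = W4.reshape(m, 3, n).transpose(0, 2, 1)
        lam_next = np.sum(blocks * x, axis=2) / np.sum(x * x, axis=2)
        if len(history) <= n_iter:
            lam = lam_next
    P = U[:, :4] * s[:4]  # projective cameras, defined up to a 4x4 transform
    X = Vt[:4]            # projective points, defined up to the same transform
    return P, X, lam, history
```

With noise-free perspective data, the entries of `history` should not increase from one iteration to the next, which mirrors the abstract's stopping condition that the measurement matrix reaches rank 4. The factors `P` and `X` are determined only up to a common 4×4 projective transformation.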