Model-based stereo vision pose estimation depends on first establishing a model of the target. The photo-model-based method simplifies model building to a single photo, so the shapes, colors, and patterns of objects do not have to be predefined in program code. Previously, however, it was necessary to calculate a pixel per metric ratio, that is, the number of pixels per millimeter of the object, from the photo’s shooting distance in order to generate a photo-model with the same size (length and width) as the actual object. This requirement restricts practical application. The proposed method extends the traditional photo-modeling algorithm and relaxes this prerequisite on the photo for target pose determination. Various pixel per metric ratios are assumed to generate 3D photo-models of different sizes, and these models are then used in stereo vision image matching to detect the pose of the target object. Because the method is not data-driven, it requires neither a large set of pictures nor pretraining time. This article applies the algorithm to the cleaning of seaports and aquaculture sites, aiming to locate dead or diseased marine life on the water surface before collection. Pose estimation experiments were conducted in real application scenarios to detect both an object’s pose and the pixel per metric ratio of a prepared photo. The results show that the extended photo-model stereo vision method can estimate the pose of a target from a single photo whose pixel per metric ratio is unknown.
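
As a minimal sketch of how assuming several pixel per metric ratios leads to photo-models of different sizes, the following Python snippet converts one photo's pixel dimensions into a physical size for each candidate ratio. The function names, the image dimensions, and the candidate ratios are hypothetical and chosen only for illustration; the stereo matching step that would select the best candidate is not shown.

```python
def model_size_mm(width_px, height_px, px_per_mm):
    """Physical size (mm) implied by a photo under an assumed pixels-per-mm ratio."""
    if px_per_mm <= 0:
        raise ValueError("px_per_mm must be positive")
    return width_px / px_per_mm, height_px / px_per_mm


def candidate_models(width_px, height_px, ratios):
    """One (ratio, width_mm, height_mm) tuple per assumed ratio."""
    return [(r, *model_size_mm(width_px, height_px, r)) for r in ratios]


# Hypothetical 640x480 px photo and three assumed ratios (pixels per millimeter).
candidates = candidate_models(640, 480, [2.0, 4.0, 8.0])
for ratio, width_mm, height_mm in candidates:
    print(f"{ratio} px/mm -> {width_mm} mm x {height_mm} mm")
```

A larger assumed ratio yields a physically smaller model, so each candidate corresponds to a different hypothesis about the true size of the object.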