Image-based reconstruction aims to recover 3D point cloud models of target objects from scene images taken from different viewpoints. Existing methods often produce a large number of redundant background points, which complicates 3D modeling and other downstream applications. To address this issue, this work proposes an improved framework that incorporates image segmentation into the point cloud recovery procedure so that only the objects of interest in a scene are reconstructed. The framework provides two options for foreground object segmentation, and users can choose the method that yields accurate segmentation for a given scene. Feature matches are then extracted from the segmented images, and the point cloud model is recovered through two phases of dense diffusion: feature diffusion and patch diffusion. In the diffusion stage, we introduce a new normalized metric that handles both illumination changes and low-texture regions to improve the robustness of the reconstruction. Experimental results show that the proposed framework effectively avoids reconstructing irrelevant background data while producing more evenly distributed and detailed point cloud models.
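As a rough illustration of how segmentation can keep background points out of the reconstruction, the sketch below discards any feature match whose endpoints fall outside the foreground masks of its two images. It is a minimal sketch under stated assumptions, not the framework's actual implementation: the abstract describes extracting features from already segmented images, whereas this example filters existing matches after the fact, and the names `in_foreground`, `filter_matches`, and the mask and match layouts are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]         # (x, y) pixel coordinates
Match = Tuple[Point, Point]     # point in image A paired with a point in image B
Mask = Sequence[Sequence[int]]  # binary mask indexed as mask[y][x]; 1 = foreground


def in_foreground(mask: Mask, point: Point) -> bool:
    """Return True if the point lies inside the mask and on a foreground pixel."""
    x, y = point
    if y < 0 or y >= len(mask) or x < 0 or x >= len(mask[y]):
        return False  # points outside the image are treated as background
    return mask[y][x] == 1


def filter_matches(matches: Sequence[Match], mask_a: Mask, mask_b: Mask) -> List[Match]:
    """Keep only matches whose two endpoints are both foreground pixels."""
    return [
        (pa, pb)
        for pa, pb in matches
        if in_foreground(mask_a, pa) and in_foreground(mask_b, pb)
    ]


if __name__ == "__main__":
    # Toy 3x3 masks (assumed data): the object occupies a 2x2 block in each view.
    mask_a = [[0, 0, 0],
              [0, 1, 1],
              [0, 1, 1]]
    mask_b = [[1, 1, 0],
              [1, 1, 0],
              [0, 0, 0]]
    matches = [
        ((1, 1), (0, 0)),  # foreground in both views
        ((0, 0), (1, 1)),  # background in view A
        ((2, 2), (2, 2)),  # background in view B
        ((2, 1), (1, 0)),  # foreground in both views
    ]
    print(filter_matches(matches, mask_a, mask_b))
```

Only the surviving matches would then seed the diffusion stages, so background regions never contribute seed points in this simplified setting.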