Aging infrastructure has drawn increasing attention worldwide, because its failure can have severe economic and social consequences. Precise quantification of minor defects is essential for identifying problems before structural failure occurs. Most existing studies measure defect dimensions at the image level, ignoring the three-dimensional (3D) information available from close-range photogrammetry. This paper develops an efficient approach for accurately detecting and quantifying minor defects on complex infrastructure. The pixel sizes of inspection images are estimated using spatial information generated by 3D point cloud reconstruction. The key contribution of this research is obtaining the actual pixel size within small gridded sections of each image by relating image pixels to this spatial information. To automate the process, a deep learning model is applied to detect and highlight cracked areas at the pixel level; the adopted convolutional neural network (CNN) achieves an F1 score of 0.613 for minor crack extraction. The actual crack dimension is then derived by multiplying the number of crack pixels by the pixel size. Unlike the traditional approach, the proposed approach can estimate the dimensions of defects distributed over a complex structure. A pilot case study was conducted on a concrete footpath, using a selected 1500 mm × 1500 mm section containing cracks. Of 88 images in total, 10 were selected for validation; average errors ranged from 0.26 mm to 0.71 mm for minor cracks under 5 mm, a promising result for the proposed approach.
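The measurement step summarized above (a per-section pixel size from 3D information, then crack dimension as pixel count times pixel size) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that, for each grid cell, a few image pixels have already been matched to 3D points (in millimetres) from the reconstructed point cloud; that each cell is small enough to be treated as locally planar and roughly parallel to the image plane; and that the CNN output has already been reduced to the crack pixels along the line being measured (for example, across the crack width). The function names `cell_pixel_size` and `crack_dimension_mm` are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Pixel = Tuple[float, float]          # (u, v) = (column, row) in the image
Point3D = Tuple[float, float, float]  # (x, y, z) in millimetres
Cell = Tuple[int, int]                # (grid row, grid column)


def cell_pixel_size(pixels: Sequence[Pixel], points: Sequence[Point3D]) -> float:
    """Estimate the pixel size (mm per pixel) of one grid cell.

    Hypothetical helper: pixels[i] is assumed to be the image location of the
    reconstructed 3D point points[i]. For every pair of matched points, the
    ratio of 3D distance to pixel distance is computed, and the ratios are
    averaged to reduce the effect of noise in any single match.
    """
    if len(pixels) != len(points) or len(pixels) < 2:
        raise ValueError("need at least two matched pixel/3D point pairs")
    ratios = []
    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            d_px = math.dist(pixels[i], pixels[j])
            if d_px == 0:
                continue  # coincident pixels carry no scale information
            ratios.append(math.dist(points[i], points[j]) / d_px)
    if not ratios:
        raise ValueError("all pixel coordinates coincide")
    return sum(ratios) / len(ratios)


def crack_dimension_mm(
    crack_pixels: Iterable[Pixel],
    grid_px: int,
    cell_sizes: Dict[Cell, float],
) -> float:
    """Convert a run of crack pixels into millimetres.

    Each crack pixel contributes the pixel size of the grid cell it falls in,
    so within one cell this reduces to (number of pixels) x (pixel size).
    """
    total = 0.0
    for u, v in crack_pixels:
        cell = (int(v // grid_px), int(u // grid_px))
        total += cell_sizes[cell]
    return total


if __name__ == "__main__":
    # Three pixels 100 px apart that map to points 50 mm apart: about 0.5 mm/pixel.
    size = cell_pixel_size(
        [(0, 0), (100, 0), (0, 100)],
        [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0), (0.0, 50.0, 0.0)],
    )
    print(size)  # ≈ 0.5
    # Three crack pixels in cell (0, 0) and one in cell (0, 1) of a 100 px grid.
    print(crack_dimension_mm([(10, 5), (11, 5), (12, 5), (110, 5)], 100,
                             {(0, 0): 0.5, (0, 1): 0.4}))  # ≈ 1.9
```

Storing a separate pixel size per grid cell is what lets the estimate vary across a non-planar structure; a single image-wide scale, as in image-level measurement, would implicitly assume a flat surface at a constant distance from the camera.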