The morphological characteristics of a crack serve as crucial indicators for rating the condition of the concrete bridge components. Previous studies have predominantly employed deep learning techniques for pixel-level crack detection, while occasionally incorporating monocular devices to quantify the crack dimensions. However, the practical implementation of such methods with the assistance of robots or unmanned aerial vehicles (UAVs) is severely hindered due to their restrictions in frontal image acquisition at known distances. To explore a non-contact inspection approach with enhanced flexibility, efficiency and accuracy, a binocular stereo vision-based method incorporating full convolutional network (FCN) is proposed for detecting and measuring cracks. Firstly, our FCN leverages the benefits of the encoder–decoder architecture to enable precise crack segmentation while simultaneously emphasizing edge details at a rate of approximately four pictures per second in a database that is dominated by complex background cracks. The training results demonstrate a precision of 83.85%, a recall of 85.74% and an F1 score of 84.14%. Secondly, the utilization of binocular stereo vision improves the shooting flexibility and streamlines the image acquisition process. Furthermore, the introduction of a central projection scheme achieves reliable three-dimensional (3D) reconstruction of the crack morphology, effectively avoiding mismatches between the two views and providing more comprehensive dimensional depiction for cracks. An experimental test is also conducted on cracked concrete specimens, where the relative measurement error in crack width ranges from −3.9% to 36.0%, indicating the practical feasibility of our proposed method.