Aerospace welds undergo non-destructive evaluation (NDE) during manufacturing, often using digital radiography, to identify defective parts that may pose structural risks. The analysis of these digital radiographs is time-consuming and costly. Attempts to automate the analysis using conventional computer vision methods or shallow machine learning have not, thus far, matched the performance of human inspectors, because of the high reliability requirements and the low contrast-to-noise ratio of the defects. Modern approaches based on deep learning have made considerable progress towards reliable automated analysis. However, limited data sets render current machine learning solutions insufficient for industrial use. Moreover, industrial acceptance would require performance demonstration using standard metrics in non-destructive evaluation, such as probability of detection (POD), which have rarely been used in previous studies. In this study, data augmentation with virtual flaws was used to overcome data scarcity and was compared with conventional data augmentation. A semantic segmentation network was trained to detect defects in computed radiography data of aerospace welds. Standard evaluation metrics from non-destructive testing were adopted for the comparison. Finally, the network was deployed as an inspector’s aid in a realistic environment to predict flaws in production radiographs. The network achieved high detection reliability and defect sizing performance, with an acceptable false call rate. Virtual flaw augmentation was found to significantly improve performance, especially for small data sets, and for underrepresented flaw types even with large data sets. The deployed prototype was found to be easy to use, indicating readiness for industry adoption.
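To illustrate the general idea of augmenting with virtual flaws, the following is a minimal sketch and not the procedure used in this study. It assumes that a flaw can be represented as an additive intensity patch (`flaw_delta`, for example a flaw signal with the local background subtracted) that is implanted into a flaw-free radiograph at a random position and with a random gain, producing both a new training image and its segmentation mask. The function names, the additive model, and the gain range are all hypothetical.

```python
import random


def implant_virtual_flaw(image, flaw_delta, top, left, gain=1.0):
    """Return (augmented_image, mask) with the flaw patch added at (top, left).

    Assumption: the flaw is modelled as an additive intensity offset.
    `image` and `flaw_delta` are 2D lists of numbers; the input is not modified.
    """
    h, w = len(flaw_delta), len(flaw_delta[0])
    if top < 0 or left < 0 or top + h > len(image) or left + w > len(image[0]):
        raise ValueError("flaw patch does not fit inside the image")

    out = [row[:] for row in image]                      # copy of the radiograph
    mask = [[0] * len(image[0]) for _ in image]          # segmentation label
    for r in range(h):
        for c in range(w):
            d = gain * flaw_delta[r][c]
            out[top + r][left + c] += d
            if d != 0:
                mask[top + r][left + c] = 1              # pixel belongs to the flaw
    return out, mask


def augment(image, flaw_delta, rng, gain_range=(0.5, 1.5)):
    """Implant the flaw at a random location with a random gain (hypothetical range)."""
    h, w = len(flaw_delta), len(flaw_delta[0])
    top = rng.randrange(len(image) - h + 1)
    left = rng.randrange(len(image[0]) - w + 1)
    gain = rng.uniform(*gain_range)
    return implant_virtual_flaw(image, flaw_delta, top, left, gain)


if __name__ == "__main__":
    background = [[100.0] * 8 for _ in range(8)]         # flaw-free toy radiograph
    flaw = [[0, -5], [-5, -10]]                          # toy flaw signal
    sample, label = augment(background, flaw, random.Random(0))
    print(sum(map(sum, label)), "flaw pixels implanted")
```

Because each implanted flaw comes with an exact mask, a scheme like this can generate many labelled samples of a rare flaw type from a few extracted flaw signals, which is the kind of data scarcity the abstract describes.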