Accurate and automatic railhead inspection is crucial for the operational safety of railway systems. Deep learning on visual images is effective for the automatic detection of railhead defects, but its applicability is limited because existing methods either require large amounts of training data or ignore defect sizes. This paper develops a machine learning framework based on wavelet scattering networks (WSNs) and neural networks (NNs) for identifying railhead defects. WSNs are functionally equivalent to deep convolutional neural networks but contain no trainable parameters, which makes them suitable for small datasets. NNs can restore location and size information. The publicly available rail surface discrete defects (RSDD) datasets were analyzed, including 67 Type-I railhead images acquired from express tracks and 128 Type-II images captured from ordinary/heavy haul tracks. The final validation accuracy reached 99.80% and 99.44% for the Type-I and Type-II datasets, respectively. WSNs can extract implicit signal features, and a support vector machine classifier can improve the learning accuracy of NNs by over 6%. Three criteria, namely precision, recall, and F-measure, were calculated for comparison with the literature. At the pixel level, the developed approach achieved around 90% on all three criteria, outperforming previous methods. At the defect level, the recall rates reached 100%, indicating that all labeled defects were identified. The precision rates were around 75%, reduced by insignificant misidentified speckles (smaller than 20 pixels). Nonetheless, the developed learning framework was effective in identifying railhead defects.
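
For readers unfamiliar with the three criteria, the following is a minimal sketch of how pixel-level precision, recall, and F-measure can be computed from a predicted binary defect mask and a labeled ground-truth mask. It is illustrative only: the function name `pixel_metrics`, the toy masks, and the use of the balanced F-measure (the harmonic mean of precision and recall) are assumptions, not details taken from the paper.

```python
def pixel_metrics(pred, truth):
    """Return (precision, recall, f_measure) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values, where 1 marks a defect pixel.
    The balanced F-measure is assumed here; a weighted variant may differ.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # defect pixel correctly identified
            elif p and not t:
                fp += 1      # background pixel flagged as defect
            elif t and not p:
                fn += 1      # defect pixel missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical 3x4 masks, not data from the RSDD datasets.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

precision, recall, f_measure = pixel_metrics(pred, truth)
print(precision, recall, f_measure)
```

At the defect level the same three ratios apply, but the counts would be taken over connected defect regions rather than individual pixels, which is why a few small misidentified speckles can lower precision while recall remains at 100%.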