This study investigated the abilities of pre-processing, feature selection and machine-learning methods for the spectroscopic diagnosis of soil arsenic contamination. The spectral data were pre-processed by using Savitzky-Golay smoothing, first and second derivatives, multiplicative scatter correction, standard normal variate, and mean centering. Principle component analysis (PCA) and the RELIEF algorithm were used to extract spectral features. Machine-learning methods, including random forests (RF), artificial neural network (ANN), radial basis function- and linear function- based support vector machine (RBF- and LF-SVM) were employed for establishing diagnosis models. The model accuracies were evaluated and compared by using overall accuracies (OAs). The statistical significance of the difference between models was evaluated by using McNemar’s test (Z value). The results showed that the OAs varied with the different combinations of pre-processing, feature selection, and classification methods. Feature selection methods could improve the modeling efficiencies and diagnosis accuracies, and RELIEF often outperformed PCA. The optimal models established by RF (OA = 86%), ANN (OA = 89%), RBF- (OA = 89%) and LF-SVM (OA = 87%) had no statistical difference in diagnosis accuracies (Z < 1.96, p < 0.05). These results indicated that it was feasible to diagnose soil arsenic contamination using reflectance spectroscopy. The appropriate combination of multivariate methods was important to improve diagnosis accuracies.