Surface roughness and machining accuracy are essential indicators of part quality in milling. With recent advances in sensor technology and data processing, the cutting force signals collected during machining can be used to predict and assess the machining quality. Deep-learning-based artificial neural networks (ANNs) can process large sets of signal data and make predictions from the extracted features. During the final stage of milling SUS304 stainless steel, we selected the cutting speed, feed per tooth, axial depth of cut, and radial depth of cut as the experimental parameters and synchronously measured the cutting force signals with a sensory tool holder. The signals were preprocessed for feature extraction using a Fourier transform technique. Subsequently, three ANNs, namely a deep neural network (DNN), a convolutional neural network (CNN), and a long short-term memory (LSTM) network, were trained to predict the machining quality under different cutting conditions. Two training methods were adopted: whole-data training and training by data classification. We compared the predictive accuracy and training efficiency of the three models on the same training data. The training results and the post-machining measurements indicated that, when the surface roughness was predicted with training classified by feed per tooth, all the models had a percentage error within 10%. With whole-data training, however, the CNN and LSTM models had a percentage error of 20%, while that of the DNN model was over 50%. For the machining accuracy prediction with whole-data training, the percentage error of the DNN and CNN models was below 10%, while that of the LSTM model was as large as 20%; classification training brought no significant improvement in these results.
In all the training processes, the CNN model had the best analytical efficiency, followed by the LSTM model. The DNN model performed the worst.
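As an illustration of the two quantities this summary relies on, the following minimal sketch shows how frequency-domain features might be derived from a cutting force signal with a Fourier transform, and how a percentage error between a predicted and a measured value can be computed. The function names, the number of frequency bands, and the synthetic test signal are assumptions made for illustration; the exact features and error definition used in the study are not stated here.

```python
import numpy as np

def fft_features(signal, n_bands=8):
    """Mean spectral magnitude in n_bands equal-width frequency bands.

    Hypothetical feature set: the study's actual features are not specified here.
    """
    signal = np.asarray(signal, dtype=float)
    # Remove the mean (static force component) and take the one-sided spectrum.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) / signal.size
    bands = np.array_split(spectrum, n_bands)
    return np.array([band.mean() for band in bands])

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest spectral peak, excluding the mean."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

def percentage_error(predicted, measured):
    """Absolute percentage error of a prediction relative to the measurement."""
    return abs(predicted - measured) / abs(measured) * 100.0

# Synthetic stand-in for a force signal: a 50 Hz sine sampled at 1 kHz for 1 s.
t = np.arange(1000) / 1000.0
force = np.sin(2 * np.pi * 50 * t)
features = fft_features(force)
peak_hz = dominant_frequency(force, sample_rate=1000.0)
```

A feature vector of this kind could serve as the input to any of the three network types, and the percentage error is the comparison between a predicted value and the corresponding post-machining measurement.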