To predict the protein content of milk, which matters from both a hygiene and a health point of view, this paper proposes a predictive modeling method based on a convolutional neural network (CNN) that uses the spectral characteristics of milk hyperspectral images. In the experiment, 45 milk samples with different protein concentrations were imaged with a visible/near-infrared hyperspectral imaging system, and the number of samples was expanded to 4,500 by region-of-interest extraction. The resulting absorption spectra were smoothed with a Savitzky–Golay filter, and a 1-D CNN was then used to build the prediction model. The experimental results indicate that the CNN model performs the protein content prediction task adequately: its coefficients of determination for the calibration set and the prediction set are 0.9071 and 0.9101, and its root mean square errors for the calibration set and the prediction set are 0.1159 g/(100 mL) and 0.1044 g/(100 mL), respectively. To verify the predictive ability of the CNN, comparative experiments were carried out with the more traditional partial least squares regression (PLSR) and support vector regression (SVR); among the three, the CNN model has the largest R2 and the smallest root mean square error. Even when compared with PLSR and SVR models optimized by dimension reduction, the CNN model still gives the best fit. In summary, the CNN model can make full use of the spectral features of milk to predict milk protein content with higher precision, and it places lower demands on data preprocessing.
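As an illustration of the smoothing step and the two reported metrics, the following is a minimal sketch rather than the authors' implementation. It assumes a 5-point quadratic Savitzky–Golay window (the abstract does not state the window length or polynomial order) and uses the standard definitions of the coefficient of determination and the root mean square error. The function names `savgol_5pt`, `r2`, and `rmse` and the example values are hypothetical.

```python
import math

# Quadratic Savitzky-Golay coefficients for a 5-point window.
# ASSUMPTION: window length and polynomial order are not given in the abstract.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def savgol_5pt(spectrum):
    """Smooth a 1-D spectrum; the two points at each end are left unchanged."""
    values = [float(v) for v in spectrum]
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return smoothed


def _check(y_true, y_pred):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the targets."""
    _check(y_true, y_pred)
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical protein contents in g/(100 mL); NOT data from the paper.
    measured = [2.8, 3.0, 3.2, 3.4, 3.6]
    predicted = [2.9, 3.0, 3.1, 3.5, 3.6]
    print("R2:", r2(measured, predicted))
    print("RMSE:", rmse(measured, predicted))
```

Computing both metrics separately on the calibration set and on the prediction set gives the four figures the abstract reports. A quadratic Savitzky–Golay filter reproduces any quadratic signal exactly at interior points, which is a convenient sanity check for the smoothing routine.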