Froth feature extraction plays a significant role in the monitoring and control of the flotation process. Image-based soft sensors have attracted considerable interest in flotation because they are low-cost and non-intrusive. This study proposes data-driven soft sensor models that use froth images to predict the key performance indicators of the flotation process. Four approaches were examined for predicting the amount of sulfur removed from iron ore concentrate in column flotation: multiple linear regression (MLR), the backpropagation neural network (BPNN), the k-means clustering algorithm, and the convolutional neural network (CNN). A total of 99 experimental results were used to develop the predictive models. Extracted froth features, including color, bubble shape and size, texture, stability, and velocity, were used to train the traditional predictive models, whereas the CNN model took the froth images directly as input. A comparison of the results indicated that the three-layered feedforward NN model (17-10-1 topology) and the CNN model provided better predictions than the MLR and k-means models. The BPNN model achieved a correlation coefficient of 0.97 and a root mean square error of 4.84% between the actual data and the network output for both the training and testing datasets. The error percentages of the CNN, BPNN, MLR, and k-means models were 10, 11, 15, and 18%, respectively. This study can provide key technical support for applying intelligent models to the control of operational variables in the flotation process used to desulfurize iron ore concentrate.
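To make the 17-10-1 topology and the two reported metrics concrete, the following is a minimal sketch of a three-layer feedforward network trained by backpropagation, together with the root mean square error and correlation coefficient. It is an illustration under stated assumptions, not the study's implementation: the data are synthetic (99 random samples with 17 features standing in for the extracted froth features), and the tanh activation, learning rate, epoch count, and all function names are hypothetical choices made here.

```python
import numpy as np


def init_network(n_in=17, n_hidden=10, n_out=1, seed=0):
    """Create small random weights for a 17-10-1 feedforward network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Return hidden activations and the 1-D vector of predictions."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])  # assumed activation
    output = hidden @ params["W2"] + params["b2"]      # linear output unit
    return hidden, output.ravel()


def train(params, X, y, lr=0.05, epochs=2000):
    """Full-batch gradient descent on mean squared error (backpropagation)."""
    n = len(y)
    for _ in range(epochs):
        hidden, pred = forward(params, X)
        err = (pred - y)[:, None]                      # shape (n, 1)
        grad_W2 = hidden.T @ err / n
        grad_b2 = err.mean(axis=0)
        # Propagate the error through the tanh hidden layer.
        d_hidden = (err @ params["W2"].T) * (1.0 - hidden ** 2)
        grad_W1 = X.T @ d_hidden / n
        grad_b1 = d_hidden.mean(axis=0)
        params["W1"] -= lr * grad_W1
        params["b1"] -= lr * grad_b1
        params["W2"] -= lr * grad_W2
        params["b2"] -= lr * grad_b2
    return params


def rmse(y_true, y_pred):
    """Root mean square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def correlation(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic stand-in data: NOT the study's measurements.
rng = np.random.default_rng(42)
X = rng.normal(size=(99, 17))                  # 99 samples, 17 features
w_true = rng.normal(size=17)                   # hypothetical ground truth
y = 0.3 * (X @ w_true) + rng.normal(0.0, 0.05, size=99)

params = init_network()
rmse_before = rmse(y, forward(params, X)[1])
params = train(params, X, y)
y_hat = forward(params, X)[1]
rmse_after = rmse(y, y_hat)

print("RMSE before training:", rmse_before)
print("RMSE after training: ", rmse_after)
print("Correlation coefficient:", correlation(y, y_hat))
```

In practice the metrics would be computed separately on held-out test data, as the study reports results for both the training and testing datasets; the sketch evaluates on the training samples only to stay short.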